Posted by Andreas Olligschlaeger Ph. D. on Mon, Aug 17, 2009 @ 01:05 PM

Ok, so I left off at the point where we had all of this data and didn't know what to do with it. Let's recap the types of data most facilities have about inmates:
- telephone call recordings
- inmate visitation logs and recordings
- information contained in inmate management systems
- financial transactions in commissary accounts
Some facilities may have more data, such as emails to and from inmates and any information that is generated by detectives or internal affairs units.
The bottom line is that this can be quite a lot of information. Worse, information is typically in different formats and, more often than not, in many different locations. Telephone recordings are in one database, inmate management records are in another and investigative information might be in a database, or it could be on bits and pieces of paper scattered about detectives' desks.
To the uninitiated, or to those used to simple transactional databases, this scene may sound like a horror story, but in reality it's a dilemma that crime and intelligence analysts in law enforcement and corrections face every day. With this in mind, it is really no surprise that the typical analyst spends 70% or more of their time gathering data and getting it into a usable format, leaving at best 30% for actual analysis. In an ideal world, these figures would be reversed.
To be more efficient, analysts and investigators need some way to centralize all of this data, make it compatible and make it available. You want the data to be available in such a way that the analyst doesn't have to worry about where to get it, how to put it together or what format it's in. Nor should they have to spend countless hours correcting sloppy data entry or errors (a HUGE pet peeve of mine; I might write a separate article about this at a later point), matching up records and doing all those other frustrating tasks that make you want to pull your hair out and wish you'd taken a different career path.
The good news is, it's possible. The key is data warehousing. Data warehousing is a relatively inexpensive way to get information into one place. Best of all, it is certainly a lot cheaper and easier to do than trying to get multiple computer systems and databases to work together.
So, what is data warehousing? Well, one way of thinking about it is to consider each separate database (inmate records, phone system, visitations) as a producer of parts, just like automotive parts. In this scenario, the data warehouse is the assembly plant for these parts. The consumer of the final product is the analyst or investigator.
One rather obvious advantage to warehousing is that instead of winding up with a bunch of car parts stored in different warehouses in different places, you can actually create a finished product, one that you can drive or, in our case, data that's in a format ready to be analyzed. In other words, analysts or investigators don't have to worry about figuring out how to build a car and how to put the different parts together; instead, they just go to the warehouse and pick up the type of car they need for a particular job. After all, if everybody who drove a car also had to be an engineer and know how to build a car, there wouldn't be many people driving cars. The irony is that many crime analysts are in exactly that position. Apart from analytical skills, many modern crime analysts also need to know about databases, mapping, address matching, data transformation, data cleaning and, in many cases, some level of programming. In fact, crime and intelligence analysts are arguably some of the most highly trained and well-rounded folks in the criminal justice system. A well-designed and well-implemented data warehouse can for the most part eliminate the need for those extra skills and allow analysts to concentrate on their primary objective: analysis.
Another advantage of warehousing is that it adds value to the pieces and parts that feed it. Just as a finished car is more valuable than the sum of its parts, so is the data contained in a warehouse.
So, from a practical viewpoint, a data warehouse does a lot more than just collect data from different sources and put it into a single location. It adds value to the original data by performing a number of operations on it, just as auto workers (or, these days, increasingly robots) do with car parts. For example, a data warehouse automatically attempts to correct certain types of errors in the data, such as misspelled names or inconsistent addresses, whenever a record is uploaded from a database. This makes it easier to connect records from different databases and is usually done using a set of rules or something called "fuzzy logic", where algorithms are applied to try to match data that is similar but not identical. As a data warehouse grows, these rules also grow. With each install of our Call IQ software, for example, the data from the various databases feeding the warehouse are also analyzed. Every time an inconsistency is found that can be fixed automatically, a way to fix it is added to the rule base. As a result, all customers benefit from each new install. Naturally, it is still very important to ensure consistent data entry instead of relying on a data warehouse to fix things, for the very simple reason that not everything can be fixed.
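To make the idea of rules plus fuzzy matching a little more concrete, here is a minimal sketch in Python. It is not how Call IQ or any particular warehouse does it; the rule table, the function names and the 0.85 threshold are all made up for illustration, and plain string similarity from the standard library stands in for whatever matching algorithm a real product would use.

```python
from difflib import SequenceMatcher

# Hypothetical rule base: each entry maps a spelled-out word to one standard form.
# In the spirit of the article, a new rule would be added each time a fixable
# inconsistency is discovered in incoming data.
ADDRESS_RULES = {"street": "st", "avenue": "ave", "road": "rd"}


def normalize_address(address):
    """Lowercase an address, drop periods and commas, and apply the rule base."""
    words = address.lower().replace(".", "").replace(",", "").split()
    return " ".join(ADDRESS_RULES.get(word, word) for word in words)


def name_similarity(name_a, name_b):
    """Return a similarity score between 0.0 and 1.0 for two names."""
    return SequenceMatcher(None, name_a.lower().strip(), name_b.lower().strip()).ratio()


def same_person(name_a, name_b, threshold=0.85):
    """Treat two names as a probable match if they are similar enough.

    The threshold is an assumption; a real system would tune it and would
    normally compare more than just the name before linking two records.
    """
    return name_similarity(name_a, name_b) >= threshold
```

With these definitions, `normalize_address("123 Main Street.")` and `normalize_address("123 MAIN ST")` both come out as the same string, and `same_person("Jonathan Smith", "Jonathon Smith")` treats the misspelling as a probable match. Anything that falls below the threshold is exactly the kind of record that still depends on careful data entry.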
Another way that a data warehouse adds value is by collecting meta data. "Meta data" is really just a fancy term for "data about data", or statistical information about the data contained in the warehouse. For example, when looking at a phone call an investigator might want to know whether any other inmates have called the same number, or which other numbers the inmate has called. Rather than having to run a separate query, all the investigator has to do is look at the meta data, which is compiled automatically. Also, a data warehouse can automatically detect linkages between people, places and organizations.
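As a rough illustration of the phone call example, the sketch below builds that kind of meta data from a list of call records. The record layout (an inmate identifier paired with a dialed number) and the function names are assumptions made for the example, not the structure of any real phone system or warehouse.

```python
from collections import defaultdict


def build_call_metadata(calls):
    """Index call records in both directions.

    `calls` is an iterable of (inmate_id, dialed_number) pairs, which is a
    hypothetical, simplified record layout.
    """
    inmates_by_number = defaultdict(set)
    numbers_by_inmate = defaultdict(set)
    for inmate_id, number in calls:
        inmates_by_number[number].add(inmate_id)
        numbers_by_inmate[inmate_id].add(number)
    return inmates_by_number, numbers_by_inmate


def shared_numbers(inmates_by_number):
    """Return only the numbers called by more than one inmate."""
    return {
        number: sorted(inmates)
        for number, inmates in inmates_by_number.items()
        if len(inmates) > 1
    }


# Made-up sample data for illustration only.
sample_calls = [
    ("inmate_a", "555-0101"),
    ("inmate_b", "555-0101"),
    ("inmate_a", "555-0199"),
    ("inmate_c", "555-0150"),
]
by_number, by_inmate = build_call_metadata(sample_calls)
links = shared_numbers(by_number)
```

Because the two indexes are built once, as records arrive, the question "who else has called this number?" becomes a simple lookup rather than a fresh query against the phone system. A number that turns up in `links` is also a simple example of a linkage between two people.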
A data warehouse also adds value by transforming data. As mentioned earlier, corrections data can be in many different formats. So for example, audio recordings can be transformed into text, where they can be further analyzed using natural language techniques such as topic detection. Another example is attaching geographic coordinates to addresses so that they can be mapped.
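The address example can be sketched in a few lines as well. A real warehouse would call a geocoding service or a GIS reference layer; here a small in-memory lookup table stands in for that, and the table, its coordinates, the field names and the `attach_coordinates` function are all hypothetical.

```python
# Hypothetical reference table mapping cleaned addresses to (latitude, longitude).
# The coordinates are placeholders, not real geocoding results.
GAZETTEER = {
    "123 main st": (40.0, -80.0),
    "45 oak ave": (40.1, -80.1),
}


def attach_coordinates(record, gazetteer):
    """Return a copy of `record` with `lat` and `lon` fields added.

    Addresses that cannot be found get None for both fields, so they can be
    flagged for review instead of being silently dropped.
    """
    # Collapse case and extra whitespace so trivially different spellings match.
    key = " ".join(record["address"].lower().split())
    lat, lon = gazetteer.get(key, (None, None))
    enriched = dict(record)
    enriched["lat"] = lat
    enriched["lon"] = lon
    return enriched


visitor = {"name": "J. Doe", "address": "123  Main St"}
mapped = attach_coordinates(visitor, GAZETTEER)
```

The original record is left untouched and the enriched copy carries the extra fields, which mirrors the idea that the warehouse adds to the source data rather than changing it in the systems that produced it.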
These are just some of the advantages of data warehousing. A lot of well-publicized efforts (and some rather spectacular failures) in criminal justice have centered on making different computer systems and their databases compatible rather than deploying a data warehouse. That's all well and good if all you want to do is retrieve information from a single interface, but for analytical purposes you still wind up with data in varying formats and standards, and you don't get meta data that encompasses all data within the various systems. Another downside is that analytical queries can be very complex and require a lot of processing power. When they run directly against production systems, that in turn tends to slow those systems down. Earlier in my career, many were the times that I got "taken out to the woodshed" for causing the records management system to come to a screeching halt with my queries.
The bottom line is that for most analytical and investigative applications where the data spans a variety of different platforms and databases, a data warehouse is the cheapest, most efficient way to go.