Tom O’Connor Explains Predictive Coding

Well, not exactly. Like Henry Ford, when I don’t know the answer to a question I try to find a colleague who DOES know the answer. So for an explanation of predictive coding I turned to Gavin Manes, President and CEO of Avansic Inc. Gavin has a PhD in technical things I can’t even describe, was a computer science professor at the University of Tulsa, and founded Avansic in 2004 to provide e-discovery services to companies in both the business and legal communities.

I am constantly asking the questions below of techies I know and since I’ve worked with Gavin many times, about a year ago I thought I would ask them of him.  The answers are just as relevant today as the day we spoke.

Q. So exactly what is predictive coding?

A.       Given a large set of documents, attorneys review a small sample set to their specifications, and their pattern of review is applied to the larger document set using a “content clustering” or “find similar” calculation. It represents a way to review a large volume of documents without the thousands of hours required for an attorney to put eyes onto each page.  This is typically used as a supplement to traditional e-discovery filtering methods such as keyword searching.
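To make the “find similar” idea concrete, here is a minimal sketch of how a reviewed sample's coding might be applied to the rest of a document set. It is an illustration only, not how any particular product works: the word-overlap (Jaccard) measure, the `propagate` function, the 0.3 threshold, and the sample documents are all assumptions chosen for the example.

```python
import re

def tokens(text):
    """Lowercased word set for a document."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Overlap of two word sets: 0.0 (nothing shared) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def propagate(reviewed, unreviewed, threshold=0.3):
    """Give each unreviewed document the call made on its most similar
    reviewed document. Anything without a close enough match is left
    for an attorney. The threshold is a hypothetical setting.
    """
    coded = [(tokens(text), label) for text, label in reviewed]
    results = []
    for text in unreviewed:
        words = tokens(text)
        score, label = max(
            ((jaccard(words, w), lab) for w, lab in coded),
            key=lambda pair: pair[0],
        )
        results.append(label if score >= threshold else "needs review")
    return results

# Hypothetical sample set coded by attorneys.
reviewed = [
    ("pricing agreement for the widget contract", "responsive"),
    ("lunch menu for the office party", "not responsive"),
]
# Hypothetical remainder of the document set.
unreviewed = [
    "draft pricing agreement widget contract attached",
    "office party lunch menu options",
    "quarterly parking garage notice",
]
print(propagate(reviewed, unreviewed))
```

Note that the third document matches nothing in the sample, so it falls back to human review; the attorneys stay in the loop.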

Q.       That doesn’t sound like the “Easy Button” description I’ve heard from many vendors.

A.       The Wall Street Journal once published an article on this very subject, entitled “Why Hire a Lawyer? Computers are Cheaper.” The article highlights one of the main problems that arises when discussing predictive coding – that it is presented as a product. In truth, predictive coding isn’t just a piece of software that can be layered over other e-discovery processes. It is actually a process that can include a number of different steps, several pieces of software, and many decisions by the litigation team.

Being a process is both a benefit (it is far more flexible than a particular piece of software) and a drawback (it’s more complicated than simply applying a program). This process isn’t fully automated either – as always, lawyers are a critical part of the equation.

Q.       So I don’t need to buy “predictive coding software”?

A.       At its core, predictive coding is about workflow. It’s a combination of attorneys and technology. In fact, keyword searches, sampling, and early culling could be considered a type of predictive coding since they help pare down the document set. This is particularly true in the early stages of e-discovery.

With an off-the-shelf predictive coding engine, predictive coding can be accomplished in Concordance, Summation, LAW, or heritage products, because it is a methodology, not just a piece of software. There are a number of different predictive coding engines and algorithms available, and many were around before they were called predictive coding. In fact, near-dupe technology, otherwise known as “find similar,” is heavily used in predictive coding to calculate the logical clusters and groups in the remainder of the set.
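As a rough illustration of what near-dupe grouping involves, the sketch below clusters documents by shared three-word sequences. Everything in it is an assumption made for the example (the shingle size, the 0.5 threshold, the greedy one-pass grouping, and the sample emails); commercial engines use their own, usually more sophisticated, methods.

```python
def shingles(text, k=3):
    """Set of overlapping k-word sequences in a document."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def similarity(a, b):
    """Share of shingles the two documents have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_near_dupes(docs, threshold=0.5):
    """Greedy one-pass grouping: a document joins the first cluster whose
    first member it resembles closely enough, otherwise it starts a new
    cluster. Returns lists of document positions.
    """
    clusters = []  # each entry: (shingles of first member, [positions])
    for position, text in enumerate(docs):
        doc_shingles = shingles(text)
        for representative, members in clusters:
            if similarity(doc_shingles, representative) >= threshold:
                members.append(position)
                break
        else:
            clusters.append((doc_shingles, [position]))
    return [members for _, members in clusters]

# Hypothetical documents.
docs = [
    "please review the attached merger agreement before friday",
    "please review the attached merger agreement before monday",
    "the cafeteria will be closed next week",
    "fwd: please review the attached merger agreement before friday",
]
print(cluster_near_dupes(docs))
```

Once documents are grouped this way, a decision made on one member of a cluster can be offered as a suggestion for the others.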

Q.       So how does this predictive coding process work?

A.       The variables in this process include the selection of documents for the sample set, user response to the sample set, and the computer algorithms applied. These algorithms may learn “on the fly” or may be pre-calculated depending on the software and process used. Each variable affects the outcome: for instance, poor user input will result in poor sample set results, which will then be propagated to the remainder of the document set.

Q.       Is there more than one type of predictive coding technology?

A.       The user experience in predictive coding comes down to how you are led through the documents: is it a “choose your own adventure,” spokes on a wheel, purely random, or a secret black box? “Choose your own adventure” is a style where the computer algorithm adjusts to your responses on the fly, changing the next document that might appear based on your previous input. Spokes on a wheel is a pre-defined set that covers all document clusters, and users can dive deep into whatever interests them. Purely random selection, while statistically valid, does not generate the most useful set. Secret black box is where only the developer of the computer algorithm knows what functions are being performed.
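The difference between the “choose your own adventure” style and purely random selection can be sketched in a few lines. This is a simplified illustration under assumed names and data: `next_adaptive` picks the next document based on what the reviewer has already marked responsive, while `next_random` ignores prior input entirely.

```python
import random

def words(text):
    """Lowercased word set for a document."""
    return set(text.lower().split())

def overlap(a, b):
    """Fraction of words two documents share (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def next_adaptive(unreviewed, marked_responsive):
    """Serve the unreviewed document most similar to anything the
    reviewer has marked responsive so far."""
    if not marked_responsive:
        return unreviewed[0]  # no input yet, so just start at the top
    liked = [words(text) for text in marked_responsive]
    return max(
        unreviewed,
        key=lambda text: max(overlap(words(text), l) for l in liked),
    )

def next_random(unreviewed, rng):
    """Serve any unreviewed document, regardless of earlier responses."""
    return rng.choice(unreviewed)

# Hypothetical review queue and reviewer history.
queue = [
    "holiday schedule for the warehouse",
    "revised royalty rates for the license",
    "parking permits renewal",
]
marked = ["license royalty rates negotiation"]

print(next_adaptive(queue, marked))
print(next_random(queue, random.Random(0)))
```

The adaptive picker steers the reviewer toward more of what they have flagged, which is useful but also shows why poor early input can skew everything that follows.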

Q.       This all sounds pretty technical. What’s the real benefit?

A.       Predictive coding is an excellent way to reduce the number of documents that attorneys need to review. This is becoming a necessity as document sets increase dramatically concurrent with client unwillingness to pay for review.

One of the attractions of predictive coding is the ability to streamline first pass review. The processes listed above, in combination with the efficient use of technology, give results similar to those described for predictive coding. This process will always include attorneys and technology, since attorneys will always have to look at some of the documents.

Q.       Will predictive coding really save me money?

A.       In a project where predictive coding is used, processing will cost more than normal but there will be substantial savings in the review phase. This assumes the technology was used appropriately and that the project was well planned and managed from the outset.

Of course, predictive coding can become expensive if the sample sets are poorly developed, if the data set includes documents without natural language (such as graphics), or if the vendor’s pricing model includes paying per month per gigabyte for hosting. In a predictive coding set, there will always be more data hosted in the review platform than in a keyword-based review. Management is also critical, particularly since this is a process rather than a product. It is also important to know that there may be several run-throughs of the sample set in order to achieve the desired results, so set appropriate expectations.

There is a substantial cost difference based on whether the data is processed in advance or on-the-fly. Processing and clustering in advance results in lower costs and a fixed error rate but requires multiple passes at the sample set. Allowing the adjustment of the error rate or heuristic in real time requires a large amount of processing power and a more complex algorithm; this results in higher costs and requires “middleware” between review technology and the predictive coding engine.

So what do we conclude from all this? The thought behind predictive coding – that technology can help reduce the cost of e-discovery – is a great one. Figuring out which pieces of technology to apply at what point in the workflow may not be as easy. Consultants or other attorneys who are experienced in e-discovery can analyze your workflow or case and, particularly in large projects, can save enormous amounts of time and money. There’s no magic wand; it’s just about using the right technology at the right time.

