Making Sure Your Predictive Coding Solution Doesn’t Cost More…

Depending on how it’s used, predictive coding may result in a discovery process that is less expensive than a process built around traditional search term based culling and manual document review. I emphasize “may” because the notion that predictive coding is an automatic cost saver is one of the biggest misconceptions currently permeating the marketplace.

Four Main Cost Drivers of Predictive Culling When Planning a Workflow

Consistent with the vast majority of reported court decisions, litigants are primarily using predictive coding technology for the limited purpose of culling data in lieu of search terms. After culling, in both predictive coding and search term based workflows, parties then manually review the documents that “hit” or are identified as likely relevant for responsiveness, privilege and confidentiality. However, a party must account for four primary cost drivers of a predictive “culling” based process when planning its workflow:

  1. There are additional technology and consulting costs for using predictive coding. In addition to paying your “normal” processing fee to prepare data for the hosting platform, a workflow built around predictive coding typically requires an additional processing fee for the technology itself. You will incur these charges regardless of whether you are pairing a predictive coding technology (such as Equivio Relevance) with a separate hosted review platform or whether the technology is incorporated into a single offering (such as Relativity Assisted Review or Recommind). While these costs are relatively small on a per unit basis, they can result in six figure fees when applied to a significant data collection. On top of that, you can expect to pay significant costs to the technologists and other consultants necessary to devise and implement a predictive coding workflow both before and after the processing fees are incurred.
  2. Predictive coding can result in the review of a much larger “potentially relevant” document set. When predictive coding is used to cull the data in lieu of keywords, there is no guarantee that it will result in fewer documents to review. In fact, it is possible that predictive coding will generate a much larger “potentially relevant” document set and, along with it, a larger legal bill. This occurs because the predictive coding technology often identifies documents as potentially relevant even though the documents do not contain any agreed-upon search terms.[1]
  3. Your subject matter experts are going to be reviewing more documents with predictive coding. Many corporations and law firms now use review vendors to conduct the majority of their document review, with outside counsel serving as subject matter experts (SMEs) who provide guidance and ensure the quality of the review team’s efforts. In a predictive coding workflow, the SMEs will continue to perform the quality-control and oversight function, but often are also responsible for performing the first-pass review necessary to establish a control set and train the predictive coding technology. While some predictive coding providers will suggest as a rule that this process will require your SMEs to review 5,000 documents or fewer, in practice we have seen SMEs review many times that number, depending on the prevalence of relevant documents in the data set and the margin of error you hope to achieve. Add to this the increasing trend for SMEs to review a larger proportion of likely relevant documents as part of the manual review process, and in some cases SME fees may exceed the total costs of a traditional review of search-term results by a review vendor.
  4. You’re going to spend significant time and money negotiating a predictive coding protocol. At this point in predictive coding’s life cycle, neither litigants nor the courts have developed established ESI protocols or templates defining how parties can or should implement it. Many parties (or at least plaintiffs) currently suggest using the Actos protocol as a starting point for negotiation, but it requires a level of joint training and review that in most cases is impracticable, if not impossible, to implement. As such, parties are spending an inordinate amount of time and money negotiating and developing ESI protocols on a case-by-case basis. While this cost will dissipate as the use of predictive coding becomes more prevalent, it likely will be part of any predictive coding process for the foreseeable future.
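To see why the third cost driver is so sensitive to prevalence and margin of error, consider a rough sketch. It assumes the control set is a simple random sample, uses the textbook normal-approximation formula for estimating a proportion, and assumes recall is measured only on the relevant documents that happen to fall in the sample. The function names and the 95% confidence default are our own illustrative choices, not the method of any particular predictive coding tool.

```python
import math


def sample_size_for_proportion(margin_of_error, z_score=1.96, expected_proportion=0.5):
    """Normal-approximation sample size for estimating a proportion.

    z_score=1.96 corresponds to roughly 95% confidence; expected_proportion=0.5
    is the conservative (worst-case) assumption.
    """
    variance = expected_proportion * (1 - expected_proportion)
    return math.ceil(z_score ** 2 * variance / margin_of_error ** 2)


def documents_to_review_for_control_set(prevalence, margin_of_error, z_score=1.96):
    """Estimate how many randomly drawn documents an SME must review so the
    sample is expected to contain enough relevant documents to measure recall
    at the requested margin of error (illustrative assumption, see lead-in).
    """
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be greater than 0 and at most 1")
    relevant_needed = sample_size_for_proportion(margin_of_error, z_score)
    return math.ceil(relevant_needed / prevalence)


if __name__ == "__main__":
    # Hypothetical prevalence values, chosen only to show the trend.
    for prevalence in (0.20, 0.05, 0.01):
        total = documents_to_review_for_control_set(prevalence, margin_of_error=0.05)
        print(f"prevalence {prevalence:.0%}: review roughly {total:,} documents")
```

Under these assumptions, the number of documents the SMEs must review grows as prevalence falls and as the margin of error tightens, which is consistent with what we see in practice when relevant documents are rare.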

How to Minimize Predictive Coding Costs

So given these predictive coding cost drivers, are we suggesting that clients should uniformly shun predictive coding and hold fast to search term based culling and manual document review? Absolutely not. Here at DiscoverReady, we are huge proponents of predictive coding. We have been recognized as leaders in providing automated discovery solutions, and believe it’s critical that we actively work with our clients to identify and develop discovery solutions that employ appropriate technology based on the requirements of each case. As such, we take steps on every project to accurately account for, budget for, and, where appropriate, mitigate the impact of these cost drivers. These steps include:

  • Creating statistical samples to compare the prevalence of relevant data in a client’s data set as identified by a search term and predictive coding based process. This allows us to accurately project the volume of documents the SMEs will have to review to train the predictive coding technology, as well as the volume of documents identified as likely relevant and potentially subject to review as a result of each process.
  • Preparing budgets based on the result of the statistical sampling that give our clients visibility into the process.
  • Consulting on methodologies to modify either the proposed training criteria for the predictive coding tool or the search term syntax, while validating the results with real-time measurements of the precision and recall of each process.
  • Relying on our extensive experience designing and implementing predictive coding based workflows, and leveraging our knowledge (and templates) from other matters to reduce the fees associated with negotiating and implementing an ESI process built around predictive coding.
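The precision and recall measurements mentioned in the steps above can be illustrated with a minimal sketch. It assumes a random sample in which each document has a human relevance call plus a flag showing whether each culling method would have selected it. The field names `relevant`, `search_hit`, and `model_hit` are hypothetical, and the six-document sample is a toy example rather than real data.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from raw counts, using 0.0 when undefined."""
    flagged = true_positives + false_positives
    relevant = true_positives + false_negatives
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / relevant if relevant else 0.0
    return precision, recall


def score_culling_method(sample, flag_key):
    """Score one culling method against the human calls in a reviewed sample."""
    tp = sum(1 for doc in sample if doc[flag_key] and doc["relevant"])
    fp = sum(1 for doc in sample if doc[flag_key] and not doc["relevant"])
    fn = sum(1 for doc in sample if not doc[flag_key] and doc["relevant"])
    return precision_recall(tp, fp, fn)


# Toy sample (hypothetical): human call plus each method's flag.
SAMPLE = [
    {"relevant": True, "search_hit": True, "model_hit": True},
    {"relevant": True, "search_hit": False, "model_hit": True},
    {"relevant": True, "search_hit": False, "model_hit": False},
    {"relevant": False, "search_hit": True, "model_hit": False},
    {"relevant": False, "search_hit": False, "model_hit": False},
    {"relevant": False, "search_hit": True, "model_hit": True},
]

if __name__ == "__main__":
    for label, key in (("search terms", "search_hit"), ("predictive coding", "model_hit")):
        precision, recall = score_culling_method(SAMPLE, key)
        print(f"{label}: precision {precision:.2f}, recall {recall:.2f}")
```

Precision indicates how much of the flagged set is worth paying reviewers to read, while recall indicates how much relevant material the method finds at all. Comparing both numbers for each process on the same sample is what allows an apples-to-apples projection of review volume.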

Armed with these steps, we can help our clients identify the matters where predictive coding based culling is appropriate, and also those cases where the costs ultimately may be more than they bargained for.
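As a rough illustration of how the four cost drivers might be rolled into a side-by-side budget, here is a minimal sketch. The cost categories mirror the drivers discussed above, but the function, its parameter names, and every figure in the usage example are hypothetical placeholders, not market rates or our actual budgeting model.

```python
def workflow_cost(
    *,
    documents_collected,
    processing_fee_per_document,
    technology_fee_per_document,   # extra predictive coding fee; 0 for search terms
    consulting_and_protocol_fees,  # technologists plus protocol negotiation
    sme_documents,                 # training, control set, and quality-control review
    sme_cost_per_document,
    vendor_documents,              # documents sent on to manual review
    vendor_cost_per_document,
):
    """Project the total cost of one culling workflow from estimated inputs."""
    per_document_fees = processing_fee_per_document + technology_fee_per_document
    return (
        documents_collected * per_document_fees
        + consulting_and_protocol_fees
        + sme_documents * sme_cost_per_document
        + vendor_documents * vendor_cost_per_document
    )


if __name__ == "__main__":
    # Placeholder inputs only; real values would come from statistical sampling.
    search_terms_total = workflow_cost(
        documents_collected=1_000_000, processing_fee_per_document=0.02,
        technology_fee_per_document=0.0, consulting_and_protocol_fees=10_000,
        sme_documents=2_000, sme_cost_per_document=5.0,
        vendor_documents=150_000, vendor_cost_per_document=1.0,
    )
    predictive_total = workflow_cost(
        documents_collected=1_000_000, processing_fee_per_document=0.02,
        technology_fee_per_document=0.03, consulting_and_protocol_fees=60_000,
        sme_documents=15_000, sme_cost_per_document=5.0,
        vendor_documents=120_000, vendor_cost_per_document=1.0,
    )
    print(f"search terms:      ${search_terms_total:,.0f}")
    print(f"predictive coding: ${predictive_total:,.0f}")
```

Which workflow comes out cheaper depends entirely on the inputs, which is the point: the comparison has to be run on sampled, matter-specific estimates rather than assumed.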

[1] Recognizing that this is a topic for a separate blog, the likelihood of a predictive coding process generating more documents to review hinges on two important points.

  • First, predictive coding likely will result in increased costs only if you would have been able to negotiate very narrow, targeted search terms that would have returned a very high percentage of relevant documents. These narrowly construed search terms would exclude from the review population a significant portion of likely irrelevant data — as well as data that predictive coding technology might deem as potentially relevant. This scenario obviously does not frequently arise when you are dealing with very broad search terms.
  • Second, and relatedly, if you appropriately utilize a predictive coding process, you can expect a smaller percentage of likely irrelevant documents in your review set. In a scenario where search-term-based culling would result in a smaller review population, the additional documents identified through the predictive coding process represent likely relevant documents that either do not contain a search term or otherwise would have been excluded from the review population.

You can address each of these factors, and ensure a smaller review population that is more likely to contain relevant data, by combining search term and predictive coding culling methodologies. This is accomplished by applying broad search terms to the document collection, and then using predictive coding to further cull the data set by excluding irrelevant data that hit on the keyword searches. However, it’s important to recognize that, with the limited exception of the very recent In re Biomet M2a Magnum Hip Implant Products Liability Litigation (MDL 2391), the reported decisions and litigants are not contemplating a hybrid search term/predictive coding based approach. Moreover, given that the whole thrust of predictive coding is to utilize a culling methodology that presumably is superior to search terms, an opposing litigant who is championing the use of predictive coding likely will oppose (perhaps unsuccessfully in light of Biomet) the idea of applying search terms to the data prior to application of the predictive coding technology.
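The hybrid approach described above can be sketched in a few lines. This is a simplified illustration: keyword matching is plain case-insensitive substring search, and `relevance_score` is a hypothetical stand-in for whatever score a predictive coding tool assigns to a document. The threshold and the sample documents are likewise assumptions made for the example.

```python
def hits_search_terms(text, search_terms):
    """Return True if the text contains any search term (case-insensitive)."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in search_terms)


def hybrid_cull(documents, search_terms, relevance_score, threshold):
    """Apply broad search terms first, then keep only the keyword hits that the
    predictive model scores at or above the threshold."""
    keyword_hits = [doc for doc in documents if hits_search_terms(doc["text"], search_terms)]
    return [doc for doc in keyword_hits if relevance_score(doc) >= threshold]


if __name__ == "__main__":
    # Hypothetical documents with precomputed scores standing in for a model.
    documents = [
        {"id": 1, "text": "Hip implant failure report", "score": 0.91},
        {"id": 2, "text": "Lunch menu: hip new restaurant", "score": 0.04},
        {"id": 3, "text": "Quarterly sales figures", "score": 0.75},
    ]
    review_set = hybrid_cull(
        documents,
        search_terms=["hip", "implant"],
        relevance_score=lambda doc: doc["score"],
        threshold=0.5,
    )
    print([doc["id"] for doc in review_set])
```

Note the trade-off built into this ordering: a document that the model scores highly but that contains no search term never reaches the second step, which is precisely why a litigant championing predictive coding may object to applying search terms first.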