Order Highlights Potential Costs of Predictive Coding

Predictive coding continues to gain momentum as a hedge against spiraling discovery costs. But if you assume that its usage will dramatically lower costs every time, you might be surprised by a recent patent infringement case in California. Although Staples may have an “easy” button, there is no such thing as an automatic “cheap” button when it comes to predictive coding.

The February 1 order issued by Judge Anthony Battaglia in Gabriel Technologies Corp. et al. v. Qualcomm Inc. et al. reveals the potentially high costs associated with predictive coding and demonstrates that predictive coding might not always be a huge cost saver when compared to manual review. The order, which awards sanctions to Qualcomm based on a finding that Gabriel’s patent infringement claims were frivolous, is not a discussion of predictive coding process and methodology in the vein of Da Silva Moore, Kleen, or the widely disseminated Actos protocol. However, in granting Qualcomm’s request for attorneys’ fees, the order provides some rare visibility into the costs that can be associated with predictive coding technologies. Judge Battaglia’s award granted Qualcomm:

  • $2.8 million for fees associated with computer-assisted, algorithm-driven document review, and
  • $392,000 for contract attorneys who reviewed the documents that the predictive coding technology determined were likely to be responsive based on the training it received.

This award of almost $3.2 million (which presumably does not include the fees Qualcomm incurred for its outside counsel to develop and implement the solution and to supervise and review the results of the contract attorneys’ efforts) has been cited widely as an example of the “costs” of predictive coding. However, a careful analysis of how Qualcomm used predictive coding illustrates not only that predictive coding can be expensive, which it can be, but also that the method of implementing the solution significantly drives the cost.
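As a back-of-the-envelope check on these figures, the short Python sketch below adds up the two awarded amounts and derives the unit costs they imply. The dollar amounts come from the order as reported above. The record counts are round-number assumptions based on the order's language ("almost 12,000,000" records collected, "more than one million" flagged by the algorithm), so the per-record results are illustrative only and do not appear in the order itself.

```python
# Back-of-the-envelope check of the fee figures reported in the order.

# Dollar amounts as reported in the order.
ALGORITHM_REVIEW_FEES = 2_800_000    # computer-assisted document review
CONTRACT_ATTORNEY_FEES = 392_000     # manual review of likely responsive documents

# ASSUMPTION: round numbers standing in for "almost 12,000,000" records
# collected and "more than one million" documents flagged by the algorithm.
COLLECTED_RECORDS = 12_000_000
FLAGGED_RECORDS = 1_000_000


def total_award(*fees):
    """Sum the individual fee components of the award."""
    return sum(fees)


def cost_per_record(fees, records):
    """Implied average cost per record for a given fee component."""
    if records <= 0:
        raise ValueError("records must be positive")
    return fees / records


if __name__ == "__main__":
    total = total_award(ALGORITHM_REVIEW_FEES, CONTRACT_ATTORNEY_FEES)
    tech_unit = cost_per_record(ALGORITHM_REVIEW_FEES, COLLECTED_RECORDS)
    review_unit = cost_per_record(CONTRACT_ATTORNEY_FEES, FLAGGED_RECORDS)
    print(f"Total awarded for document review: ${total:,}")
    print(f"Implied technology cost per collected record: ${tech_unit:.3f}")
    print(f"Implied contract review cost per flagged document: ${review_unit:.3f}")
```

Under these assumptions, the technology charge is spread across every collected record, while the contract attorney charge applies only to the flagged subset. That difference in the base is why the volume fed into the tool matters so much to the total.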

The Implementation of Predictive Coding in Qualcomm:

As Judge Battaglia’s order explains:

Over the course of this litigation, Defendants collected almost 12,000,000 records — most in the form of Electronically Stored Information (ESI) … Rather than manually reviewing the huge volume of resultant records, Defendants paid [their e-discovery vendor] to employ its proprietary technology to sort these records into responsive and non-responsive documents … the [vendor’s] algorithm made initial responsiveness determinations for more than one million documents.

Following this process, contract attorneys manually reviewed the subset of likely responsive documents for “confidentiality, privilege, and relevance.” The court recognized that “the review performed by the [vendor] and [contract attorneys] accomplished different objectives, with the [vendor’s] electronic process minimizing the overall work for [the contract attorneys].”

Although some of the more granular details are not set out in the order, some pertinent facts about the use of predictive coding are as follows:

  • Qualcomm incurred a “predictive coding charge” for the 12 million collected files (likely on a per-gigabyte or per-file basis).
  • Qualcomm used the predictive coding technology (in lieu of search terms or other more traditional approaches) to cull down the data set, and then subjected the subset of documents determined to have potential relevance to a human review for further analysis (a technique DiscoverReady endorses, and refers to as “Predictive Culling™”).
  • Of the 12 million files that Qualcomm subjected to Predictive Culling, the software tool determined that slightly more than one million files – less than ten percent of the population – were likely to be responsive.
  • Contract attorneys then reviewed the likely responsive documents.

Object Lessons from Qualcomm:

The order in Qualcomm does not disclose particulars about the exact issues in dispute, the basis for Qualcomm’s large document collection, or the “traditional” search and review methodologies that may have been considered. In the absence of those details and the important context they would provide, it is difficult to evaluate the “merits” of predictive coding as applied to the matter or to conduct a cost/benefit analysis of the technology and process. However, the order offers some important takeaways:

  • The volume of data subjected to predictive coding will materially impact your technology costs. Significant cost savings can be achieved by identifying and culling out patently non-responsive documents (e.g., emails from non-business, non-relevant, and “spam” senders, non-relevant file types) and documents that are not appropriate for the predictive coding process (e.g., low text files, files without OCR or metadata) prior to application of the predictive coding technology.
  • Explore how to take full advantage of automated review technology (even beyond predictive coding). In a very high-volume matter like Qualcomm, consider using analytical software tools not only to cull the data for responsiveness, but also to make presumptive privilege determinations (using a tool like PrivBank™) and issue coding decisions (using i-Decision™ for example). Such techniques can further reduce the number of documents sent to manual review.
  • A litigant cannot estimate the total cost of using predictive coding (or compare the cost of predictive coding to a more traditional review alternative) until it gathers information about the prevalence of relevant information (or “richness”) in the data set. In a collection of 12 million documents, for example, if the richness of responsive documents is one percent, the total costs of a predictive culling process will be materially different from what they would be if the richness were twenty percent. Before embarking down a document review path – whether using traditional or predictive coding methods – take a statistically appropriate sample and estimate the level of richness. Because this sampling process plays such an important role in both estimating and validating discovery efforts, DiscoverReady recently developed and released its Samplyzer™ tool.
  • Finally, recognize that the predictive coding process will require significant human involvement, both in training the technology and evaluating its results. These costs should be factored into your assessment of potential solutions. Consider both the volume of manual “eyes-on” document review that will be required to train the system and then evaluate the output of the application (which can be estimated using the sampling processes discussed above) and the cost of the human resources that will be conducting the manual review. Is it your outside counsel doing the review, a combination of your outside counsel and opposing counsel (as anticipated by Actos), or contract reviewers? All these factors will have a material impact on costs.
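The richness point above can be made concrete with a small Python sketch. It estimates richness from a random sample using a Wilson score interval (one standard method for a proportion, chosen here for illustration and not described in the order or attributed to any vendor tool), then projects total cost under different richness levels. Every rate and parameter in the cost model is a hypothetical assumption: the per-record technology rate, the per-document review rate, and the recall and precision of the culling step are placeholders, not figures from Qualcomm.

```python
import math


def wilson_interval(hits, sample_size, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95% confidence).

    hits: number of sampled documents coded responsive
    sample_size: number of documents in the random sample
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= hits <= sample_size:
        raise ValueError("hits must be between 0 and sample_size")
    p = hits / sample_size
    denom = 1 + z ** 2 / sample_size
    center = (p + z ** 2 / (2 * sample_size)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / sample_size + z ** 2 / (4 * sample_size ** 2))
    ) / denom
    return center - half_width, center + half_width


def projected_cost(population, richness, tech_rate, review_rate,
                   recall=0.8, precision=0.7):
    """Rough cost projection for a cull-then-review workflow.

    ASSUMPTIONS (all hypothetical): the technology is billed per record on the
    whole population; the documents sent to manual review are the ones the tool
    flags, approximated as population * richness * recall / precision; and
    manual review is billed per document.
    """
    tech_cost = population * tech_rate
    docs_to_review = population * richness * recall / precision
    return tech_cost + docs_to_review * review_rate


if __name__ == "__main__":
    # Hypothetical sample: 30 responsive documents out of 1,500 reviewed.
    low, high = wilson_interval(hits=30, sample_size=1_500)
    print(f"Estimated richness: {30 / 1_500:.1%} (interval {low:.1%} to {high:.1%})")

    # Compare the one percent and twenty percent scenarios from the text,
    # using placeholder rates of $0.25 per record and $0.40 per reviewed document.
    for richness in (0.01, 0.20):
        cost = projected_cost(12_000_000, richness, tech_rate=0.25, review_rate=0.40)
        print(f"Richness {richness:.0%}: projected cost ${cost:,.0f}")
```

In this simple model the technology charge is fixed by the size of the collection, while the manual review charge scales with richness. The same sampling step can therefore shift a budget estimate substantially before any vendor is engaged.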

Predictive coding is a proven driver of efficiency in discovery. But effective application of the technology still requires human expertise and judgment. As Gabriel Technologies highlights, the cost advantage of predictive coding depends on a multitude of details — and sometimes cost savings may not exist at all.