The Redheaded Stepchild of E-Discovery, Keyword Search, Receives Another Beating
Maureen O'Neill advises clients in the development and execution of effective, cost-efficient discovery strategies, including the use of statistical sampling, predictive coding, and other analytics and automation tools.
A few weeks ago, Judge Shira Scheindlin issued another opinion in National Day Laborer Organizing Network et al. v. United States Immigration and Customs Enforcement Agency, et al., a Freedom of Information Act (FOIA) case. In her order, Judge Scheindlin voiced strong approval for predictive coding and other advanced analytical techniques. But she also took the opportunity to dole out another spanking to keyword searches. Practitioners should not, however, misinterpret her criticism of the particular keyword-search approach used by the government here (which was well deserved) as a general condemnation of keyword searching. A rigorously developed, high-quality keyword search can still be an important component of a sound search strategy.
In National Day Laborer, Judge Scheindlin faulted the federal agencies that received plaintiffs’ FOIA requests for failing to adequately document, test and disclose the searches they used to locate potentially responsive information. She first leveled some generalized criticism at keyword searches:
Simple keyword searching is often not enough: ‘Even in the simplest case requiring a search of on-line e-mail, there is no guarantee that using keywords will always prove sufficient.’ There is increasingly strong evidence that ‘[k]eyword search[ing] is not nearly as effective at identifying relevant information as many lawyers would like to believe.’
But rather than reject the agencies’ search efforts out of hand, she ordered the parties to come together and try to reach a compromise on “search terms and protocols — and, if necessary, testing to evaluate and refine those terms.” As Judge Scheindlin observed, “[t]here is a ‘need for careful thought, quality control, testing, and cooperation with opposing counsel in designing search terms or ‘keywords’ to be used to produce emails or other electronically stored information.’”
I’m pleased that Judge Scheindlin gave the agencies a chance to get it right, because when search terms are used properly (that is, when they are thoughtfully constructed, thoroughly tested, and well documented), they can be effective at finding documents likely to contain responsive information.
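For readers who want a concrete picture of what "thoroughly tested" can look like, here is a minimal sketch in Python. It assumes a small random sample of documents has already been coded by a human reviewer, then measures how a proposed term list performs against that sample. The function names, the sample documents, and the terms are all invented for illustration; none of them come from the opinion or from any particular review platform, and a real protocol would need a far larger, properly drawn sample.

```python
import re

def keyword_hits(text, terms):
    """Return True if any term appears in the text as a whole word (case-insensitive)."""
    return any(
        re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE)
        for term in terms
    )

def evaluate_terms(sample, terms):
    """Compute (precision, recall) of a term list against a human-coded sample.

    sample: list of (text, is_responsive) pairs, where is_responsive is the
    reviewer's coding decision for that document.
    """
    true_pos = false_pos = false_neg = 0
    for text, is_responsive in sample:
        hit = keyword_hits(text, terms)
        if hit and is_responsive:
            true_pos += 1
        elif hit and not is_responsive:
            false_pos += 1
        elif not hit and is_responsive:
            false_neg += 1
    # Precision: share of documents hit by the terms that are actually responsive.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    # Recall: share of responsive documents that the terms actually found.
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall

# Hypothetical coded sample and term list, made up for this example.
SAMPLE = [
    ("The signed contract is attached", True),
    ("Lunch on Friday?", False),
    ("Here is the executed agreement", True),
    ("Contract parking rates are going up", False),
]
TERMS = ["contract", "invoice"]

precision, recall = evaluate_terms(SAMPLE, TERMS)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy sample the terms miss the responsive document that says "agreement" instead of "contract" and sweep in an irrelevant note about parking. That is exactly the kind of gap that testing is meant to expose, so the terms can be refined and the results documented before the search is run across the full collection.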
But as I’ve noted in this blog before, there are tools at our disposal — predictive coding in particular — that, when properly deployed, often outperform keyword searches in finding relevant information. Judge Scheindlin recognized this in her order and encouraged parties to use these tools:
And beyond the use of keyword search, parties can (and frequently should) rely on latent semantic indexing, statistical probability models, and machine learning tools to find responsive documents.
Just as with keyword searching, predictive coding should be used only when coupled with a defensible process and only after counsel has assessed whether the technique is appropriate for the matter. Under the right circumstances, both keyword searching and predictive coding are valid methods of finding potentially relevant documents. And both deserve love and support in the e-discovery family.