In United States v. O’Keefe, 537 F. Supp. 2d 14 (D.D.C. 2008), former U.S. Magistrate Judge John Facciola tackled the subject of using keyword search terms to help identify relevant documents for production in discovery. Observing that the proper use of search terms involves “the sciences of computer technology, statistics and linguistics,” the judge offered the now-famous quip that, for lawyers and judges to opine on the effectiveness of a given set of search terms, “is truly to go where angels fear to tread.”
Yet litigators go there all the time. (They are a fearless bunch, to be sure.) Here at DiscoverReady, when we consult with clients on projects that involve the use of search terms to cull document collections, we encourage them to follow a few best practices. In a nutshell, the use of search terms should involve: (1) a collaborative, iterative, negotiated approach with the other side, and (2) statistical sampling and measurement to test and validate the results.
We recently gained some additional judicial support for this approach in an order from Magistrate Judge Donna Ryu of the Northern District of California. In In re Lithium Ion Batteries Antitrust Litigation, N.D. Cal. (Feb. 24, 2015), the court resolved a dispute between the parties regarding the final details of a mostly agreed-upon protocol for using search terms. The plaintiffs insisted that, if a quantitative analysis of a challenged search term couldn’t resolve the dispute, defendants must turn over a qualitative sample of randomly selected “false positive” (not relevant) documents returned by the search. Defendants objected to this aspect of the protocol, on the grounds that the federal rules do not entitle plaintiffs to obtain non-responsive, irrelevant documents in discovery.
The judge agreed with plaintiffs, noting that “the best way to refine searches and eliminate unhelpful search terms is to analyze a random sample of documents, including irrelevant ones, to modify the search in an effort to improve precision.” The court went on to explain:
[A] random sample that shows that a search is returning a high proportion of irrelevant documents is a bad search and needs to be modified to improve its precision in identifying relevant documents. The proposed sampling procedure is designed to prevent irrelevant documents from being reviewed or produced in the litigation, and will obviate, or at least clarify, motion practice over the search terms themselves.
But recognizing Defendants’ concern that the sampling protocol would result in the production of irrelevant information to which Plaintiffs have no right, the court ordered protections to guard against the production of any privileged or otherwise sensitive documents in the sample. Indeed, Defendants were given the right to “remove any irrelevant document(s) from the sample for any reason, provided that they replace the document(s) with an equal number of randomly generated document(s)” (emphasis added).
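For readers who want to see the arithmetic behind this kind of sampling, here is a minimal Python sketch of the general idea. It is not the protocol from the Lithium Ion order. The function names, the 95% Wilson score interval, and the rule for choosing replacement documents are all illustrative assumptions. The sketch draws a random sample from a search term's hits, estimates the term's precision from reviewer coding, and swaps out withheld documents for other randomly drawn hits.

```python
import math
import random


def draw_sample(hit_ids, sample_size, rng):
    """Draw a simple random sample (without replacement) from a term's hits."""
    return rng.sample(list(hit_ids), sample_size)


def estimate_precision(labels, z=1.96):
    """Estimate precision from reviewer coding of a sample.

    labels: list of booleans, True if the reviewer coded the document relevant.
    Returns (point_estimate, lower, upper) using a Wilson score interval.
    z=1.96 corresponds to roughly 95% confidence (an assumption of this sketch).
    """
    n = len(labels)
    if n == 0:
        raise ValueError("sample is empty")
    p = sum(labels) / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, center - half, center + half


def replace_withheld(sample, withheld, hit_ids, rng):
    """Swap withheld documents for an equal number of other random hits.

    Hypothetical rule: replacements come from hits not already in the sample.
    """
    withheld = set(withheld)
    kept = [doc for doc in sample if doc not in withheld]
    in_sample = set(sample)
    pool = [doc for doc in hit_ids if doc not in in_sample]
    replacements = rng.sample(pool, len(sample) - len(kept))
    return kept + replacements


if __name__ == "__main__":
    rng = random.Random(2015)          # fixed seed so the draw is reproducible
    hits = list(range(10_000))         # stand-in document IDs for one term's hits
    sample = draw_sample(hits, 100, rng)
    sample = replace_withheld(sample, sample[:3], hits, rng)
    coding = [i < 30 for i in range(len(sample))]  # made-up coding: 30 of 100 relevant
    p, low, high = estimate_precision(coding)
    print(f"precision ~ {p:.2f} (interval {low:.2f} to {high:.2f})")
```

A wide interval or a low estimate would be a signal to renegotiate the term. What counts as "too low" is something the parties, not the code, have to agree on.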
In my view, Lithium Ion underscores the idea that lawyers need to loosen their death grip on the notion that irrelevant documents should never voluntarily be produced in discovery. Sure, the other side is not entitled to see non-responsive documents. But if producing a small sampling of them while engaged in search term optimization will reduce motion practice, streamline the discovery process, and save both sides time and money, why not permit the opposing party to see some? Of course, such a voluntary production needs to have safeguards similar to those from Lithium Ion. And in some matters, there may be valid reasons to resist such a process. But in many instances, a facilitative approach to discovery is the better way.