# Yes, Counselor, There Will Be Math: Why Litigators Need to Learn Some Statistics

November 27th, 2013

There’s a tired old joke out there among lawyers, many of whom sputter and wave their arms in protest when asked to engage in anything involving math: “But I went to law school to *avoid* math!”

But for litigators engaged in discovery, math is no joke. In fact, to competently represent their clients, attorneys must acquire a basic working knowledge of a few key statistical concepts.

Over the next few months in this blog, my colleagues and I will dive deeper, explaining the particular statistics lawyers need to know, and exploring some specific use cases for the application of statistics in discovery. Today, we make the case for statistics with several well-reasoned answers to one very basic question.

*Q: Why do litigators need to understand certain statistical measurement concepts?*

*A: To improve the overall quality of discovery.*

By using statistics to measure the inputs and outputs of a discovery process, counsel can work to improve the process to make it more accurate. For example, when engaged in the evaluation and/or negotiation of search terms to cull a document collection, you may find that a proposed term is bringing in far too many “false positives” (*i.e.*, poor “precision”). Rather than simply reject the proposed term with a subjective characterization of the poor results, you can use statistics to quantify the problem. Even better, try an alternative search term and test the new results using sampling. If statistical sampling confirms that the new term reduces the number of irrelevant documents (better precision), but is not under-inclusive (*i.e.*, the “recall” is good), you have improved the quality of the search process (and have solid support for a counter-proposal to the other side if necessary).
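For readers who want to see the arithmetic, here is a minimal Python sketch of how precision and recall fall out of a reviewed sample. It assumes a simple random sample drawn from the whole collection, with each document flagged for whether the proposed search term hit it and whether a reviewer coded it relevant. The `SampleDoc` class, the `precision_recall` function and all of the counts are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class SampleDoc:
    hit: bool       # did the proposed search term match this document?
    relevant: bool  # did a reviewer code it relevant?


def precision_recall(sample: list[SampleDoc]) -> tuple[float, float]:
    """Estimate a search term's precision and recall from a reviewed random sample."""
    true_pos = sum(d.hit and d.relevant for d in sample)
    false_pos = sum(d.hit and not d.relevant for d in sample)
    false_neg = sum(not d.hit and d.relevant for d in sample)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Hypothetical 500-document sample: 200 hits, of which only 40 are relevant,
# plus 10 relevant documents that the term missed.
sample = (
    [SampleDoc(hit=True, relevant=True)] * 40
    + [SampleDoc(hit=True, relevant=False)] * 160
    + [SampleDoc(hit=False, relevant=True)] * 10
    + [SampleDoc(hit=False, relevant=False)] * 290
)
precision, recall = precision_recall(sample)
print(f"precision {precision:.0%}, recall {recall:.0%}")  # precision 20%, recall 80%
```

In this made-up sample the term pulls in four irrelevant documents for every relevant one. Running the same tally on a sample tested against an alternative term lets you show, with numbers, whether precision improves without giving up recall.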

As another example, counsel can take a simple measurement of the “richness” of a document population – *i.e.*, the prevalence of responsive documents – and use that information to design a more tailored, effective workflow for the document review. Once review is under way, sampling can be used to test the quality of the review decisions, and create protocols for remediation where quality falls below an acceptable threshold.
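Richness is a proportion estimated from a random sample, so it is best reported with a margin of error rather than as a single number. The sketch below uses the Wilson score interval, which is one common choice for proportions (our choice here, not the only valid method). The collection size and sample counts are hypothetical.

```python
import math


def wilson_interval(hits: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Two-sided Wilson score interval for a proportion; z=1.96 gives ~95% confidence."""
    p_hat = hits / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Hypothetical numbers: 48 of 400 randomly sampled documents coded responsive,
# drawn from a 200,000-document collection.
collection_size = 200_000
responsive_in_sample, sample_size = 48, 400

richness = responsive_in_sample / sample_size
low, high = wilson_interval(responsive_in_sample, sample_size)
print(f"Estimated richness: {richness:.1%} (95% CI {low:.1%} to {high:.1%})")
print(f"Projected responsive documents: about {round(richness * collection_size):,}")
```

With these made-up numbers the point estimate is 12% and the interval runs from roughly 9% to 16%. That range, rather than the single figure, is what a review budget and staffing plan would sensibly be built around.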

Traditionally, lawyers relied on subjective assessments of the quality of their document productions and other discovery efforts. In the world of paper documents, objective statistical validation of discovery simply was not feasible or reasonable. But now, virtually all documents are found in electronic form, and litigants use sophisticated e-discovery systems to process and review them. These tools make it easy and inexpensive to generate statistical measures of the quality of a discovery process. And these methodologies are not limited to bet-the-company cases or matters involving advanced technologies like predictive coding. Statistical measurements can be used effectively in almost all cases, even those using traditional processes such as keyword searches and manual human document review.

*A: To more effectively defend clients’ discovery efforts.*

Let’s say you’re faced with a challenge to the completeness of a document production. Without statistics, you might be able to say something like this:

“We believe that our client has produced all relevant documents. We’re confident that we arrived at a good set of search terms that found the documents we were looking for. We used an experienced contract review team to examine the documents that hit on the searches, and based on our spot-checks of their work, we think they made good decisions.”

But with the application of some statistical measurements, you could make this more objective – and more compelling – statement instead:

“We know with a confidence level of 95% that our client has produced at least 90% of the relevant documents, based on a statistically valid measurement of the efficacy of the keyword searches and the human decisions made in reviewing the documents returned by the searches. We also tested the quality of the human review decisions by examining statistically valid samples of the team’s work, and we confirmed that their decisions were more than 95% accurate.”
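A claim like “95% confidence that at least 90% was produced” rests on a one-sided confidence bound, not on the raw percentage. The sketch below is a simplified illustration. It assumes reviewers have coded a simple random sample of the full collection, and that we then check how many of the relevant documents in that sample ended up in the production. It uses a one-sided Wilson lower bound (our choice of method, not a prescribed standard), and the counts are hypothetical.

```python
import math


def wilson_lower_bound(successes: int, n: int, z: float = 1.645) -> float:
    """One-sided Wilson lower bound for a proportion; z=1.645 gives ~95% confidence."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width


# Hypothetical validation sample: of the 200 relevant documents found in a random
# sample of the collection, 190 were captured by the searches and produced.
relevant_in_sample, produced = 200, 190

recall_estimate = produced / relevant_in_sample
recall_floor = wilson_lower_bound(produced, relevant_in_sample)
print(f"Recall estimate: {recall_estimate:.1%}; 95% lower bound: {recall_floor:.1%}")
print("Supports 'at least 90% recall':", recall_floor >= 0.90)
```

The defensible statement is the lower bound (about 91.8% here), not the 95% point estimate. The same function, applied to a QC sample of review decisions, would support the accuracy figure in the second sentence of the statement.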

A statistically valid, quantitative method to prove up the completeness of a document production, or defend some other aspect of compliance with document discovery, can be much more effective than a subjective, gut-feel argument to opposing counsel or the court.

*A: To save clients time and money in discovery.*

Statistical sampling allows us to examine a relatively small subset of a document collection and draw valid conclusions about the remainder of the collection. Consider the scenario where your opponent claims that a particular custodian will possess relevant documents, and insists that her documents should be collected and produced; your client, on the other hand, believes that this custodian will have few, if any, relevant documents. Why not pull a statistically valid sample of her documents and review only those documents? If the sample turns up little or no relevant content, you now have objective ammunition to resist the discovery request, and you have spent relatively little time or effort.
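How large does a “statistically valid” sample need to be, and what does a sample with no relevant hits actually prove? The sketch below uses the standard normal-approximation sample-size formula, with the worst case p = 0.5, and for the zero-hit case it uses the exact one-sided binomial upper bound. Both assume a simple random sample, and the 5% margin of error and 95% confidence level are illustrative choices, not legal requirements.

```python
import math
from typing import Optional


def sample_size(margin: float = 0.05, z: float = 1.96, p: float = 0.5,
                population: Optional[int] = None) -> int:
    """Documents to sample for a given margin of error (normal approximation).

    p=0.5 is the worst case (largest sample). If the population size is known,
    the finite population correction shrinks the sample slightly.
    """
    n = z**2 * p * (1 - p) / margin**2
    if population is not None:
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)


def zero_hit_upper_bound(n: int, confidence: float = 0.95) -> float:
    """Upper confidence bound on the relevance rate when 0 of n sampled docs are relevant.

    Exact binomial bound: the largest rate p with (1 - p)**n >= 1 - confidence.
    """
    return 1 - (1 - confidence) ** (1 / n)


n = sample_size()  # 385 for +/-5% at 95% confidence
print(n, sample_size(population=10_000))
print(f"0 relevant in {n}: true rate is below {zero_hit_upper_bound(n):.2%} (95% confidence)")
```

If none of 385 sampled documents is relevant, you can say with 95% confidence that fewer than about 0.8% of the custodian’s documents are relevant. This is the exact version of the familiar “rule of three” approximation (about 3/n).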

Similarly, rather than spend valuable law-firm-attorney time on a haphazard second-level QC review of thousands of documents already reviewed by contract attorneys, rely instead on statistical samples of the review team’s decisions. The QC review will cover fewer documents (which means less time and cost), and the conclusions drawn about the accuracy of the work will be far stronger.
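Drawing the QC sample is easy to automate. Here is a minimal sketch, assuming the review platform can export document IDs along with the first-pass coding call. The function names, the ID format and the fixed seed are hypothetical. Recording the seed lets the other side, or the court, reproduce exactly which documents were checked.

```python
import random


def draw_qc_sample(doc_ids: list[str], k: int, seed: int) -> list[str]:
    """Simple random sample of k reviewed documents, without replacement.

    A dedicated Random(seed) instance makes the draw reproducible.
    """
    return random.Random(seed).sample(doc_ids, k)


def overturn_rate(first_pass: dict[str, bool], qc_calls: dict[str, bool]) -> float:
    """Share of QC-checked documents where the QC call differs from the first-pass call."""
    disagreements = sum(first_pass[doc_id] != call for doc_id, call in qc_calls.items())
    return disagreements / len(qc_calls)


# Hypothetical population of 5,000 documents coded by the contract review team.
reviewed_ids = [f"DOC-{i:05d}" for i in range(1, 5001)]
qc_ids = draw_qc_sample(reviewed_ids, k=385, seed=20131127)
print(len(qc_ids), qc_ids[:3])

# Tiny illustrative tally: the QC reviewer overturns 1 of 4 first-pass calls.
first_pass = {"DOC-00001": True, "DOC-00002": False, "DOC-00003": True, "DOC-00004": False}
qc_calls = {"DOC-00001": True, "DOC-00002": True, "DOC-00003": True, "DOC-00004": False}
print(overturn_rate(first_pass, qc_calls))  # 0.25
```

The overturn rate from a properly sized sample can then be reported with a confidence interval and compared against whatever acceptable-error threshold the review protocol sets.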

Statistics also bolster proportionality-based objections to discovery. In the first example above, what if the sample showed that the custodian did possess some relevant material, but nothing that would be considered “hot,” and nothing that other custodians did not also possess? You can use the results of your statistical sampling to argue that the cost to collect, process, review and produce this custodian’s documents is not warranted when balanced against the modest gain of producing a few more documents of little additional interest.

*A: To satisfy the courts, which increasingly require the use of statistical measurements in discovery.*

Even if improved quality, defensibility, cost savings, and efficiency don’t persuade litigators to embrace statistics, they may have no choice in the end, as courts are increasingly directing parties to introduce statistical evidence to support their contentions about discovery. Several courts have noted that the defensible use of keyword searches may require the presentation of statistical validation of those searches. For example, in *In re Seroquel Prods. Litig.*, 244 F.R.D. 650, 662 (M.D. Fla. 2007), the judge noted that, “while key word searching is a recognized method to winnow relevant documents from large repositories … [c]ommon sense dictates that sampling and other quality assurance techniques must be employed to meet requirements of completeness.”[1]

For litigants looking to use more advanced technological means of searching for and producing relevant documents, including predictive coding, the presentation of statistical support for the output undoubtedly will be required.[2]

For my math-resistant colleagues out there, I hope these answers add up to a compelling rationale. Stay tuned for the next installment, when we’ll discuss which key statistical concepts are used in effective e-discovery.

[1] *See also, e.g.*, *William A. Gross Constr. Assocs., Inc. v. Am. Mfrs. Mut. Ins. Co.*, 256 F.R.D. 134, 134, 136 (S.D.N.Y. 2009) (“[W]here counsel are using keyword searches for retrieval of ESI, … the proposed methodology must be quality control tested to assure accuracy in retrieval and elimination of ‘false positives.’”); *U.S. v. O’Keefe*, 537 F. Supp. 2d 14, 24 (D.D.C. 2008) (“Whether search terms or ‘keywords’ will yield the information sought is a complicated question involving the interplay, at least, of the sciences of computer technology, statistics and linguistics.”); *Victor Stanley, Inc. v. Creative Pipe, Inc.*, 250 F.R.D. 251, 257 (D. Md. 2008) (“The only prudent way to test the reliability of the keyword search [used to find privileged documents] is to perform some appropriate sampling of the documents determined to be privileged and those determined not to be in order to arrive at a comfort level that the categories are neither over-inclusive nor under-inclusive.”).

[2] *See Da Silva Moore v. Publicis Groupe*, No. 11 Civ. 1279 (S.D.N.Y.) (M.J. Peck Order Feb. 24, 2012) (approving a protocol for the use of predictive coding where “[t]he accuracy of the search processes, both the systems’ functions and the attorney judgments to train the computer, will be tested and quality controlled by both judgmental and statistical sampling”); *In re: Biomet M2a Magnum Hip Implant Products Liability Litigation (MDL)*, No. 3:12-MD-2391 (N.D. Ind. Apr. 18, 2013) (approving use of keyword searches combined with predictive coding, relying in part on presentation of statistical evidence regarding the efficacy of the methodology).