100 Percent Chance You’ll Learn Something From These Sampling Experts

Earlier this week I was lucky enough to moderate a DiscoverReady-sponsored webinar titled “A Practitioner’s Guide to Statistical Sampling in E-Discovery.” Panelists included the always outstanding Maura Grossman, of Wachtell Lipton; Gordon Cormack, professor at the David R. Cheriton School of Computer Science at the University of Waterloo; and our own Maureen O’Neill, who heads up our Silicon Valley market area. Maura and Gordon are nationally recognized experts in e-discovery and in search and retrieval, and have recently been in the vanguard of advancing the use of statistical sampling in e-discovery.

Although the entire presentation merits a listen, a couple of high points particularly grabbed my attention:

  • Gordon did a terrific job of pointing out that generic claims of “simplified statistics” can be misleading.  He gave a couple of examples of “gotchas,” including the one that we have found most common: trouble calculating a confidence interval (if you don’t know the term, you should probably listen to the webinar) when the prevalence of relevant documents in the data set (a.k.a. “richness”) is very low.  If you have not had this experience yet, trust us, and Gordon, when we say it’s a topic worth considering.
  • On a related note, Maura made the terrific point that parties should be very careful about committing to a specific standard, in the form of a confidence level, margin of error, etc., before they understand the characteristics of their data set and the substantive impact such a commitment could have.  By way of example, Maureen O’Neill walked through a hypothetical matter, showing the sample sizes required to achieve certain confidence levels, assuming that the data set contained roughly even amounts of relevant and irrelevant data.  What’s amazing is how much larger the required sample can become (often stated in tens of thousands of documents) when the prevalence of relevant data drops drastically.
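
For readers who like to see the arithmetic, here is a minimal Python sketch of both points. It is not taken from the panelists' slides, and the function names and numbers are illustrative assumptions. It assumes the textbook normal-approximation sample-size formula, and it assumes the quantity being estimated is a proportion among the relevant documents (recall, for example), so that a random sample must contain enough relevant documents to support the estimate. It also compares the simple "Wald" confidence interval with the Wilson score interval, one well-known alternative, on a low-prevalence sample.

```python
import math
from statistics import NormalDist


def z_score(confidence: float) -> float:
    """Two-sided critical value of the standard normal, e.g. about 1.96 for 0.95."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def sample_size(confidence: float, margin: float, p: float = 0.5) -> int:
    """Normal-approximation sample size for estimating a proportion.

    p = 0.5 is the usual worst-case assumption about the proportion itself.
    """
    z = z_score(confidence)
    return math.ceil(z * z * p * (1 - p) / margin ** 2)


def docs_to_draw(confidence: float, margin: float, prevalence: float) -> int:
    """Expected number of documents to sample at random so that the sample
    contains enough RELEVANT documents to hit the target margin.

    Assumption for illustration: the measurement is made on the relevant
    documents only, so the sample must be scaled up by 1 / prevalence.
    """
    relevant_needed = sample_size(confidence, margin)
    return math.ceil(relevant_needed / prevalence)


def wald_interval(hits: int, n: int, confidence: float = 0.95):
    """Simple normal-approximation interval: p +/- z * sqrt(p(1-p)/n)."""
    p = hits / n
    half = z_score(confidence) * math.sqrt(p * (1 - p) / n)
    return p - half, p + half


def wilson_interval(hits: int, n: int, confidence: float = 0.95):
    """Wilson score interval, which stays between 0 and 1."""
    z = z_score(confidence)
    p = hits / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Hypothetical target: 95% confidence, +/- 5% margin of error.
    for prevalence in (0.5, 0.1, 0.01):
        print(prevalence, docs_to_draw(0.95, 0.05, prevalence))

    # Hypothetical low-richness sample: 1 relevant document out of 100.
    print("Wald:  ", wald_interval(1, 100))
    print("Wilson:", wilson_interval(1, 100))
```

Under these assumptions, the number of documents to draw grows from several hundred at 50 percent prevalence to tens of thousands at 1 percent. In the low-richness sample, the simple interval dips below zero, which is impossible for a proportion, while the Wilson interval does not.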

These are just a couple of the fascinating and insightful points that arose during our hour-long session, “A Practitioner’s Guide to Statistical Sampling in E-Discovery.”

If you have any interest in the use of statistical sampling — perhaps in measuring the efficacy of search terms or in the defense of the use of predictive coding — I hope that you’ll take an hour to benefit from the wisdom of this outstanding panel.

Maureen O'Neill