What’s Next for E-Discovery? Legaltech West Coast Offered Some Insight.

Legaltech West Coast Review

What’s next for e-Discovery in the United States and beyond? Earlier this week at Legaltech West Coast, we assembled three panels of experts—including some of the country’s most well-respected federal magistrate judges—to explore answers to that question. The discussions were lively, thought-provoking, and sometimes funny (even we can’t take e-Discovery too seriously).

The Evolving Forensic Technology Landscape

First, we took a look at current and emerging issues in ESI forensics and collections, including the latest challenges presented by technology advances in computers, systems, mobile devices, and the Internet of Things (IoT). Zach Warren from Legaltech News reported on the panel, moderated by DiscoverReady’s Daniel Blair, VP of Innovative Strategies—you can find Zach’s write-up here. One of the best pieces of advice came from Chris Sitter, eDiscovery & Digital Forensics Senior Manager at Juniper Networks, who explained that every forensic collections project, regardless of the complexity of the technology involved, should start by asking “What question is it we’re looking to answer?” Until the legal team can articulate that question, they won’t be able to form an effective strategy. Ben Robbins, who’s responsible for eDiscovery and Information Governance at LinkedIn, agreed. According to Ben, whether it’s a cutting-edge IoT device, or a more traditional data source, a forensic collection is “still the same exercise, doing information gathering, figuring out where it is, and pulling it out.”

The Latest US-EU Cross-Border Privacy Issues

Next, we examined the current state of US-EU cross-border discovery, and the highly uncertain future for these international data transfers. Moderator Jeanne Somma, DiscoverReady’s Discovery Practice Director, opened the program by joking that, because the cross-border discovery landscape changes so quickly these days, the panelists all asserted their “right to be forgotten” with respect to their remarks. The ambiguity in this area stems from the recent invalidation of Safe Harbor, a lack of consensus on the proposed Privacy Shield replacement, and the looming implementation of the EU’s General Data Protection Regulation. According to David Cohen, Partner and Practice Group Leader for Records and E-Discovery at Reed Smith, at the heart of the matter are broadly different interpretations of privacy on each side of the Atlantic—the EU’s “definition of what is private information is very broad.” Brock Wanless, Managing Counsel for Global Privacy & Regulatory at Groupon, explained that his approach to the uncertain regulatory environment is to develop comprehensive, company-wide policies and procedures that reflect good-faith, sincere efforts to protect the privacy rights of individuals. The ability to point to those efforts can go a long way towards satisfying inquiries by EU data privacy regulators. You can read Ricci Dipshan’s in-depth coverage of the panel for Legaltech News here.

Judicial Perspective: Are the New Federal Rules Having the Intended Effect?

Finally, I moderated a panel that asked three leading federal magistrate judges to share their insights on whether, in their first six months in effect, the new amendments to the Federal Rules of Civil Procedure are accomplishing their stated goals—to make civil discovery more proportionate, cost-effective, efficient, and cooperative. The judges all agreed that the new rules will eventually achieve these goals, but much work remains to fully educate counsel and litigants on how to implement the revised rules.

To kick off the panel, Magistrate Judge Elizabeth Laporte, of the U.S. District Court for the Northern District of California, walked the audience through the significant amendments to Rule 26(b)(1). That rule now requires discoverable information to be both “relevant to [a] party’s claim or defense” and “proportional to the needs of the case.” With respect to proportionality, Judge Laporte counseled that “we really mean it; we’re taking this seriously and you should too.” Magistrate Judge Andrew Peck, of the U.S. District Court for the Southern District of New York, believes the new rule is raising awareness of proportionality as intended; he noted that since the amendment, he has seen more decisions and rulings on proportionality “than I saw in the last ten years.” And what about the “doomsayers” who predict that proportionality disputes will devolve into distracting “litigation about litigation?” Magistrate Judge Mitchell Dembin, of the U.S. District Court for the Southern District of California, said they are “full of it,” and with effective, active judicial case management (another goal of the new rule amendments), proportionality can be achieved.

Next, Judge Dembin lauded the amendments to Rule 34(b), which now requires parties to make objections to discovery with “specificity,” state whether any responsive materials are being withheld on the basis of an objection, and provide concrete timelines for document productions. According to Judge Dembin, this “mind-numbingly good” rule amendment will go a long way towards more efficient discovery, with much less gamesmanship. The judges also praised several other rule amendments, including changes to Rules 16 and 26 that encourage earlier, more informal discussion about discovery requests and discovery disputes, and more meaningful conversations during discovery planning about preservation of ESI and protection of privilege through Federal Rule of Evidence 502(d) orders. Never mind that Judge Peck says he’s “kinder, gentler” these days; he still insists that lawyers “commit malpractice” when they fail to take advantage of Rule 502(d), and he hopes the new rule amendments will raise more awareness about 502(d). Judge Laporte agreed, characterizing Rule 502(d) as “free insurance” for privileged materials. For additional coverage of the judges’ panel, and the thoughtful insights they shared, you can read Ricci Dipshan’s coverage in Legaltech News here.

For those of you who joined us during the panels, thanks for your support. If you couldn’t make it, we look forward to seeing you in 2017 at Legaltech New York!


Richness and Precision and Recall (Oh My!)

Many of our blog readers are familiar with the concepts of richness, precision, and recall. We published a series of posts explaining these statistical measurements (in parts one, two, and three), we hosted a webinar on the subject, and there has been a good deal of education on this topic for eDiscovery practitioners. But most of that discussion and education focused on the context of statistical testing and measurement of techniques used to find documents containing relevant information for discovery—a context in which these statistics are fairly well-settled and easily understood.

However, organizations aren’t just searching their data collections for documents relevant to litigation disputes. It’s becoming more and more critical for companies to find and protect sensitive information—personally identifying information, personal health information, financial and payment information, etc.—stored in their caches of unstructured (non-database) data. What happens when we apply these statistical concepts of richness, precision, and recall to the identification of this sensitive information? The situation becomes a bit more complicated—and the statistics more difficult—which prompted me to write this post.

The Terminology

To help explain the challenge, let’s first define a few relevant terms:

Document – A collection of words, phrases, numbers, characters, or other items all grouped into one unit, such as a text file, word processing document, or spreadsheet.

Entity – A particular type of data that exists in a document. Examples of “sensitive” data entities include social security number, credit card number, username/password combination, and account number. Of course, a document with sensitive entities most likely contains other, non-sensitive entities, such as words, punctuation, metadata, formulas, pictures, and graphs—the list is almost endless.

Element – An individual instance of an entity, such as an individual social security number or particular word.
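For illustration only, these three levels can be modeled as a simple nested structure. The class and field names below are hypothetical, not taken from any particular scanning tool:

```python
from dataclasses import dataclass, field

@dataclass
class Element:
    """One instance of an entity, e.g. a single social security number."""
    value: str

@dataclass
class Entity:
    """A type of data appearing in a document, e.g. 'social security number'."""
    name: str
    sensitive: bool
    elements: list = field(default_factory=list)

@dataclass
class Document:
    """One unit of content, e.g. a text file or spreadsheet."""
    name: str
    entities: list = field(default_factory=list)

# A document with one sensitive entity (two elements) and one non-sensitive entity.
doc = Document("support_notes.txt", entities=[
    Entity("social security number", sensitive=True,
           elements=[Element("123-45-6789"), Element("987-65-4321")]),
    Entity("word", sensitive=False, elements=[Element("customer")]),
])

sensitive_elements = sum(len(e.elements) for e in doc.entities if e.sensitive)
print(sensitive_elements)  # 2
```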

The Statistics

Our Hypothetical

To illustrate the statistical oddities that can arise in calculating richness, precision, and recall when searching for sensitive data, let’s use a hypothetical example. Imagine a web site customer support rep for a financial institution who authored ten documents saved on the company’s network. All ten documents were gathered and scanned for sensitive data. In one of those documents, the rep took copious notes regarding hundreds of customer interactions. The sensitive data scan reveals that those notes included the customers’ web site user names, passwords, social security numbers, and account numbers. The other nine documents did not contain any sensitive data.

In this example, the sensitive data document includes four sensitive data entities (user name, password, social security number, and account number), hundreds of sensitive data elements (each instance of one of those entities for a customer), and thousands of non-sensitive data elements (words, punctuation, etc.). Each of the other nine documents includes hundreds of non-sensitive data entities and thousands of non-sensitive data elements.

To make the math easy, assume that in this data set we have:

  • 10 unique documents, with only one document containing sensitive data
  • 100 unique entities, with 4 of those entities being sensitive data
  • 10,000 unique elements, with 700 of those elements being an instance of one of the four sensitive data entities

The Calculations & Measurements

If we assume perfect knowledge of the sensitive data elements, we can calculate the richness of sensitive data. (What if we have imperfect knowledge, or rely on sampling? We’ll save that for a later, more in-depth blog post.) Generally, calculating richness is pretty easy—it’s simply the proportion of the total items that contain the content we’re measuring. But in the context of sensitive data, this is where the calculations get interesting. Is richness measured at the document level, entity level, or element level? In our example, richness of sensitive data at the document level is one out of 10 documents, or 10%. At the entity level, it is 4 out of 100, or 4%. At the element level, it is 700 out of 10,000, or 7%. So, which is correct? Is richness 10%, 7%, or 4%? And to make it even more complicated, I assumed unique documents, entities, and elements, but in reality there will be duplication on each level. How should we count duplicate entities (or elements or documents) in our calculations? (We’ll save that one for a later post, too.)
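Using the assumed counts above, a minimal Python sketch makes the divergence concrete (the numbers are the hypothetical’s, not real measurements):

```python
def richness(sensitive_count, total_count):
    """Richness: the proportion of items containing (or constituting) sensitive data."""
    return sensitive_count / total_count

# The same hypothetical data set, measured at three different levels:
document_richness = richness(1, 10)       # 1 of 10 documents holds sensitive data
entity_richness = richness(4, 100)        # 4 of 100 entity types are sensitive
element_richness = richness(700, 10_000)  # 700 of 10,000 elements are sensitive

print(f"document level: {document_richness:.0%}")  # 10%
print(f"entity level:   {entity_richness:.0%}")    # 4%
print(f"element level:  {element_richness:.0%}")   # 7%
```

Same formula, three defensible denominators—which is the crux of the problem.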

This simple representation of richness considers sensitive data of any type. But what if we want to know the richness of user name, social security number, password, or account number separately? Do we calculate each one at the document level, entity level, and element level? The problem is multiplicative: we now have three different measurement levels for four different sensitive data entities, giving us 12 possible richness measurements.

When we turn to the calculations of precision and recall, the same questions arise and we continue to compound the measurement possibilities. Now we have three measurements (precision, recall, and richness) across four entities (user name, social security number, password, and account number) at three levels of measurement (document, entity and element). That gives us 36 different possible measurements.
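A quick enumeration confirms the combinatorics, with the standard precision and recall formulas included for completeness (the counts in the example are hypothetical):

```python
from itertools import product

def precision(true_pos, false_pos):
    """Of the items flagged as sensitive, what fraction truly are?"""
    return true_pos / (true_pos + false_pos)

def recall(true_pos, false_neg):
    """Of the truly sensitive items, what fraction did the scan flag?"""
    return true_pos / (true_pos + false_neg)

metrics = ["richness", "precision", "recall"]
entities = ["user name", "social security number", "password", "account number"]
levels = ["document", "entity", "element"]

grid = list(product(metrics, entities, levels))
print(len(grid))  # 36 possible measurements

# Example: a scan flags 650 elements; 630 are truly sensitive, and 70 were missed.
print(round(precision(630, 20), 3))  # 0.969
print(round(recall(630, 70), 3))     # 0.9
```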

Which measurements do we choose? Not surprisingly, the answer is “It depends.” In large part, it depends on what we’re trying to accomplish. In our hypothetical, if the objective of the data scan is to redact every instance of sensitive data and return the sanitized document to the network, then we need to measure at the element level. But if our objective is to wholly remove any document containing sensitive data from the network, then we can measure at the document level.

Richness, precision, and recall—oh my. We’re not in Kansas anymore.

Some Things Don’t Need to be Discovered. Protect Sensitive Data in Discovery.

Today’s corporate information systems are awash with highly sensitive data. Whether it’s personally identifiable information (“PII”), personal health information (“PHI”), financial and payment information, intellectual property and trade secrets, source code—the list goes on—sensitive information exists in virtually every collection of data. It’s found in expected locations, like organized, well-managed databases; but it’s also found in many unexpected places, like individual email accounts, “personal” folders on employees’ computers, and freely-shared network folders. And as DiscoverReady’s clients can attest, this stew of sensitive information is finding its way into litigation and regulatory discovery.

Why is this a problem? First, various laws and regulations mandate that organizations must take reasonable steps to secure and prevent unauthorized disclosure of PII, PHI, and assorted other kinds of sensitive data. These laws—which number in the hundreds, if not thousands—include federal and state statutes and regulations, laws of international jurisdictions, court rules and procedures, and ethics rules applicable to attorneys. Second, companies likely possess sensitive information that, although not legally protected, is valuable enough to the organization that it should be shielded from discovery. Examples include trade secrets, formulas, source code, and proprietary processes. And in this context, “sensitive” is in the eye of the beholder—what’s important for one company to protect may be wholly uninteresting for another.

Of course, companies must protect sensitive information in the ordinary course of business, while it resides within the corporate environment—an obligation that poses its own challenges. But they also must take steps to secure sensitive information when it leaves the organization and flows into discovery. If sensitive information is compromised during discovery, the consequences can be dire. The company may face legal liability, to government regulators and to aggrieved victims. Its reputation with customers and business partners may suffer. Fines, money damages, and legal fees can mount into the millions. And if a company’s valuable trade secret gets turned over to an adversary, there may be no way to repair that damage.

Over the next few months here in the DiscoverReady blog, we’ll explore this particular discovery problem, and discuss various approaches and solutions. In my post today, I’ll cover the issue from a high level, and introduce some topics we’ll come back to later. I like to think about solutions to the problem in these terms—keep it out; lock it down; find it; and cull it, redact it, or protect it.

Keep it Out

The best way to protect sensitive data in discovery? Keep it out of discovery in the first place. By leveraging (or creating) good information governance practices, the organization can—

  • Minimize the volume of sensitive information it creates and stores,
  • Understand and control where sensitive information resides, and
  • Selectively and strategically decide what sensitive information gets collected for potential discovery.

And speaking of collections. . . By implementing narrow, targeted collections for discovery—rather than “grab it all and sort it out later” collections—organizations can reduce the risk that extraneous, irrelevant sensitive information gets swept up into discovery workflows. This is especially true with respect to personal data kept by employees at work. For example, why grab a folder on an employee’s desktop that contains copies of health insurance claim forms that she scanned and emailed during the work day, when those documents hold no relevance to the litigation? A targeted collection will keep out that highly sensitive, legally protected PII and PHI.

Lock It Down

Sensitive personal information will inevitably end up in discovery—indeed, in many cases that information will be central to the dispute. So before turning over document collections to law firms or discovery service providers, organizations should put appropriate security measures into place. When in transit, data collections should always be encrypted and/or transmitted through secure channels. When at rest—in the control of law firms and other providers—data should be subject to security and privacy protocols that are vetted, approved, and periodically audited by the company.

Find It

In a collection of thousands, if not millions, of records, how do you know which ones contain sensitive information? As volumes of collections grow, in most cases it would be impractical to put human eyes on every document to find sensitive data. Instead, litigants can deploy searches, scans, and increasingly powerful analytics tools to find these types of data. Sometimes a basic keyword search will get the job done—for example, if you’re looking to find instances of a particular person’s name. But simple keyword searches alone will not always succeed. For example, finding instances of driver’s license numbers—where the actual numbers are unknown, and where the number can appear in dozens of different formats, unique to each state—requires a much more sophisticated search technique. Likewise, finding and redacting all instances of source code—which many companies treat as highly sensitive—requires specialized searches designed to locate that unique type of content.
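As a sketch of what pattern-based scanning looks like—with the caveat that the patterns below are illustrative only, and production sensitive-data detection requires far more robust patterns, validation, and context checks:

```python
import re

# Illustrative patterns only -- real scans need validation and many more formats.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # Driver's license formats vary widely by state; this matches one common shape.
    "drivers_license": re.compile(r"\b[A-Z]\d{7}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){15}\d\b"),
}

def scan(text):
    """Return all pattern hits found in a document's text, keyed by entity type."""
    hits = {}
    for label, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[label] = matches
    return hits

notes = "Customer called re: SSN 123-45-6789; license A1234567 on file."
print(scan(notes))  # {'ssn': ['123-45-6789'], 'drivers_license': ['A1234567']}
```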

Once you’ve leveraged the power of automated searches and scans, how do you know you’ve found all the sensitive information in a collection? First of all, litigants should remove the word “all” from their vocabulary in the context of information search and retrieval. No search, process, or automation tool is perfect, no matter how powerful the technology and no matter what promises its developer makes. Accordingly, parties should not expect to find “all” instances of sensitive data. Rather, the standard is reasonableness—organizations should assess whether, under the circumstances, they took reasonable steps to locate protected information. One key component of that assessment is conducting statistical testing and measurement of the effectiveness of the sensitive data screen. By statistically validating the results, a litigant can defensibly support a reasonable, good-faith effort to protect sensitive information.
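One common validation technique—sketched here with made-up data, and not presented as any particular vendor’s protocol—is to review a random sample of the documents the scan did not flag, and estimate the rate of missed sensitive material (sometimes called elusion):

```python
import math
import random

def estimate_elusion(unflagged_docs, sample_size, is_sensitive, z=1.96):
    """Review a random sample of unflagged documents and estimate the
    proportion that actually contain sensitive data, with a ~95% margin
    of error (normal approximation)."""
    sample = random.sample(unflagged_docs, min(sample_size, len(unflagged_docs)))
    misses = sum(1 for doc in sample if is_sensitive(doc))
    rate = misses / len(sample)
    margin = z * math.sqrt(rate * (1 - rate) / len(sample))
    return rate, margin

# Hypothetical: 10,000 unflagged documents, 1% of which the scan actually missed.
random.seed(42)
docs = [{"id": i, "sensitive": i % 100 == 0} for i in range(10_000)]
rate, margin = estimate_elusion(docs, 400, lambda d: d["sensitive"])
print(f"estimated elusion: {rate:.1%} +/- {margin:.1%}")
```

A low measured elusion rate, documented alongside the sample size and margin of error, is the kind of evidence that supports a reasonableness showing.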

Cull It, Redact It, Protect It

Once sensitive data have been located within a document collection, the litigation team can then decide how best to protect them. In some instances, documents containing sensitive information can be culled out of the collection and withheld from discovery entirely. In others, the documents will need to be produced, but protected information can be redacted. Those redactions can be accomplished manually, with human reviewers making the redactions, or automatically, using software tools to accomplish that tedious task. In still other cases, the information is relevant to the matter and must be produced in discovery—in those cases, strong protective orders incorporating specific data security and privacy measures will be important.

So that’s our introduction to the subject of personal and sensitive information in discovery. Follow us here on the eDiscovery blog for more in-depth discussions of the various aspects of this issue in the coming months.

DiscoverReady’s New Year’s Resolutions

Happy New Year! As we turn the calendar to 2016 and reflect on what the year ahead might bring, I asked some of my DiscoverReady colleagues to share some of their resolutions for the new year. Here’s what they resolve to accomplish in 2016. . .

Phil Richards, Chief Technology Officer: I resolve to continue improving our e-discovery workflows to make those processes more closely mirror the science of information retrieval. I also want to do more work with clients to help them better understand their data from a “big picture” perspective. Companies are keeping too much information that has little or no business value, which drives up cost and risk—not just when it comes to discovery, but in other important functions, such as compliance and data security.

Amy Hinzman, EVP Review Operations: One of my resolutions is to persuade more clients and counsel to incorporate advanced analytics into their standard document review workflows. But the reality is that many matters still rely on good ol’ fashioned search terms, and so I also resolve to promote the routine use of our Search Term Optimization process to test and verify search terms, and boost the defensibility of search terms. And I resolve to move further towards “paperless” document review training.

Daniel Blair, VP Innovative Strategies: I resolve to educate more organizations on the need to conduct individualized data security risk assessments, and consider in-place protection for sensitive data. I also intend to encourage companies to explore how they can apply traditional e-discovery solutions in new ways, such as compliance initiatives and data privacy programs.

Sean McMechan, VP Project Management: One of my resolutions is to work towards a mobile app that will make all of our various project management reports available on smart phones. I also resolve to further automate our conflict resolution processes, so they are more seamlessly integrated in our workflows. And I will promote more use of our document repository solutions, which allow for easy reuse and comparison of coding calls across multiple matters—so many of our clients with significant litigation portfolios would benefit from this.

And what about my resolutions? I think Calvin put it nicely:

[Calvin and Hobbes cartoon, copyright © Bill Watterson]

Just kidding, folks. I resolve to ramp up my focus on the intersection of litigation and regulatory discovery with data security and privacy. And I also resolve to get even more creative about helping our clients find opportunities to make discovery more efficient, more effective, and less expensive. To accomplish that resolution, I think I need another one—valuable both in and outside of work—“listen more, talk less.”

The DiscoverReady team wishes you all the best in 2016!

Coming in 2016: Tougher New Data Privacy Rules in the European Union

European Commission and European Parliament officials last week agreed on a new set of data protection laws, intended to strengthen individuals’ privacy rights and create a more consistent set of regulations across the twenty-eight European Union member countries.

According to a press release from the European Parliament,

“The new rules will replace the EU’s current data protection laws which date from 1995, when the internet was still in its infancy, and give citizens more control over their own private information in a digitised world of smart phones, social media, internet banking and global transfers. At the same time they aim to ensure clarity and legal certainty for businesses, so as to boost innovation and the further development of the digital single market.”

Highlights of the new rules include provisions addressing:

  • Clear and affirmative consent to the processing of private data. Consumers will have more control over their private information, as consent must be manifested through some action clearly indicating acceptance of data processing. Silence cannot constitute consent.
  • Plain language. The new rules prohibit “small print” privacy policies. Information must be given in clear language before data are collected.
  • Parental consent for children on social media below a certain age. Member states will set their own age limits for the consent requirement, but the limit must be between 13 and 16 years.
  • The right to be forgotten. This right, which will now be codified in the regulations, allows individuals to request that their personal information be deleted from the databases of companies holding it, provided there are no legitimate grounds for retaining it.
  • Breach notification. Companies will be required to inform national regulators of any data breach within three days.
  • Fines for violations of the regulations. Regulators may issue fines of up to 4% of companies’ total worldwide revenue for misuse of consumers’ online data, including obtaining information without consent.
  • Coordination among Data Protection Authorities. Cooperation among the national DPAs will be significantly strengthened to ensure consistency and oversight.

Importantly for those of us based in the United States, the new rules will extend to any company that has customers in the EU, even if the company is based elsewhere. The EU’s strict stance on privacy has often put its regulators at odds with American companies, which collect and mine data from social media and other web sites for purposes of advertising. But the tough EU privacy laws reflect a fundamental cultural difference between the U.S. and Europe when it comes to individual privacy; Europeans view their right to data privacy as strongly as Americans view their constitutional right to freedom of speech.

The full Parliament will vote on the new regulations in the spring of 2016, and then member states will have two years to implement the provisions.