Today’s corporate information systems are awash with highly sensitive data. Whether it’s personally identifiable information (“PII”), protected health information (“PHI”), financial and payment information, intellectual property and trade secrets, or source code (the list goes on), sensitive information exists in virtually every collection of data. It’s found in expected locations, like organized, well-managed databases; but it’s also found in many unexpected places, like individual email accounts, “personal” folders on employees’ computers, and freely shared network folders. And as DiscoverReady’s clients can attest, this stew of sensitive information is finding its way into litigation and regulatory discovery.
Why is this a problem? First, various laws and regulations mandate that organizations must take reasonable steps to secure and prevent unauthorized disclosure of PII, PHI, and assorted other kinds of sensitive data. These laws—which number in the hundreds, if not thousands—include federal and state statutes and regulations, laws of international jurisdictions, court rules and procedures, and ethics rules applicable to attorneys. Second, companies likely possess sensitive information that, although not legally protected, is valuable enough to the organization that it should be shielded from discovery. Examples include trade secrets, formulas, source code, and proprietary processes. And in this context, “sensitive” is in the eye of the beholder—what’s important for one company to protect may be wholly uninteresting for another.
Of course, companies must protect sensitive information in the ordinary course of business, while it resides within the corporate environment—an obligation that poses its own challenges. But they also must take steps to secure sensitive information when it leaves the organization and flows into discovery. If sensitive information is compromised during discovery, the consequences can be dire. The company may face legal liability, to government regulators and to aggrieved victims. Its reputation with customers and business partners may suffer. Fines, money damages, and legal fees can mount into the millions. And if a company’s valuable trade secret gets turned over to an adversary, there may be no way to repair that damage.
Over the next few months here in the DiscoverReady blog, we’ll explore this particular discovery problem, and discuss various approaches and solutions. In my post today, I’ll cover the issue from a high level, and introduce some topics we’ll come back to later. I like to think about solutions to the problem in these terms—keep it out; lock it down; find it; and cull it, redact it, or protect it.
Keep It Out
The best way to protect sensitive data in discovery? Keep it out of discovery in the first place. By leveraging (or creating) good information governance practices, the organization can—
- Minimize the volume of sensitive information it creates and stores,
- Understand and control where sensitive information resides, and
- Selectively and strategically decide what sensitive information gets collected for potential discovery.
And speaking of collections. . . By implementing narrow, targeted collections for discovery—rather than “grab it all and sort it out later” collections—organizations can reduce the risk that extraneous, irrelevant sensitive information gets swept up into discovery workflows. This is especially true with respect to personal data kept by employees at work. For example, why grab a folder on an employee’s desktop that contains copies of health insurance claim forms that she scanned and emailed during the work day, when those documents hold no relevance to the litigation? A targeted collection will keep out that highly sensitive, legally protected PII and PHI.
Lock It Down
Sensitive personal information will inevitably end up in discovery—indeed, in many cases that information will be central to the dispute. So before turning over document collections to law firms or discovery service providers, organizations should put appropriate security measures into place. When in transit, data collections should always be encrypted and/or transmitted through secure channels. When at rest—in the control of law firms and other providers—data should be subject to security and privacy protocols that are vetted, approved, and periodically audited by the company.
Find It
In a collection of thousands, if not millions, of records, how do you know which ones contain sensitive information? As collection volumes grow, it is in most cases impractical to put human eyes on every document to find sensitive data. Instead, litigants can deploy searches, scans, and increasingly powerful analytics tools to find these types of data. Sometimes a basic keyword search will get the job done, for example when you’re looking for instances of a particular person’s name. But simple keyword searches alone will not always succeed. Finding driver’s license numbers, for instance, where the actual numbers are unknown and the number can appear in dozens of different formats unique to each state, requires a much more sophisticated search technique. Likewise, finding and redacting all instances of source code, which many companies treat as highly sensitive, requires specialized searches designed to locate that unique type of content.
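To make the difference between a keyword search and a pattern-based search concrete, here is a minimal Python sketch. It is an illustration only, not a description of any particular vendor’s tool. The function names (`luhn_valid`, `find_sensitive`) and both patterns are assumptions chosen for the example: one pattern looks for the common dashed Social Security number layout, and the other looks for 13-to-19-digit runs that pass the Luhn checksum used by payment cards. State driver’s license formats would each need their own patterns, which are not attempted here.

```python
import re

# Illustrative patterns only; a real screen needs many more formats plus context checks.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")        # e.g. 123-45-6789
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")    # 13 to 19 digits, optional spaces or dashes


def luhn_valid(digits: str) -> bool:
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:        # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text: str) -> list[tuple[str, str]]:
    """Return (label, matched_text) pairs for candidate sensitive values in text."""
    hits = [("ssn", m.group()) for m in SSN_RE.finditer(text)]
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if luhn_valid(digits):    # the checksum filters out most random digit runs
            hits.append(("card", m.group()))
    return hits


sample = "Claim for 123-45-6789, paid with 4111 1111 1111 1111, ref 1234 5678 9012 3456."
for label, value in find_sensitive(sample):
    print(label, value)
```

Neither pattern requires knowing the actual numbers in advance, which is the point. The checksum step shows one way a screen can cut down on false hits; the last number in the sample has the shape of a card number but fails the check and is ignored.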
Once you’ve leveraged the power of automated searches and scans, how do you know you’ve found all the sensitive information in a collection? First of all, litigants should remove the word “all” from their vocabulary in the context of information search and retrieval. No search, process, or automation tool is perfect, no matter how powerful the technology and no matter what promises its developer makes. Accordingly, parties should not expect to find “all” instances of sensitive data. Rather, the standard is reasonableness: organizations should assess whether, under the circumstances, they took reasonable steps to locate protected information. One key component of that assessment is conducting statistical testing and measurement of the effectiveness of the sensitive data screen. By statistically validating the results, a litigant can defensibly support a reasonable, good-faith effort to protect sensitive information.
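One way such a measurement might look in practice: draw a random sample from the documents the screen did not flag, have reviewers check that sample by hand, and estimate how much sensitive material slipped through. The Python sketch below is an assumption about how this could be done, not a prescribed protocol. The function names are hypothetical, and the Wilson score interval is just one common choice for putting a confidence range around a sampled proportion.

```python
import math
import random


def draw_sample(doc_ids, sample_size, seed=0):
    """Pick a simple random sample of document IDs for manual review."""
    rng = random.Random(seed)    # fixed seed so the draw can be reproduced later
    return rng.sample(list(doc_ids), sample_size)


def wilson_interval(hits, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z=1.96)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical numbers: 20,000 unflagged documents, 500 sampled,
# and reviewers find sensitive data in 3 of the sampled documents.
unflagged_ids = range(20_000)
review_set = draw_sample(unflagged_ids, 500)
missed = 3
low, high = wilson_interval(missed, len(review_set))
print(f"Estimated miss rate: {missed / len(review_set):.2%} "
      f"(95% interval {low:.2%} to {high:.2%})")
```

The interval, rather than the single point estimate, is what supports a reasonableness argument: it states how high the miss rate could plausibly be given the size of the sample reviewed.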
Cull It, Redact It, Protect It
Once sensitive data has been located within a document collection, the litigation team can then decide how best to protect it. In some instances, documents containing sensitive information can be culled out of the collection and withheld from discovery entirely. In others, the documents will need to be produced, but protected information can be redacted. Those redactions can be accomplished manually, with human reviewers making the redactions, or automatically, using software tools to accomplish that tedious task. In still other cases, the information is relevant to the matter and must be produced in discovery; in those cases, strong protective orders incorporating specific data security and privacy measures will be important.
So that’s our introduction to the subject of personal and sensitive information in discovery. Follow us here on the eDiscovery blog for more in-depth discussions of the various aspects of this issue in the coming months.