Sensitive Information in Unstructured Data: A Corporate Blind Spot?
Consider this scenario described in a recent Forbes blog post:
“Every quarter, a PR department receives the final quarterly financial numbers via email ahead of the earnings announcement in order to prepare a press release. The PR draft will be shared via email by a select group within the company before being approved and ready to be distributed out on the news wires. When pulling that financial information from the ERP system — a system that usually lives behind the corporate firewall with strong security and identity controls in place and with business owners who govern access to the systems and data within — we’ve instantly taken that formerly safe data and shared it freely by email as an Excel file.”
Sound familiar? It probably does, as similar scenarios play out every day in most corporate enterprises. These data “de-structuring” events often reflect perfectly acceptable business practices, or simply the operational realities of how work gets done. Common as they are, these events create enormous risk for the enterprise, because email and files stored locally on laptops are far more vulnerable than a critical enterprise system of record.
Help! I think my data have been de-structured! What do I do now?
First, find and analyze your data. Are the de-structured data in email, on laptops, or in other networked file locations? Next, identify who can access the de-structured data and who actually is accessing it. Are the (typically stringent) security controls of the source information system replicated on file shares or personal devices?
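As a minimal sketch of the "find" step, the script below walks a folder or mounted file share and collects basic context metadata (location, size, last-modified and last-accessed times). The function name and any paths are illustrative, not part of any particular tool.

```python
import time
from pathlib import Path

def inventory_files(root):
    """Collect context metadata for every file under a share or folder."""
    records = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            stat = path.stat()
            records.append({
                "path": str(path),
                "size_bytes": stat.st_size,
                "modified": time.ctime(stat.st_mtime),
                "accessed": time.ctime(stat.st_atime),
            })
    return records
```

A real inventory would also capture ownership and permission data (who can access each file), but even this simple listing is a starting point for the analysis described above.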
Risk increases when the data contain sensitive information like personally identifiable information (PII), protected health information (PHI), trade secrets, financial or medical account numbers, or material non-public information (MNPI), so you’ll need to analyze both data context and content. Too many technology-based approaches to locating and remediating sensitive data yield incomplete or inaccurate results, because the available tools focus too narrowly on structured data and on data context, such as file age, location, access dates, and other metadata. If your approach doesn’t consider data content, you’re missing potentially critical information. (Although, as we’ve discussed in prior blog posts about data breaches of highly sensitive information, in some circumstances context is more important than content.)
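To make the content side of the analysis concrete, here is a minimal sketch of pattern-based scanning for a few sensitive-element types. The patterns are deliberately simplistic for illustration; production scanners layer on validation (for example, Luhn checks on card numbers) precisely to avoid the over-inclusive results discussed later.

```python
import re

# Illustrative patterns only -- real detectors are far more careful.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card": re.compile(r"\b(?:\d[ -]?){15}\d\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def scan_content(text):
    """Return the sensitive-element types found in a block of text."""
    return {name: pat.findall(text)
            for name, pat in PATTERNS.items() if pat.search(text)}
```

Combining hits like these with the context metadata (where the file lives, who touched it, when) is what lets you rank which de-structured files pose the most risk.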
Next, categorize and remediate the data. Create appropriate document categories and/or work with Information Governance to utilize those already in place. Data categorized as ROT (redundant, obsolete, or trivial) or non-critical business records can be queued for deletion. Depending on the types of elements located, sensitive data can be quarantined, redacted, tokenized, or otherwise masked.
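The remediation options above can be sketched in a few lines. This is a toy illustration, assuming an SSN-style number format; the salt and token scheme are made up for the example. Redaction destroys the value outright, while tokenization replaces it with a stable, non-reversible stand-in so downstream records can still be matched.

```python
import hashlib
import re

SENSITIVE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # illustrative SSN-style pattern

def redact(text):
    """Replace each sensitive number with a fixed mask."""
    return SENSITIVE.sub("[REDACTED]", text)

def tokenize(text, salt="demo-salt"):
    """Replace each sensitive number with a deterministic token derived
    from a salted hash, so the same value always maps to the same token."""
    def token(match):
        digest = hashlib.sha256((salt + match.group()).encode()).hexdigest()
        return "tok_" + digest[:10]
    return SENSITIVE.sub(token, text)
```

Which option fits depends on the business need: quarantine or redact when the value serves no downstream purpose, tokenize or mask when analytics or record-linking must continue.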
Additionally, you’ll want to assign a data owner to the unstructured data. Enterprise systems typically have clearly defined data owners or managers. Extending the Forbes example, the ERP system may be managed by IT and utilized by various business units during reporting cycles, but the information would likely be owned by a member of the CFO organization. Does that ownership extend to de-structured data that was emailed to the PR team? If yes, how can your enterprise use the results of the analysis to provide these data owners with enough information to make informed decisions?
Many software solutions on the market today promise to find your sensitive data. In our experience, however, these tools generate results that are both over-inclusive (pulling in false positives that send you on a wild goose chase) and under-inclusive (giving you a false sense of security that you’ve found all sensitive data, when in fact you’ve missed a lot). Aggregating and analyzing both data content and context allows corporations to more effectively manage and remediate their enterprise data. Understanding what is contained in your unstructured data can drive informed Information Security and Governance policy and action.