As you may have noticed, over the last several weeks in the e-discovery blogosphere there’s been no shortage of posts looking back at developments in 2014 and making predictions about what’s ahead in 2015. So I resisted the temptation to make my own list. Instead, I decided to spend some time doing a deeper dive on one particular subject that I believe will gain more attention in 2015 and in years to come: the impact on legal discovery of increasingly advanced tools for machine learning and artificial intelligence, which enhance our ability to effectively search and analyze large collections of information.
I became curious about this topic after reading a fascinating article by Kevin Kelly in Wired magazine last fall. Kelly suggests that we are on the brink of a revolution in artificial intelligence, brought about by three breakthroughs: (1) inexpensive parallel computation (which gives us the ability to create extremely complex electronic neural networks), (2) big data (vast stores of information that AI systems draw upon to “learn”), and (3) better algorithms (improved computer code for deep machine learning). Then a couple of months later I came across a post by Anthony Wing Kosner on the Forbes website, which focused on the implications of advances in AI for human work and productivity. And finally, I read about Pivotal, a company promoting its “Big Data Suite,” which it promises will allow us to “Store Everything. Analyze Anything.”
My head began to spin. . . “Store everything?” Isn’t that antithetical to the advice e-discovery lawyers have been giving our corporate clients for years? And if machine learning takes great leaps and bounds, with complex computer neural networks threatening to replace 80% of the work done by humans in the developed world (as Kosner posits), what will become of the attorney’s role in analyzing information in discovery? If soon we’ll be able to use software to “analyze anything,” will the resolution of legal disputes largely be driven by which party has the best analytics technology? To help slow the head-spinning, I decided to use this first blog post of 2015 to explore these issues.
Let’s assume that organizations will create and accumulate increasing amounts of information every year, and the trend line of exponential data growth continues its ascent. Let’s also assume that sometime relatively soon – likely not this year, but maybe in three or five or ten years – the technology available to us as lawyers will indeed allow us to “find anything” in a vast collection of electronic documents. In that context, how will machine learning and artificial intelligence change legal discovery?
I don’t pretend to have an authoritative answer to that question. But I offer some observations – and a few more questions – that I hope will provoke further thought and discussion as we launch into 2015.
Lawyers Must Embrace Technology
My first observation relates to the assumption about the capabilities of technology on the horizon. Having the technology available is one thing. Understanding how to use it effectively in litigation discovery is another. As technology advances, so must our willingness as lawyers to learn new skills and stay abreast of how that technology impacts our practice.
Litigation Teams Will Continue to Review Substantial Collections of Documents
Next, it’s important to recognize that finding information is not the same as analyzing and understanding it – at least not from the human’s perspective. From the machine’s perspective, though, those concepts significantly overlap. For a predictive algorithm to “find” relevant documents in a large collection, it must “understand” the concept of relevance in the matter, which it does by “analyzing” the document collection and whatever inputs the humans provide. (As Kosner notes, however, there remain limitations on what computers can understand. “Many prediction algorithms are pretty good now at general sentiment analysis but still stumble on irony and some types of negation and ambiguity.”)
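For readers who like to see the mechanics, here is a minimal sketch of what “learning relevance from human inputs” can look like. It is a toy illustration only: the sample documents, the function names, and the choice of a naive Bayes word-weighting scheme are all my own assumptions, not a description of how any particular e-discovery product works.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a document and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(labeled_docs):
    """Learn a weight per word from (text, is_relevant) pairs coded by reviewers.

    Each weight is the log-odds that a word appears in relevant rather than
    non-relevant documents, using naive Bayes with add-one smoothing.
    This technique is an illustrative assumption.
    """
    counts = {True: Counter(), False: Counter()}
    for text, is_relevant in labeled_docs:
        counts[is_relevant].update(tokenize(text))
    vocab = set(counts[True]) | set(counts[False])
    totals = {label: sum(c.values()) + len(vocab) for label, c in counts.items()}
    return {
        word: math.log((counts[True][word] + 1) / totals[True])
        - math.log((counts[False][word] + 1) / totals[False])
        for word in vocab
    }


def score(weights, text):
    """Sum the weights of known words; a higher score suggests relevance."""
    return sum(weights.get(token, 0.0) for token in tokenize(text))


def rank(weights, docs):
    """Order unreviewed documents from most to least likely relevant."""
    return sorted(docs, key=lambda doc: score(weights, doc), reverse=True)


# Hypothetical reviewer decisions (the "inputs the humans provide").
labeled = [
    ("Please delay the pricing announcement until the merger closes", True),
    ("The merger pricing terms are attached for your review", True),
    ("Lunch menu for the office party on Friday", False),
    ("Reminder: the parking garage closes early on Friday", False),
]

# Hypothetical documents nobody has looked at yet.
unreviewed = [
    "Office party parking reminder",
    "Draft merger terms and pricing schedule",
]

weights = train(labeled)
ranked = rank(weights, unreviewed)
for doc in ranked:
    print(f"{score(weights, doc):+.2f}  {doc}")
```

Even in this toy version, the machine is only ranking documents by their resemblance to what the reviewers already coded. It has no idea why the merger pricing matters to the case.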
But for the humans working on a litigation matter, once the computer has found the potentially relevant documents, much work still remains. The team must fully understand the substantive content of the documents and determine how to use them most effectively in the matter. To accomplish that, humans must review the documents. The algorithms can help us with that task by sorting, organizing, and presenting the content of the relevant information in useful ways. But the computers can’t do all the work! The machines can’t litigate for us. Ultimately, we must digest and understand the information. Human judgment and legal analysis are still necessary.
Of course, the goal in using artificial intelligence in legal discovery is to greatly reduce the number of documents humans must look at. But as the volumes of information generated and stored by organizations grow, so will even the very small subsets of information relevant to our legal disputes. Some commentators use the term “data lake” to refer to the huge collection of data held by an organization. In the litigation arena, lawyers must find a way to leverage AI technology to reduce that lake to a pond – or even a puddle – of documents and data that are truly useful to the dispute. It’s that puddle of documents that we’ll use in depositions, attach to briefs, and present as evidence in hearings and at trial. (Law is unusual in its need to distill the data lakes down to much smaller collections. Many other industries and business sectors want the data collections to grow even larger; the more data the better. Big data is big money.)
Data Privacy and Security Concerns Will Not Abate
As organizations accumulate more and more information, seeking to monetize their data lakes by leveraging big data analytics, concerns around protecting the privacy and security of sensitive information will continue to grow. Certainly the advances in AI will help the machines get better at identifying and isolating private information. But the task falls to the humans – in particular, the lawyers and lawmakers – to ensure that we place adequate protections around that information.
With every data breach and newsworthy information hack, the demand grows for better protection of our data. Hopefully in the coming years we will take advantage of breakthroughs in machine learning to find new, more effective ways of implementing those protections.
We Must Heighten Our Focus on Our Uniquely Human Qualities
At the conclusion of his article, Kelly offers some elegant observations about the line between man and machine, and how that line shifts with advances in AI:
Over the past 60 years, as mechanical processes have replicated behaviors and talents we thought were unique to humans, we’ve had to change our minds about what sets us apart. As we invent more species of AI, we will be forced to surrender more of what is supposedly unique about humans. We’ll spend the next decade—indeed, perhaps the next century—in a permanent identity crisis, constantly asking ourselves what humans are for. In the grandest irony of all, the greatest benefit of an everyday, utilitarian AI will not be increased productivity or an economics of abundance or a new way of doing science—although all those will happen. The greatest benefit of the arrival of artificial intelligence is that AIs will help define humanity. We need AIs to tell us who we are.
Intelligent machines might also help lawyers understand who we are – and what we should strive to be. Unlike computers, lawyers have the capacity to be ethical and moral; to be empathetic and compassionate; to cooperate and compromise; to seek justice and equality and diversity; and to laugh at ourselves and with our colleagues and clients. As the machines do more and more for us, let’s not forget about the things only we can do.