Artificial Intelligence, Machine Learning, Analytics, and TAR
As volumes of data grow and become more complex, we rely increasingly on the many available tools and techniques that help us understand our data. “Artificial Intelligence” makes headlines every day, and we talk with our clients and colleagues routinely about “machine learning,” “analytics,” and “technology assisted review” (or “TAR”). But there’s no common language used in these discussions, and confusion often exists over the meanings of these terms. And although the concepts are all related, the terms are not interchangeable. So, in this post, which is the first in a series exploring these topics, I will start by providing some clarity about what these terms mean.
Artificial Intelligence is a broad term for computational techniques that perform tasks historically associated with human reasoning. Computers now approximate many different types of human reasoning, and we generally reserve the term artificial intelligence for reasoning tasks that are more difficult or have not yet become commonplace in our daily activities. Examples include finding patterns in large data sets, understanding natural language, predicting future events, or providing non-obvious insight into a problem.
Artificial intelligence can be grouped into two areas: general and applied. General artificial intelligence refers to building algorithms that can learn and solve many different, unrelated problems; this is usually what a layman thinks of as artificial intelligence, and it is what popular culture often depicts, such as the computer in Star Trek, the androids in Star Wars, or the intelligences in The Matrix. Applied artificial intelligence refers to systems built to solve specific problems, and is typically what a practitioner uses when building solutions. Although significant work is being done in both areas, most of the “AI” we currently encounter on a day-to-day basis consists of applied artificial intelligence systems, such as ad placement from Google, automated stock-trading algorithms, movie suggestions from Netflix, or some self-driving features in cars.
Some tasks were considered artificial intelligence in the past (such as searching, playing chess, or optical character recognition), but they have become so commonplace that they no longer warrant the label. As time progresses, we tend to remove that designation from capabilities that lose their magic through repeated and widespread use.
Machine Learning refers to a class of algorithms that allow a computer to become more proficient at a task as it gains experience with example data. Machine learning is a subset of artificial intelligence. Specific algorithms are usually applied to solve particular problems, with the computer simulating the human process of thinking about that problem. Examples of machine learning approaches include artificial neural networks, clustering, decision tree learning, and Bayesian networks. Each approach requires a training set as input, which might consist of textual documents, images, or some other type of data. With exposure to larger training sets, accuracy generally increases.
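To make the idea of "learning from a training set" concrete, here is a minimal sketch of one of the simplest machine learning algorithms, a nearest-centroid classifier. It learns by averaging the training examples for each label, then classifies a new item by its distance to those averages. The data, labels, and function names below are purely illustrative, not drawn from any particular product.

```python
import random

def train(examples):
    """examples: list of (features, label) pairs. Returns per-label centroids."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    # The "model" is just the average feature vector for each label.
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}

def predict(centroids, features):
    """Classify by choosing the label whose centroid is closest."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(features, c))
    return min(centroids, key=lambda label: dist(centroids[label]))

# Synthetic training data: "relevant" items cluster near (1, 1),
# "not relevant" items near (-1, -1).
random.seed(0)
def sample(center, label, n):
    return [([center + random.gauss(0, 0.5), center + random.gauss(0, 0.5)], label)
            for _ in range(n)]

training = sample(1, "relevant", 50) + sample(-1, "not relevant", 50)
model = train(training)
print(predict(model, [0.9, 1.2]))
```

With only a handful of training examples the centroids are noisy; with fifty per label they settle near the true cluster centers, which is the sense in which more training data improves accuracy.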
In the last few years, we have seen the rise of “deep learning” as a way to improve the performance of our machine learning algorithms. Deep learning utilizes multiple layers of abstraction from the training data. Some deep learning systems have many different layers, and each layer can be composed of different algorithms.
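The "layers" idea can be sketched in a few lines. Each layer transforms its input into a more abstract representation and passes it to the next layer. The weights below are fixed toy values chosen for illustration; a real deep learning system would learn them from training data and would have far more layers and units.

```python
import math

def layer(inputs, weights):
    """One fully connected layer with a tanh nonlinearity.
    Each row of weights produces one output value."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)))
            for row in weights]

raw_input = [0.5, -0.2, 0.1]
hidden = layer(raw_input, [[0.4, -0.6, 0.2], [0.1, 0.9, -0.3]])  # layer 1
output = layer(hidden, [[0.7, -0.5]])                            # layer 2
print(output)
```

Stacking calls to `layer` is the whole architectural idea: the first layer sees raw features, and each later layer sees only the previous layer's abstraction of them.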
Analytics is the process used to find meaningful results or patterns from data. Analytics may use machine learning, but may also rely on general mathematical techniques, and usually involves statistics. When analyzing data, practitioners use feedback loops to test their results, and most of our statistical tools rely on these feedback loops to generate meaningful information. In my view, if a process doesn’t include statistics and a feedback loop, then it really isn’t using analytics.
Analytics is a good umbrella term for the set of tools we use to solve problems and find meaning in our data, including the use of machine learning.
Technology Assisted Review (TAR) has been a buzzword in the e-discovery space for several years. EDRM.net defines TAR narrowly, as a process by which computer software electronically classifies documents based on input from expert reviewers. However, the term is often used more broadly to describe any application of analytics in the process of making decisions about collections of documents. We can use analytics feedback loops across the entire process, and we can use many of the hundreds of different machine learning algorithms available in the broad analytics industry. Without some context, “TAR” is not a particularly useful term, because it basically includes any form of analytics in document review. However, we use it as an umbrella term with practitioners when we describe where and how we apply analytics in our processes.
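The narrow, EDRM-style TAR process can be sketched as a loop: an expert labels a small batch, the software re-ranks the remaining documents based on those labels, and the next batch is drawn from the top of the ranking. In the sketch below, a trivial word-frequency scorer stands in for a real classifier, and the documents and "expert" are simulated; everything here is illustrative.

```python
documents = [
    "fraud memo about the merger",
    "lunch menu for friday",
    "merger financing and fraud risk",
    "fantasy football picks",
    "notes on fraud investigation",
    "office holiday party",
]

def score(doc, relevant_words):
    """Toy relevance model: count words seen in documents marked relevant."""
    return sum(doc.split().count(w) for w in relevant_words)

labeled, relevant_words = {}, set()
unreviewed = list(documents)
for _ in range(3):                           # three review rounds
    # Software ranks the unreviewed documents, best candidates first.
    unreviewed.sort(key=lambda d: -score(d, relevant_words))
    batch, unreviewed = unreviewed[:2], unreviewed[2:]
    for doc in batch:                        # simulated expert reviewer
        is_relevant = "fraud" in doc or "merger" in doc
        labeled[doc] = is_relevant
        if is_relevant:                      # the model learns from the labels
            relevant_words.update(doc.split())

print(sum(labeled.values()), "of", len(labeled), "reviewed docs were relevant")
```

The essential shape, expert input feeding back into the ranking each round, is what distinguishes TAR from a one-pass keyword search.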
Here’s a short recap of how I use these related terms:
Analytics is the use of the general power of mathematics, algorithms, and feedback to understand data. Machine learning is a set of algorithms that may be used in our analytics. TAR is the application of analytics to e-discovery and document review processes. And when applied correctly, creatively, and powerfully, analytics may be classified as artificial intelligence because of its nearly magical ability to help us solve our problems.