This article on diagnostic analytics is the third in a series of guest posts written by Dan Vesset, Group Vice President of the Analytics and Information Management market research and advisory practice at IDC.
Analytics solutions ultimately aim to provide better decision support — so that humans can make better decisions augmented by relevant information. Decision support capabilities can be segmented into five related categories, each of which is deployed to answer different types of questions:
- Planning analytics: What is our plan?
- Descriptive analytics: What happened?
- Diagnostic analytics: Why did it happen?
- Predictive analytics: What will happen next?
- Prescriptive analytics: What should be done about it?
In this series of blog posts, we’ll address each of these analytics capabilities. For a fuller introduction to the topic as a whole, see the first post in the series. This third post will focus on diagnostic analytics.
Diagnostic analytics: Why did it happen?
In the previous blog post on descriptive analytics, we highlighted the importance of developing and governing KPIs and presenting them to all relevant decision makers. These KPIs establish a common, data-driven language within the enterprise and identify trends, thresholds, anomalies, and other metrics that lead to the subsequent question of why something happened. In fact, most business intelligence (BI) solutions, many with a multi-decade legacy, still stop at simply presenting the information about a KPI to a person, without giving the context of why. Knowing what happened is important, but it is not enough to make a confident decision.
The functions of diagnostic analytics fall broadly into three categories:
- Identify anomalies: Based on the results of descriptive analysis, analysts must identify areas that require further study because they raise questions that cannot be answered simply by looking at the data. These could include questions like why sales have increased in a region where there was no change in marketing, or why there was a sudden change in traffic to a website without an obvious cause.
- Drill into the analytics (discovery): Analysts must identify the data sources that will help them explain these anomalies. Often, this step requires analysts to look for patterns outside the existing data sets, and it might require pulling in data from external sources to identify correlations and determine if any of them are causal in nature.
- Determine causal relationships: Hidden relationships are uncovered by looking at events that might have resulted in the identified anomalies. Probability theory, regression analysis, filtering, and time-series data analytics can all be useful for uncovering hidden stories in the data.
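As a rough illustration of the first two functions, the sketch below flags unusual points in a KPI series with a simple z-score rule and then ranks candidate explanatory series by the strength of their correlation with the KPI. All names and figures (`weekly_sales`, `marketing_spend`, `local_event`, the threshold of 2.0) are invented for illustration, and the z-score rule is just one simple choice among many. A strong correlation here is only a lead for further investigation, not proof of a cause.

```python
from statistics import mean, stdev


def flag_anomalies(values, threshold=2.0):
    """Return the indices whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # a flat series has no outliers by this rule
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is flat."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def rank_candidate_drivers(kpi, candidates):
    """Sort candidate series by absolute correlation with the KPI, strongest first."""
    scores = {name: pearson(series, kpi) for name, series in candidates.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical data: sales jump in week 7 although marketing spend never changes.
weekly_sales = [100, 102, 98, 101, 99, 103, 100, 160, 101, 97]
candidates = {
    "marketing_spend": [10] * 10,                    # unchanged, so it cannot explain the jump
    "local_event": [0, 0, 0, 0, 0, 0, 0, 1, 0, 0],   # external data pulled in by the analyst
}

anomalies = flag_anomalies(weekly_sales)
ranked = rank_candidate_drivers(weekly_sales, candidates)

print("Anomalous weeks:", anomalies)
for name, r in ranked:
    print(f"{name}: r = {r:.2f}")
```

In this toy data set the flat marketing series scores zero and the externally sourced event series ranks first, which mirrors the article's point that explanations often sit outside the existing data sets.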
In the past, all of these functions were completely manual; they relied on the abilities of an analyst to identify anomalies, detect patterns, and determine relationships. In that setting, a few of the most experienced analysts would outperform their peers. However, even those top analysts couldn’t guarantee consistency of results. As data volume, variety, and velocity have increased, such purely manual efforts for diagnostic analytics are no longer feasible.
Modern solutions for diagnostic analytics must employ machine learning techniques to augment the analysts. At scale, machines are far more capable than people at recognizing patterns, detecting anomalies, surfacing ‘unusual’ events, and identifying drivers of KPIs. The latter capability requires application of different analytical techniques, chosen from a portfolio of algorithms, to determine causation and identify independent variables that enterprises can adjust to effect positive change.
Enabled by machine learning, diagnostic analytics serve an important function in reducing unintentional bias and misinterpretation of correlation as causation. Yet today’s diagnostic analytics must still be governed by people. Just as machines can be used to help reduce the bias in human decision making, so should people be used to contextualize the outputs of machine decision making.
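To make the correlation-versus-causation point concrete, here is a minimal sketch using partial correlation, which is one common check and not necessarily the method any particular product uses. Two series can look tightly linked only because both follow a third factor. The data is synthetic and constructed so that the effect is exact, and the names `season`, `ad_spend`, and `sales` are hypothetical.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is flat."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """What is left of ys after a simple least-squares fit on xs.

    Assumes xs is not constant.
    """
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def partial_correlation(xs, ys, control):
    """Correlation between xs and ys once the control variable is regressed out of both."""
    return pearson(residuals(xs, control), residuals(ys, control))


# Synthetic data: both series are driven by the season index plus unrelated noise.
season = [1, 2, 3, 4, 5, 6, 7, 8]
noise_a = [1, -1, -1, 1, 1, -1, -1, 1]
noise_b = [1, -1, 1, -1, -1, 1, -1, 1]
ad_spend = [2 * s + a for s, a in zip(season, noise_a)]
sales = [3 * s + b for s, b in zip(season, noise_b)]

raw = pearson(ad_spend, sales)
controlled = partial_correlation(ad_spend, sales, season)

print(f"Raw correlation:          {raw:.2f}")
print(f"Controlling for season:   {controlled:.2f}")
```

The raw correlation is high, yet it disappears once the shared seasonal factor is accounted for. Deciding which factors are worth controlling for is exactly the kind of context that people still have to supply.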
IDC predicts that by 2021, 25% of large enterprises will have supplemented data scientists with data ethnographers to provide contextual interpretations of data by using qualitative research methods that uncover people’s emotions, stories, and perceptions of their world.
Diagnostic analytics that are based on the combination of AI-infused software and the domain expertise of people promise to be the most effective means for answering the question: Why did it happen? Once you know the why, then you can move forward to answering the question: What will happen next?
For IBM’s view on the Analytics Cycle, check out our smartpaper, “How Can You Trust Your Data Without the Big Picture?”