JeanFrancoisPuget
A great way to explain the value of analytics is to speak about the analytics maturity model. This model contains two pieces. First, analytics is a two-step process: insights are generated from data, then decisions are made based on these insights. Second, we distinguish four maturity levels, depending on how much of the analytics process is automated: descriptive, diagnostic, predictive, and prescriptive.
Many people have tried to display this maturity model as a two-dimensional map, see for instance here and here, but I was never much convinced. Indeed, the two dimensions always seem quite correlated, and a single linear progression seems to capture the essence. At the same time I was bothered by the fact that my favorite analytics technique, namely optimization, was only applicable as the last step of the maturity model, inside the prescriptive analytics level. As a matter of fact, I have advocated that starting directly with prescriptive analytics could lead to a better return on investment than starting with other analytics types. How can I get rid of this imposed ordering on when each type of analytics should be used?
The light came via this interesting post from Steve Schultz. He introduced time as one dimension of his analytics framework. This is a brilliant idea. Here is how I am using it now. Given that we move to a 2D map of various analytics, this is more an analytics landscape than a maturity model. As for Steve, time is one dimension, referring to the focus of the analytics work. We distinguish past, present, and future. Progress towards decisions is the other dimension. The higher on this dimension, the less work a human has to do to get decisions. My axes are similar to Steve Schultz's, but the content is rather different. First of all, optimization is in there, whereas Steve did not include it. There are other significant differences, but I'll leave listing them as a little exercise to readers.
Note that the four types of analytics can be combined into a single solution. For instance, a fraud and financial crime management solution answers: How can I better predict (predictive), detect (diagnostic), and investigate (prescriptive) fraud and financial crime? Similarly, a supply chain optimization solution answers: How can I utilize downstream and upstream data (descriptive) and insights (diagnostic) to anticipate (predictive), control, and react (prescriptive) to my supply chain?
The picture below provides more details on the analytics landscape model. It can certainly be improved and I more than welcome comments and suggestions for change. I used a larger font for machine learning and statistics because they span a large fraction of the insight row. Note also that I added a planning piece at the bottom right. This is to represent tools that support manual decision making. This includes general purpose spreadsheets as well as enterprise platforms such as IBM TM1. It is on the right because it is about future actions. It is at the bottom because all work towards decisions is performed by humans.
Note that, like every model, it is a simplification of reality. For instance, statistics can be used in descriptive analytics (think of the mean, median, percentiles, or variance of data). Optimization is relevant to diagnostic analytics as most model fitting can be cast as an optimization problem, see Convex Optimization for instance. The landscape does not show combinations of techniques either. We tried to focus on the main use cases for each technique in order to keep the model simple.
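To make the point about model fitting concrete, here is a minimal sketch (the data and true coefficients are invented for illustration) that casts fitting a straight line as an optimization problem: we ask SciPy's general-purpose minimizer to find the coefficients that minimize the sum of squared residuals.

```python
import numpy as np
from scipy.optimize import minimize

# Invented data: y is roughly 2*x + 1 plus some noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=50)

# Model fitting cast as optimization: find (a, b) that
# minimize the sum of squared residuals of the line a*x + b.
def loss(params):
    a, b = params
    return np.sum((y - (a * x + b)) ** 2)

result = minimize(loss, x0=[0.0, 0.0])
a_hat, b_hat = result.x
```

Dedicated fitting routines are of course faster in practice; the point is only that the fit itself is, underneath, an optimization.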
There is one simplification we didn't make, though. Diagnostic and predictive analytics are two sides of the same coin, using the same core technologies from statistics and machine learning. Many just call both of them predictive analytics. I prefer to split them as it captures the process of learning from the past to predict the future, see picture below. Diagnostic analytics is where analytics models are fit to training data, while predictive analytics is where these models are used to transform data into new data. A typical use case is to fit (or train) a scoring model on a large corpus of history data, then apply this model repeatedly to incoming events and score them on the fly. For example, events could be customer interactions, and scores could be about the propensity to buy some product, the likelihood to leave (churn) for the competition, etc.
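The fit-then-score workflow can be sketched in a few lines with scikit-learn. Everything here is invented for illustration: the "history data" is synthetic, and the three features and churn labels are placeholders for whatever a real application would use.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented history data: each row is a past customer interaction,
# the label records whether that customer eventually churned.
rng = np.random.default_rng(42)
history_features = rng.normal(size=(1000, 3))
history_churned = (
    history_features[:, 0] + rng.normal(0, 0.5, size=1000) > 0
).astype(int)

# Diagnostic step: fit a scoring model to the historical corpus.
model = LogisticRegression()
model.fit(history_features, history_churned)

# Predictive step: score an incoming event on the fly.
incoming_event = rng.normal(size=(1, 3))
churn_score = model.predict_proba(incoming_event)[0, 1]
```

In production the trained model would be deployed once and then applied repeatedly to the event stream, which is exactly the split between the diagnostic and predictive rows of the landscape.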
When presenting the above landscape we often get these questions: What about cognitive computing? What about big data? What about data science?
The answer to the first question is rather simple: cognitive computing is not another analytics type. It is the extension of existing analytics types to new forms of data, namely natural language. I expanded on this view in How Does Cognitive Computing Relate To Analytics?
In order to answer the second question, let us define big data first. IMHO, big data should be called all data because it refers to the processing of all available data. That data can come in very large volumes, it can come in a variety of forms, it can come at high velocity, and its veracity can be in doubt. As a matter of fact, big data is often defined by these four V's: volume, variety, velocity, and veracity.
History data can be quite large. For instance, I have worked on a diagnostic analytics application where we dealt with about 120 terabytes representing 3 years of history. We had to process that amount of data many times a day as it got updated continuously. These volumes are far from extreme. For instance, CERN processes one petabyte per day. We therefore map large volume to history.
Variety is pervasive. Input data, insights, and decisions can span a variety of forms, hence we map variety to all three.
Operational data represents the current state of the business we apply analytics to. It naturally comes as a stream of events and updates. These are then processed by predictive analytics (e.g. they get scored), or by prescriptive analytics (e.g. by triggering rules). It can also be processed to provide real time dashboards. Therefore, high velocity data plays across descriptive, predictive, and prescriptive analytics when they deal with present data.
Predictive and prescriptive analytics create data about the future. That data is uncertain, by nature. Its veracity is in doubt. Therefore we map veracity to prescriptive and predictive analytics when they deal with the future. Granted, we can also have veracity issues in history and operational data. However, some of the history and operational data is certain, while all future data is uncertain.
Our last question is about data science. I have advocated that data science is about extracting value from data.
Viewed that way, it is pretty close to the analytics process that goes from data to insight to decisions. We could say that analytics and data science are interchangeable. However, I would argue that most of descriptive analytics isn't data science, except for advanced data visualization.
Update on Nov 2, 2015. Changed title to Analytics Landscape as suggested by Irv Lustig in his comment. Edited content to remove references to analytics maturity model.