Analytic data views

An analytic data view defines a structure for accessing data that describes the entities used in predictive models and business rules. The view associates the data structure with physical data sources for the analysis.

Predictive analytics requires data organized in tables, with each row corresponding to an entity for which predictions are made. Each column in a table represents a measurable attribute of the entity. Some attributes may be derived by aggregating over the values of another attribute. For example, the rows of a table could represent customers, with columns corresponding to the customer name, gender, zip code, and the number of times the customer made a purchase of more than $500 in the past year. The last column is derived from the customer order history, which is typically stored in one or more related tables.
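The derived column in this example can be sketched in a few lines of Python. This is only an illustration of the aggregation idea, not how IBM SPSS Modeler computes it. The sample rows, the `customer_id` key, the `purchases_over_500` column name, and the `add_large_purchase_count` helper are all hypothetical, and the order history is assumed to be already limited to the past year.

```python
from collections import Counter

# Hypothetical customer table: one row per entity to be scored.
customers = [
    {"customer_id": 1, "name": "Ana", "gender": "F", "zip_code": "60601"},
    {"customer_id": 2, "name": "Ben", "gender": "M", "zip_code": "10001"},
]

# Hypothetical related table holding the order history.
# Assumption: these orders are already restricted to the past year.
orders = [
    {"customer_id": 1, "amount": 620.00},
    {"customer_id": 1, "amount": 120.50},
    {"customer_id": 1, "amount": 980.00},
    {"customer_id": 2, "amount": 45.00},
]


def add_large_purchase_count(customers, orders, threshold=500.0):
    """Return customer rows extended with a derived count of large orders."""
    # Aggregate over the related table: count orders above the threshold
    # for each customer.
    counts = Counter(
        order["customer_id"] for order in orders if order["amount"] > threshold
    )
    # A Counter returns 0 for customers that have no qualifying orders.
    return [
        {**customer, "purchases_over_500": counts[customer["customer_id"]]}
        for customer in customers
    ]


analysis_table = add_large_purchase_count(customers, orders)
for row in analysis_table:
    print(row)
```

Each resulting row still describes one customer, and the derived attribute sits alongside the attributes that were stored directly.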

The predictive analytic process uses different sets of data throughout the lifecycle of a model. During initial development of a predictive model, you use historical data that often has known outcomes for the event being predicted. To evaluate the effectiveness and accuracy of the model, you validate a candidate model against different data. After validating the model, you deploy it into production to generate scores for multiple entities in a batch process or for single entities in a real-time process. If you combine the model with business rules in a decision management process, you use simulated data to validate the results of the combination. Although the data differs across these stages, each data set must provide the same set of attributes for the model. The attribute set remains constant; the data records being analyzed change.
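The requirement that every data set provides the same attributes can be expressed as a simple check. The following Python sketch is an assumption-laden illustration rather than product behavior: the attribute names, the sample records, and the `missing_attributes` helper are hypothetical.

```python
# Hypothetical attribute set that the model expects at every stage.
REQUIRED_ATTRIBUTES = {"name", "gender", "zip_code", "purchases_over_500"}


def missing_attributes(records, required=REQUIRED_ATTRIBUTES):
    """Return the required attributes that at least one record lacks."""
    missing = set()
    for record in records:
        missing |= required - set(record)
    return missing


# Historical data used for development; it also carries a known outcome.
training = [
    {"name": "Ana", "gender": "F", "zip_code": "60601",
     "purchases_over_500": 2, "churned": False},
]

# Production data used for scoring; the outcome is not yet known.
scoring = [
    {"name": "Ben", "gender": "M", "zip_code": "10001",
     "purchases_over_500": 0},
]

# A data set that cannot feed the model because one attribute is absent.
incomplete = [
    {"name": "Cy", "gender": "M", "zip_code": "94105"},
]

for label, data in [("training", training), ("scoring", scoring),
                    ("incomplete", incomplete)]:
    print(label, sorted(missing_attributes(data)))
```

Extra columns, such as the known outcome in the historical data, do not violate the requirement; only a missing model attribute does.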

An analytic data view consists of the following components that address the specialized needs of predictive analytics:
  • A data view schema, or data model, that defines a logical interface for accessing data as a set of attributes organized into related tables. Attributes in the model can be derived from other attributes.
  • One or more data access plans that provide the data model attributes with physical values. You control the data available to the data model by specifying which data access plan is active for a particular application.
Important:
  • Components of the analytic data view are defined by using IBM® SPSS® Modeler streams. Familiarity with IBM SPSS Modeler concepts and experience creating streams are necessary to work with analytic data views.
  • Adapters for IBM SPSS Modeler must be installed for your IBM SPSS Collaboration and Deployment Services Repository to define analytic data view components. For more information about these adapters, see the IBM SPSS Modeler documentation.
Figure 1. Analytic data view. Two data access plans, each containing three IBM SPSS Modeler streams, provide data for the three tables in the data model of an analytic data view.

Figure 1 illustrates an analytic data view that contains two data access plans for a data model. The data model includes three tables, with relationships defined between tables 1 and 2 and between tables 2 and 3. Data access plan 1 associates each table with an IBM SPSS Modeler stream. Data access plan 2 associates the data model tables with three different streams. Under data access plan 1, the model retrieves data from terminal nodes in stream 11, stream 12, and stream 13. Under data access plan 2, the model retrieves data from terminal nodes in stream 21, stream 22, and stream 23. By changing the data access plan in use, you can switch the data available to the model.
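The arrangement in Figure 1 can be modeled as a small data structure. The sketch below is a conceptual illustration in Python, not the IBM SPSS Collaboration and Deployment Services API. The `AnalyticDataView` class, its methods, and the `make_stream` helper are hypothetical, and each stream is represented by a plain function that stands in for the terminal node of an IBM SPSS Modeler stream.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Source = Callable[[], List[Row]]  # stand-in for a stream's terminal node


@dataclass
class AnalyticDataView:
    """Hypothetical model of a data model plus its data access plans."""
    tables: List[str]
    relationships: List[Tuple[str, str]] = field(default_factory=list)
    plans: Dict[str, Dict[str, Source]] = field(default_factory=dict)
    active_plan: str = ""

    def add_plan(self, name: str, sources: Dict[str, Source]) -> None:
        # Every table in the data model needs a source in each plan.
        missing = set(self.tables) - set(sources)
        if missing:
            raise ValueError(f"plan {name!r} lacks sources for {sorted(missing)}")
        self.plans[name] = sources

    def activate(self, name: str) -> None:
        if name not in self.plans:
            raise KeyError(name)
        self.active_plan = name

    def read(self, table: str) -> List[Row]:
        # Data comes from whichever plan is currently active.
        return self.plans[self.active_plan][table]()


def make_stream(label: str) -> Source:
    """Create a fake stream that reports which stream supplied the rows."""
    return lambda: [{"source": label}]


view = AnalyticDataView(
    tables=["table1", "table2", "table3"],
    relationships=[("table1", "table2"), ("table2", "table3")],
)
view.add_plan("plan1", {
    "table1": make_stream("stream 11"),
    "table2": make_stream("stream 12"),
    "table3": make_stream("stream 13"),
})
view.add_plan("plan2", {
    "table1": make_stream("stream 21"),
    "table2": make_stream("stream 22"),
    "table3": make_stream("stream 23"),
})

view.activate("plan1")
first = view.read("table1")
view.activate("plan2")
second = view.read("table1")
print(first, second)
```

The tables and relationships stay fixed while the active plan changes, which mirrors how switching the data access plan changes the data without changing the attributes that the model sees.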