Managing data sources
Use the Data tab to define data sources for analysis, simulation and testing, scoring, or other use. Data sets can be configured by your administrator, or you can add your own.
The tab includes a Source tab and a Derived tab. The Source tab is divided into these main areas:
- Project Data Model. Defines the fields required by the application. All other data sets are mapped relative to this source. The Data Source Fields section lists the input fields and types your project uses. If desired, use the Derived tab to add expressions or model output from different data sources to extend your project data model.
- Data Sources. The Project Data Sources section lists the data sources that have been saved for use with the current project, model, or rule. The My Data Sources section lists data sources you have defined or saved. Data sets from this list can be copied to or from different projects, and then promoted for use as the project data model, making it possible to share data across applications.
- Secondary Tables. Use this section to add tables from different data sources to extend your project data model, if desired.
Project data model
The project data model acts like a template listing the input fields and types your project will use.
Data source. Displays the data source selected for the application.
Entity ID. Optionally, select a field to use as the entity ID, which enables SQL push-back. This can improve performance when the data originates in a database and the selected field is an integer. In IBM® SPSS® Modeler Advantage, the field selected here is selected by default as an output from a batch score and deselected by default as an input to a model build.
Operational. A check in this box shows that the data in the field is available for use in rules and expressions, and for predictions. Clear the check box if the field is to be used as the target field for analysis. For example, if you are running a query to see how well customers will respond to a direct mail campaign, the results field would be analytical rather than operational, because you do not know the response values in advance.
Field name. Displays the name of each field, as listed in the data source, along with an icon that identifies its storage type.
Measurement. Displays the measurement type of each field in the data source. If you change any of the measurement types, you must refresh the data scan to ensure data compatibility. See the topic Measurement levels for more information.
Values. Lists the values for each field in the data source, for example the maximum and minimum values in a range. For flag fields, hold the mouse pointer over the value to display a tool tip that indicates the "true" and "false" values defined for the field. You can also click a value to edit it.
Typically, the project data model defines the inputs as a set of fields coming from a single logical table. This is always the case for IBM SPSS Modeler Advantage. However, for other applications, the project data model may consist of a primary table along with zero or more secondary tables. These secondary tables are defined by the secondary data sources that are associated with the primary data source.
Project data sources
Project data sources may include those predefined by the administrator or added by users. Optionally, the administrator may have locked one or more data sources to prevent users from modifying or removing them, or locked all data options so that users cannot create new data sources.
If a data source's inputs don't directly match those of the project data model, you can map the former to the latter and fix the discrepancy. For example, if the project data model requires a field named purchase with values Yes and No (measurement level flag), then any data source used must have a comparable field that can be mapped accordingly.
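The idea of mapping can be illustrated with a minimal Python sketch. It is not how the application implements mapping; the function name map_record, the source field name bought, and the sample record are hypothetical and chosen only to mirror the purchase example above.

```python
def map_record(record, field_map):
    """Rename source fields to the names the project data model expects.

    Fields that have no entry in field_map keep their original names.
    """
    return {field_map.get(name, name): value for name, value in record.items()}


# Hypothetical mapping: the data source calls the flag field "bought",
# while the project data model requires a field named "purchase".
field_map = {"bought": "purchase"}

source_record = {"customer_id": 101, "bought": "Yes"}
mapped_record = map_record(source_record, field_map)
```

After mapping, the record carries a purchase field with the original Yes or No value, so it lines up with the project data model.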
Name. Displays the data source name and shows an icon that identifies its file type.
No. of records. Click the icon in this column to show the record count for any data source in the table. The count is shown beside the icon.
Preview. Click the icon in this column to preview a sample of the data contained in the source. For more information, see Previewing data.
Overview. Click the icon in this column for an overview of the data source. For more information, see Data overview.
Compatible. Either displays a note that the data source is used as the project data model, or shows a green, orange, or red ball to indicate how compatible the data source is with the project data model.
- A green ball shows that the data source is operationally compatible with the project data model data source. An operationally compatible data source is one that includes all the operational fields of the project data model, but can have additional fields. This data source is suitable for rules, scoring, simulation, and test operations.
- An orange ball shows that the data source has at least one field that is compatible with the project data model, with the same name and type. This data source may also have additional fields, and is suitable for building and evaluating models.
- A red ball shows that the data source is incompatible with the project data model, and fields must be mapped before it can be used in the application. An incompatible data source is one that has at least one field whose type is incompatible with the equivalent project data model type.
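The three indicators can be read as a small decision procedure. The following Python sketch is one possible interpretation, not the application's actual logic. The Field class, the classify_compatibility function, and the measurement names are hypothetical. Two further assumptions are made: types are treated as compatible only when they are identical, and a data source that shares no fields with the project data model (a case the descriptions above do not cover) is reported as "unclassified".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    measurement: str          # for example "flag" or "continuous" (illustrative names)
    operational: bool = True  # only meaningful for project data model fields


def classify_compatibility(model_fields, source_fields):
    """Return "green", "orange", "red", or "unclassified" for a data source."""
    source_types = {f.name: f.measurement for f in source_fields}

    # Red: a field shared with the project data model has a different type.
    # Assumption: "incompatible" is simplified to "not identical".
    for f in model_fields:
        if f.name in source_types and source_types[f.name] != f.measurement:
            return "red"

    # Green: every operational field of the project data model is present.
    # Additional fields in the data source are allowed.
    operational = [f for f in model_fields if f.operational]
    if operational and all(f.name in source_types for f in operational):
        return "green"

    # Orange: at least one field matches by name and type.
    if any(f.name in source_types for f in model_fields):
        return "orange"

    # Assumption: no shared fields at all is outside the three documented cases.
    return "unclassified"


model = [
    Field("age", "continuous"),
    Field("purchase", "flag"),
    Field("response", "flag", operational=False),  # analytical (target) field
]

full_source = [Field("age", "continuous"), Field("purchase", "flag"), Field("region", "nominal")]
partial_source = [Field("age", "continuous")]
mismatched_source = [Field("age", "continuous"), Field("purchase", "continuous")]
```

Under these assumptions, full_source is classified as green, partial_source as orange, and mismatched_source as red, because its purchase field has a different type from the one in the project data model.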
Map fields. This option allows you to compare the data source fields with those in the project data model and map or unmap any compatible fields to match those required by the project data model. For more information, see Mapping fields.
Copy. Copies the data source to the My Data Sources area.
My data sources
Data sources on this list are saved with your user account, so they are available whenever you log in, and can be copied to any project, model, or rule that you open (assuming your administrator gives you authority to do so). Fields in this part of the tab work in the same way as those in the Project Data Sources area, although there is no Compatible column.
The Copy column enables you to copy the data source into the Project Data Sources area.
Secondary tables
To add tables from different data sources, click Add/Edit Secondary Tables. Secondary tables can be used for dynamic table-based allocation, or to extend your project data model so that model output from different data sources can be added. For more information, see Adding secondary tables.
Working with data sources
- To add a new data source to either the Project Data Sources or My Data Sources lists, select Add a data source. For more information, see Creating a new data source.
- To change the measurement level, or type, of a field in the data source in the Project Data Model, select the relevant level. For more information, see Measurement levels.
- To copy data sources to or from the Project Data Sources list, click the appropriate arrow in the Copy column.
- To map field names for a data source to the project data model, click the appropriate link under the Compatible column. (Once field names are mapped, the link is no longer displayed.) For more information, see Mapping fields.
- To preview a data source, click the Preview icon. For more information, see Previewing data.
- To add secondary tables to extend your project data model, click Add/Edit Secondary Tables in the Secondary Tables section. For more information, see Adding secondary tables.
- To add fields (expressions, fields using segment rules, or model output from different data sources) to extend your project data model, click the Derived tab. For more information, see Deriving fields.