How SQL generation works

The initial fragments of a stream leading from the database source nodes are the main targets for SQL generation. When a node is encountered that cannot be compiled to SQL, the data are extracted from the database and subsequent processing is performed by IBM® SPSS® Modeler Server.
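The split described above can be pictured with a small sketch. It is only a toy model under stated assumptions, not how IBM SPSS Modeler Server is implemented: the stream is treated as a simple linear list, and the node names and the SQL_CAPABLE set are hypothetical.

```python
# Hypothetical set of node types that this toy model treats as SQL-compilable.
SQL_CAPABLE = {"source", "select", "derive", "sort", "aggregate", "merge"}


def split_stream(nodes):
    """Split a linear stream into (database prefix, in-server remainder).

    The prefix ends at the first node that cannot be compiled to SQL;
    that node and everything after it are processed outside the database.
    """
    for i, node in enumerate(nodes):
        if node not in SQL_CAPABLE:
            return list(nodes[:i]), list(nodes[i:])
    return list(nodes), []


stream = ["source", "select", "derive", "custom_script", "sort"]
in_database, in_server = split_stream(stream)
print(in_database)  # ['source', 'select', 'derive']
print(in_server)    # ['custom_script', 'sort']
```

Note that the final sort in this example is not pushed back, because it follows a node that could not be compiled to SQL.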

During stream preparation and prior to execution, the SQL generation process happens as follows:

Where improvements occur

SQL optimization improves performance in a number of data operations:

  • Joins (merge by key). Join operations that are performed in the database can take advantage of the database's own optimization.
  • Aggregation. The Aggregate, Distribution, and Web nodes all use aggregation to produce their results. Summarized data uses considerably less bandwidth than the original data.
  • Selection. Selecting records in the database according to certain criteria reduces the number of records that must be extracted.
  • Sorting. Sorting records is a resource-intensive activity that is performed more efficiently in a database.
  • Field derivation. New fields are generated more efficiently in a database.
  • Field projection. IBM SPSS Modeler Server extracts only fields that are required for subsequent processing from the database, which minimizes bandwidth and memory requirements. The same is also true for superfluous fields in flat files: although the server must read the superfluous fields, it does not allocate any storage for them.
  • Scoring. SQL can be generated from decision trees, rulesets, linear regression, and factor-generated models.
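
To illustrate why several of the operations above benefit from running in the database, the following sketch compares extracting every record with pushing selection, projection, aggregation, and sorting into a single query. It is an illustration only: it uses Python's built-in sqlite3 module with an in-memory database, the orders table and its fields are made up for the example, and the SQL shown is not the SQL that IBM SPSS Modeler generates.

```python
import sqlite3

# Hypothetical in-memory table standing in for a database source node.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, region TEXT, amount REAL, notes TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "east", 10.0, "a"),
        (2, "east", 30.0, "b"),
        (3, "west", 5.0, "c"),
        (4, "west", 50.0, "d"),
    ],
)

# Without pushback: extract every record and field, then process them locally.
all_rows = conn.execute("SELECT * FROM orders").fetchall()
totals = {}
for _id, region, amount, _notes in all_rows:
    if amount >= 10.0:  # selection
        totals[region] = totals.get(region, 0.0) + amount  # aggregation
local_result = sorted(totals.items())  # sorting

# With pushback: the database performs selection, projection,
# aggregation, and sorting, and returns only the summarized records.
pushed_result = conn.execute(
    "SELECT region, SUM(amount) FROM orders "
    "WHERE amount >= 10.0 GROUP BY region ORDER BY region"
).fetchall()

print(len(all_rows), "records extracted without pushback")
print(len(pushed_result), "records extracted with pushback")
print(pushed_result == local_result)
conn.close()
```

Both approaches produce the same totals, but the second one transfers two summarized records with two fields each instead of four full records. On real tables the difference in transferred data is typically far larger.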