SQL Optimization

For best performance, maximize the amount of SQL that is generated so that processing takes advantage of the performance and scalability of the database. Only the parts of the stream that cannot be compiled to SQL should be executed within IBM® SPSS® Modeler Server. For more information, see SQL optimization.
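The general idea of pushing work into the database can be illustrated outside of IBM SPSS Modeler. The following is a minimal sketch that uses Python's standard sqlite3 module as a stand-in database; it is not the SQL that IBM SPSS Modeler generates, and the table name sales and both function names are hypothetical. It contrasts fetching every row and aggregating in the client with letting the database perform the aggregation.

```python
import sqlite3

# Hypothetical in-memory table used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("south", 5.0), ("north", 2.5)],
)


def totals_client_side(conn):
    """Fetch every row and aggregate in the client process."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_in_database(conn):
    """Let the database aggregate; only one row per region is returned."""
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    return dict(conn.execute(query))


print(totals_client_side(conn))
print(totals_in_database(conn))
```

Both functions return the same totals, but the second transfers only the aggregated rows to the client, which is the kind of saving that SQL generation aims for on large tables.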

Uploading File-Based Data

Data that is not stored in a database cannot benefit from SQL optimization. If the data you want to analyze is not already in a database, you can upload it using a Database Output node. You can also use this node to store intermediate data sets from data preparation and the results of deployment.
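As a rough analogy for what uploading file-based data involves, the following sketch loads delimited text into a database table by using Python's standard csv and sqlite3 modules. It does not use the Database Output node or any IBM SPSS Modeler API; the function name upload_csv, the table name customers, and the sample data are assumptions made for illustration.

```python
import csv
import io
import sqlite3


def upload_csv(conn, table, csv_text):
    """Create a table from the CSV header row and insert the remaining rows.

    The table and column names are interpolated into the SQL text, so they
    must come from a trusted source. Only the values are bound as parameters.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    with conn:  # commit all inserts as a single transaction
        conn.executemany(
            f'INSERT INTO "{table}" VALUES ({placeholders})', reader
        )
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


conn = sqlite3.connect(":memory:")
row_count = upload_csv(conn, "customers", "id,name\n1,Ann\n2,Bob\n")
print(row_count)
```

After the data is in a table, later processing can be expressed as SQL against that table instead of being run on the file contents.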

IBM SPSS Modeler can interface with the external loaders for many common database systems. Several scripts are included with the software and are available (with documentation) in the /scripts subdirectory under your IBM SPSS Modeler installation folder.

The following table shows the potential performance benefit of bulk-loading. The figures show the elapsed time to export 250,000 records with 21 fields to an Oracle database. The external loader was Oracle’s sqlldr utility.

Table 1. Performance benefit of bulk-loading

Export option                    Time (in seconds)
Default (ODBC)                   409
Bulk-load via ODBC               52
Bulk-load via external loader    33
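
To put the timings in perspective, the following short calculation derives the speedup relative to the default ODBC export and the approximate throughput for each option. It uses only the figures reported for this test (250,000 records; 409, 52, and 33 seconds); the function and variable names are illustrative.

```python
RECORDS = 250_000

# Elapsed times in seconds, as reported for the export test.
TIMINGS = {
    "Default (ODBC)": 409,
    "Bulk-load via ODBC": 52,
    "Bulk-load via external loader": 33,
}


def summarize(timings, records, baseline_name):
    """Return {option: (speedup vs. baseline, records per second)}."""
    baseline = timings[baseline_name]
    return {
        name: (round(baseline / seconds, 1), round(records / seconds))
        for name, seconds in timings.items()
    }


summary = summarize(TIMINGS, RECORDS, "Default (ODBC)")
for name, (speedup, rate) in summary.items():
    print(f"{name}: {speedup}x, about {rate} records/second")
```

In this test, bulk-loading via ODBC was roughly 7.9 times faster than the default export, and the external loader was roughly 12.4 times faster. Results on other systems and data will differ.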