Stream performance

Many factors can impact how your SPSS® Modeler streams perform.

Keep these general tips in mind:

Where possible, reduce the size of your data by processing only the fields you need. Use Filter nodes, or the Filter tab in source nodes, to drop fields that are not needed.
Use in-database processing whenever possible to reduce the amount of data pulled into SPSS Modeler.
Minimize the network distance between your IBM® SPSS Modeler Server and the source data.
Certain data sources require more overhead than others. For example, reading data through the Excel source node takes longer than reading the same data from a CSV file. XML is an inherently verbose format and shouldn't be used for storing large amounts of data.
If you use Python-based or R-based nodes, be aware that they require internal data transfers, which can sometimes slow processing.
It is usually better to accomplish your tasks with as few nodes as possible.
Use Type nodes only when necessary. This is especially true when Hadoop is the data source, because each Type node processes the entire data flow. For details, see What is instantiation?
Certain statistical modeling nodes might be slow, especially with data sets that have many categorical fields.
Changing the order of nodes can affect processing speed, so experiment with node order. For example, if your stream contains nodes that reduce data, either by selecting a subset of records or by reducing the number of fields, move them as early in the stream as possible.
If a modeling node you're using has a corresponding -AS version, use the -AS node instead, because it is multithreaded and can speed up processing.
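The in-database processing tip, together with the advice to keep only the fields you need, can be illustrated outside SPSS Modeler with a minimal Python sketch. It assumes an in-memory SQLite database as a stand-in for a real database source; the `transactions` table and its fields are hypothetical, and no SPSS Modeler API is used. The sketch only compares how much data crosses from the database to the client when aggregation runs locally versus inside the database.

```python
import sqlite3

# In-memory SQLite database standing in for a real database source.
# The table and field names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (id INTEGER, region TEXT, amount REAL, notes TEXT)"
)
rows = [(i, "north" if i % 2 else "south", float(i), "x" * 100) for i in range(1, 1001)]
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", rows)


def client_side_totals(conn):
    """Pull every record and every field, then aggregate locally."""
    fetched = conn.execute("SELECT * FROM transactions").fetchall()
    totals = {}
    for _id, region, amount, _notes in fetched:
        totals[region] = totals.get(region, 0.0) + amount
    return totals, len(fetched)


def in_database_totals(conn):
    """Let the database aggregate; only the needed fields are read and
    only one row per region is returned to the client."""
    fetched = conn.execute(
        "SELECT region, SUM(amount) FROM transactions GROUP BY region"
    ).fetchall()
    return dict(fetched), len(fetched)


local_totals, local_rows = client_side_totals(conn)
db_totals, db_rows = in_database_totals(conn)
print(local_rows, db_rows)  # 1000 2
print(local_totals == db_totals)  # True
```

Both functions produce the same totals, but the in-database version returns 2 rows instead of 1,000 and never transfers the unused `notes` field.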
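The node-order tip can be sketched the same way. This hypothetical pipeline is written in plain Python rather than as a Modeler stream. `expensive_derive` stands in for any costly per-record step, and `keep` stands in for a record-selection step. Moving the selection ahead of the costly step yields the same output with far fewer calls to that step.

```python
# Hypothetical records; not tied to any SPSS Modeler data source.
records = [{"id": i, "value": i * 3} for i in range(10_000)]

# Counts how many times the costly step runs.
calls = {"count": 0}


def expensive_derive(rec):
    """Simulated costly derivation step that adds a 'score' field."""
    calls["count"] += 1
    return {**rec, "score": rec["value"] ** 2}


def keep(rec):
    """Record selection: keeps 1% of records, based only on 'id'."""
    return rec["id"] % 100 == 0


def derive_then_select(data):
    """Costly step first, selection last (late data reduction)."""
    derived = (expensive_derive(r) for r in data)
    return [d for d in derived if keep(d)]


def select_then_derive(data):
    """Selection first, costly step only on surviving records."""
    return [expensive_derive(r) for r in data if keep(r)]


def run(pipeline, data):
    """Run a pipeline and return (output, number of expensive_derive calls)."""
    calls["count"] = 0
    out = pipeline(data)
    return out, calls["count"]


late, late_calls = run(derive_then_select, records)
early, early_calls = run(select_then_derive, records)
print(late_calls, early_calls)  # 10000 100
print(late == early)  # True
```

This reordering is safe only because `keep` does not depend on the `score` field that `expensive_derive` creates. A selection that uses a derived field must stay after the step that derives it.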