Performance/Optimization

Stream rewriting. (stream_rewriting_enabled) Allows the server to optimize streams by rewriting them. For example, the server might push data reduction operations closer to the source node to minimize the size of the dataset as early as possible. Disabling this option is normally recommended only if the optimization causes an error or other unexpected results. This setting overrides the corresponding client optimization setting: if it is disabled on the server, the client cannot enable it; if it is enabled on the server, the client can still choose to disable it.
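The precedence between the server and client settings can be summarized in a short Python sketch. The function name and parameters are hypothetical and are used only to illustrate the rule described above; they are not part of the product.

```python
def effective_stream_rewriting(server_enabled: bool, client_enabled: bool) -> bool:
    """Return True only when rewriting is allowed by the server AND left on by the client.

    Hypothetical helper: models the documented rule that the server setting
    caps what the client can choose.
    """
    return server_enabled and client_enabled


# Enumerate all four combinations of the two settings.
combinations = {
    (server, client): effective_stream_rewriting(server, client)
    for server in (True, False)
    for client in (True, False)
}
```

Only the combination in which both the server and the client have the option enabled results in stream rewriting.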

Parallelism. (max_parallelism) Specifies the number of parallel worker threads that IBM® SPSS® Modeler is allowed to use when running a stream. Setting this option to 0 or any negative number causes IBM SPSS Modeler to match the number of threads to the number of available processors on the computer; the default value is -1. To turn off parallel processing (for machines with multiple processors), set this option to 1. To allow limited parallel processing, set it to a number smaller than the number of processors on your machine. Note that a hyperthreaded or dual-core processor is treated as two processors.
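As an illustration of how the value is interpreted, the following Python sketch maps a max_parallelism value to a thread count. The function is hypothetical, and os.cpu_count() stands in for "the number of available processors"; the source does not describe what happens when the value exceeds the processor count, so the sketch simply returns the value unchanged in that case.

```python
import os
from typing import Optional


def effective_worker_threads(max_parallelism: int,
                             available_processors: Optional[int] = None) -> int:
    """Hypothetical helper: resolve max_parallelism to a worker thread count."""
    if available_processors is None:
        # Assumption: logical CPU count approximates "available processors".
        available_processors = os.cpu_count() or 1
    if max_parallelism <= 0:
        # 0 or any negative value (including the default, -1): match the processors.
        return available_processors
    # Positive values are taken literally; 1 turns off parallel processing.
    return max_parallelism
```

For example, on an 8-processor machine, effective_worker_threads(-1, 8) and effective_worker_threads(0, 8) both resolve to 8 threads, while effective_worker_threads(1, 8) resolves to a single thread.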

Buffer size (bytes). (io_buffer_size) Data files transferred from the server to the client are passed through a buffer of this number of bytes.

Cache compression. (cache_compression) An integer value in the range 0 to 9 that controls the compression of cache and other files in the server’s temporary directory. Compression reduces the amount of disk space used, which can be important when space is limited. Compression increases processor time, but this is almost always made up by the reduction in disk access time. Note that only certain caches, those accessed sequentially, can be compressed. This option does not apply to random-access caches, such as those used by the network training algorithms. A value of 0 disables compression entirely. Values from 1 upward provide increasing degrees of compression but with a corresponding cost in access time. The default value is 1; higher values may be needed where disk space is at a premium.
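The trade-off between compression level and file size can be explored with a small Python sketch. This uses the standard zlib module purely as a stand-in, because it also accepts levels 0 to 9; the source does not state which compression algorithm the server uses, and the sample data is invented for the illustration.

```python
import zlib


def compressed_sizes(payload: bytes, levels=(0, 1, 9)) -> dict:
    """Return the compressed size in bytes of payload at each zlib level."""
    return {level: len(zlib.compress(payload, level)) for level in levels}


# Invented, highly repetitive sample resembling sequential cache records.
sample = b"id,region,amount\n" + b"1001,north,250.00\n" * 5000
sizes = compressed_sizes(sample)
```

With zlib, level 0 stores the data without compressing it, while levels 1 and above shrink repetitive data at the cost of more processor time. Actual savings on the server depend on the data and on the server's own implementation.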

Memory usage multiplier. (memory_usage) Controls the proportion of physical memory allocated for sorting and other in-memory caches. The default is 100, which corresponds to approximately 10% of physical memory. Increase this value to improve sort performance where free memory is available, but avoid setting it so high that it causes excessive paging.

Modeling memory limit percentage. (modelling_memory_limit_percentage) Controls the proportion of physical memory allocated for training Kohonen and k-means models. The default is 25%. Increase this value to improve training performance where free memory is available, but avoid setting it so high that it causes excessive paging when data spills onto the disk.

Allow modeling memory override. (allow_modelling_memory_override) Enables or disables the Optimize for Speed option in certain modeling nodes. The default is enabled. This option allows the modeling algorithm to claim all available memory, bypassing the percentage limit option. You may want to disable this if you need to share memory resources on the server machine.

Maximum and minimum server port. (max_server_port and min_server_port) Specifies the range of port numbers that can be used for the additional socket connections between client and server that are required for interactive models and stream execution. These connections require the server to listen on another port; not restricting the range could cause problems for users on systems with firewalls. The default value for both options is -1, meaning "no restriction." Thus, for example, to set the server to listen on port 8000 or above, you would set min_server_port to 8000 and max_server_port to -1.
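The meaning of -1 as "no restriction" on either bound can be expressed as a small Python check. The function is a hypothetical illustration of the rule, not part of the product.

```python
def port_in_range(port: int, min_server_port: int = -1, max_server_port: int = -1) -> bool:
    """Hypothetical helper: is port allowed by the configured range?

    A bound of -1 means that side of the range is unrestricted.
    """
    if min_server_port != -1 and port < min_server_port:
        return False
    if max_server_port != -1 and port > max_server_port:
        return False
    return True
```

With min_server_port set to 8000 and max_server_port left at -1, port 8000 and any higher port pass the check, while port 7999 does not.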

Note that you must open additional ports, beyond the main server port, to open or execute a stream, and correspondingly more ports if you want to open or execute concurrent streams. These ports are required in order to capture feedback from the stream execution.

By default, IBM SPSS Modeler will use any open port that is available; if it does not find one (for example, if they are all closed by a firewall), an error is displayed when you execute the stream. When you configure the range of ports, allow for two open ports (in addition to the main server port) per concurrent stream, plus 3 additional ports for each ODBC connection from within any connected client (2 ports for the ODBC connection for the duration of that ODBC connection, and an additional temporary port for authentication).

Note: An ODBC connection is an entry in the database connections list, and can be shared between multiple database nodes specified with the same database connection.

Note: The authentication ports can possibly be shared if the connections are made at different times.

Note: Best practice dictates that the same ports should be used to communicate with both IBM SPSS Collaboration and Deployment Services and SPSS Modeler Client. These can be set as max_server_port and min_server_port.

Note: If you change these parameters, you need to restart SPSS Modeler Server for the change to take effect.
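The port requirements described above can be turned into a rough sizing calculation. The following Python sketch is hypothetical and computes an upper estimate: it counts 2 ports per concurrent stream and 3 per ODBC connection, and does not account for authentication ports being shared between connections made at different times.

```python
def additional_ports_needed(concurrent_streams: int, odbc_connections: int) -> int:
    """Hypothetical helper: upper estimate of ports needed besides the main server port."""
    ports_for_streams = 2 * concurrent_streams  # two ports per concurrent stream
    ports_for_odbc = 3 * odbc_connections       # two for the connection, one temporary for authentication
    return ports_for_streams + ports_for_odbc


def range_is_large_enough(min_server_port: int, max_server_port: int,
                          concurrent_streams: int, odbc_connections: int) -> bool:
    """Check a fully restricted range (neither bound is -1) against the estimate."""
    available = max_server_port - min_server_port + 1
    return available >= additional_ports_needed(concurrent_streams, odbc_connections)
```

For example, 3 concurrent streams and 2 ODBC connections give an estimate of 12 ports, so a range of 8000 to 8011 would be just large enough under these assumptions.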

Array fetch optimization. (sql_row_array_size) Controls the way that SPSS Modeler Server fetches data from the ODBC data source. The default value is 1, which fetches a single row at a time. Increasing this value causes the server to read the data in larger chunks, fetching the specified number of rows into an array. With some operating system/database combinations, this can improve the performance of SELECT statements.
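The general idea of array fetching, reading several rows per fetch call instead of one, can be illustrated in Python. This sketch uses the standard sqlite3 module and an in-memory table as a stand-in for an ODBC data source; it shows only the access pattern and does not reflect how SPSS Modeler Server is implemented. The table and helper names are invented for the example.

```python
import sqlite3


def fetch_in_batches(cursor, row_array_size):
    """Yield lists of up to row_array_size rows until the result set is exhausted."""
    cursor.arraysize = row_array_size  # number of rows fetchmany() returns by default
    while True:
        rows = cursor.fetchmany()
        if not rows:
            break
        yield rows


# Build a small in-memory table with ten rows (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10)])

cur = conn.execute("SELECT id FROM t ORDER BY id")
batches = list(fetch_in_batches(cur, 4))
batch_sizes = [len(batch) for batch in batches]
conn.close()
```

With a row array size of 4, the ten rows arrive in three fetches rather than ten. Against a remote database, fewer fetch calls can mean fewer round trips, which is where the performance benefit typically comes from.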