Configuring system resources for SQL DI

Preparing for installation involves obtaining the SQL Data Insights (SQL DI) product code and readying your system environment. SQL DI has specific system, network, user access, and security requirements. You must satisfy these requirements before you install SQL DI.

System capacity requirements

System capacity for SQL DI varies with several key workload factors, including the size of the source data, the number of unique values, and the data types of the selected columns. As the number of rows and columns increases, more CPU, memory, and storage are required to enable and run AI queries. The size of an AI object model is roughly proportional to the number of unique column values: the more distinct values the columns contain, the larger the resulting model becomes and the more system resources are needed to train it.

Take, for example, the system resource usage for processing a small SQL DI AI object: 2.2 GB in size, with 26 columns and 10 million rows. Fourteen of the columns are of the SQL DI numeric data type, and the remaining 12 are categorical. Enabling the object for AI query while meeting adequate performance goals might require up to 8 threads on 10 CPUs, up to 17 GB of memory, and 20 GB of file system storage. The 4 million unique values in total contribute to the size of the resulting model, which might require up to 13 GB of disk space in the Db2 storage group.
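To put rough numbers around a differently sized object, the figures from this example can be scaled. The sketch below is only an illustration: the linear-scaling assumption and the helper function are hypothetical, not an official SQL DI sizing formula, and actual usage also depends on data types, column count, and training parameters.

```python
# Illustrative sizing sketch based on the sample AI object described above.
# Linear scaling is an ASSUMPTION for estimation only, not an IBM formula.

def estimate_from_example(rows, unique_values):
    """Scale the sample object's observed resource usage to another object."""
    EX_ROWS = 10_000_000    # rows in the sample AI object
    EX_UNIQUE = 4_000_000   # total unique column values in the sample
    EX_MEMORY_GB = 17       # peak memory observed during enablement
    EX_FS_GB = 20           # file system storage used during enablement
    EX_MODEL_GB = 13        # model size in the Db2 storage group
    return {
        "memory_gb": EX_MEMORY_GB * rows / EX_ROWS,
        "filesystem_gb": EX_FS_GB * rows / EX_ROWS,
        "model_gb": EX_MODEL_GB * unique_values / EX_UNIQUE,
    }

# An object with twice the rows and 6 million unique values:
print(estimate_from_example(rows=20_000_000, unique_values=6_000_000))
```

Treat the output as a starting point for capacity planning and validate it against the actual resource consumption that you observe during enablement.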

MVS resource workload requirements

When you enable an AI object for AI query, SQL DI creates and trains a machine learning model for the object. Model training can consume all of the resources that are available to your OMVS subsystem. Consider defining your SQL DI workload in z/OS® Workload Manager (WLM) and assigning a service class for Spark to that workload. For the service class, specify the default qualifier names SQLD% and SQLDAPPS along with your performance goals and resource requirements for the workload. Also, consider giving the service class for your SQL DI workload a lower priority than your Db2 workloads. See z/OS workload management for Apache Spark for more information.
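The classification described above might look similar to the following sketch of WLM classification rules for the OMVS subsystem type. The service class name SQLDISC and the velocity goal are placeholders that you choose for your installation, and the panel layout is approximate; only the qualifier names SQLD% and SQLDAPPS come from the defaults that are mentioned above.

```
Subsystem Type . : OMVS

         Qualifier    Qualifier        Service
         Type         Name             Class
         ----------   ------------     --------
     1   TN           SQLD%            SQLDISC
     1   TN           SQLDAPPS         SQLDISC

Service Class SQLDISC  (placeholder name)
  Goal: execution velocity, set lower in importance
        than the service classes for your Db2 workloads
```

Define the rules with the WLM ISPF application, then activate the updated service definition before you enable AI objects.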