WML for z/OS uses z/OS Spark. Spark can consume a large amount of available resources on your z/OS system if it is not properly configured. It's recommended that you evaluate the performance of WML for z/OS and fine-tune your system settings accordingly.
Before you begin
Important: The performance data and recommendations contained in this document were determined in a controlled environment. Therefore, your results might vary significantly. No commitment as to your ability to obtain equivalent results is in any way intended or made by the information in this document.
Procedure
To improve WML for z/OS performance, apply the following settings, configurations, and practices:
- In the SPARK_CONF_DIR/spark-env.sh file, specify a limit for the number of cores that each application client can use. For example, the following environment variable limits each application to two cores:

  SPARK_MASTER_OPTS="-Dspark.deploy.defaultCores=2"
- In the log4j.properties file in the $IML_INSTALL_DIR/configuration/generated/ directory, set logging for the Liberty Profile server to WARN or a less verbose level to reduce the impact of logging on performance.
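For example, an entry such as the following in log4j.properties sets the root logger to WARN; the exact logger names and appenders in your generated file might differ:

```
# Illustrative only - check the logger names in your generated log4j.properties
log4j.rootLogger=WARN, stdout
```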
- When you create models, use the following approaches to tune application performance:
  - In JDBC calls, return only the data that you need. That is, if you need only one percent of the data, return only one percent. In the following example, the table contains six million rows, but only one percent of the data is used for training, so the query limits the result to 60,000 rows:
    val jdbcDF2 = sparkSession.read.format("jdbc").options(Map(
      "driver" -> "com.ibm.db2.jcc.DB2Driver",
      "url" -> "jdbc:db2://host-name",
      "user" -> "userID", "password" -> "password",
      "dbtable" -> ("(SELECT * FROM qualifier.table-name " +
        "FETCH FIRST 60000 ROWS ONLY) as t")
    )).load()
- When training the model, cache the DataFrame to reduce the number of calls to Db2®:
    val trainCached = trainDF.cache()
- A good rule of thumb is to configure your executor heaps to enable 100 percent caching and avoid spillover to disk. When you load data from Db2 and store it in DataFrames, the in-memory size is 1 to 2 times the flat data size. To determine the final executor heap size, use the following calculation:

  in-memory data size / percentage of heap reserved for cache = executor heap size

  The default value for the percentage of heap reserved for cache is 0.54. For example, if you have 1.5 GB of data to load, you can use the following calculation to find the final executor heap size:

  3 GB (1.5 GB * 2) / 0.54 (default) = 5.55 GB
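The rule of thumb above can be sketched as a small helper, written here in Python for illustration; the function name and default values are assumptions based on the worst-case 2x expansion factor and the 0.54 default cache fraction described in this section:

```python
# Rule-of-thumb executor heap sizing for caching DataFrames fully in memory.
# The expansion factor (2x) and cache fraction (0.54) mirror the defaults
# described above; this is a sizing estimate, not a Spark API call.

def executor_heap_gb(flat_data_gb, expansion_factor=2.0, cache_fraction=0.54):
    """Return the executor heap (GB) needed to cache the data without spillover."""
    in_memory_gb = flat_data_gb * expansion_factor
    return in_memory_gb / cache_fraction

# 1.5 GB of flat data, assuming the worst-case 2x in-memory expansion:
print(round(executor_heap_gb(1.5), 2))  # 5.56
```

Round the result up when you set the actual executor heap size, so that the cached data fits entirely in memory.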