Workload management best practices
Find hints and tips for tuning workload management for analysis jobs to avoid common issues.
Common issues with workload management
- Running column analysis on a large number of tables in parallel fails.
- Running column analysis on data assets with more than a million rows fails.
- Parallel DataStage jobs fail because resources are not available.
- Java out-of-memory and core dump issues occur during column analysis.
- Column analysis jobs that are run on data assets with a large number of columns hang.
- Information Analyzer client timeouts can occur when column analysis is run on data assets with a large number of columns.
- Projects take a long time to load.
Tuning tips
- Modify the job count, CPU usage, memory usage, and job start settings in your workload management system policies. See Workload Management Server: Overview and best practices for detailed tuning information.
- Tune parameters defined in the UVCONFIG file to support running and queuing
a large number of jobs. See the topics Tuning the InfoSphere Information Server engine for large numbers of
users or jobs (Windows Server) and Using tunable parameters in the UVCONFIG file in the IBM
InfoSphere Information Server documentation. Modify the settings of the following parameters (a sketch of the edit-and-regenerate workflow follows this list):
- MFILES
- T30FILE (Find more tuning information for this parameter in the technote Tuning the UVCONFIG parameters for IBM InfoSphere Information Server Workload Management)
- RLTABSZ (Find more tuning information for this parameter in the technote Tuning the UVCONFIG parameters for IBM InfoSphere Information Server Workload Management)
- MAXRLOCK
- DMEMOFF
- PMEMOFF
- NMEMOFF
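After you change these parameters, regenerate the engine configuration and restart the engine. The following shell sketch assumes the default engine installation path and the standard dsenv environment file; adapt the path and values to your installation.
# Sketch only: stop the engine, edit uvconfig, regenerate, and restart.
cd /opt/IBM/InformationServer/Server/DSEngine
. ./dsenv                   # source the engine environment
bin/uv -admin -stop         # stop the engine before changing uvconfig
vi uvconfig                 # raise MFILES, T30FILE, RLTABSZ, MAXRLOCK, and so on
bin/uvregen                 # rebuild the binary configuration from uvconfig
bin/uv -admin -start        # restart the engine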
- Follow the instructions in the technote How to solve java.lang.OutOfMemoryError, failed to create a thread, when IBM InfoSphere DataStage and QualityStage Operations Console and Workload Management Server are enabled to avoid Java out-of-memory errors.
- Follow the instructions in the technote InfoSphere DataStage DB2 Connector job fails with an error: SQL1476N The current transaction was
rolled back because of error "-911" to avoid failure of Db2 Connector jobs on high workloads
when running column analysis. Turn on the auto commit mode and tune the array size accordingly at
the project level.
For example, for an analysis run on 98 columns and 162 million rows, setting the array size to 20000 with auto commit mode enabled should yield reasonable performance.
- Increase the size of the Db2 transaction log for the analysis database (IADB). Modify the
settings of the following parameters:
- LOGFILSIZ
- LOGPRIMARY
- LOGSECOND
- NEWLOGPATH
For more information about tuning the transaction log size, see DB2 transaction log size.
In addition, check the technote Space requirements for IBM Information Analyzer IADB repository and database cleanup guidelines. A sketch of the corresponding db2 commands follows this item.
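As an illustration only, the following db2 commands show how these parameters might be increased for the IADB database. The values are placeholders; size them for your own workload, and note that some log parameters take effect only after the database is deactivated and reactivated.
db2 connect to IADB
db2 update db cfg for IADB using LOGFILSIZ 16384
db2 update db cfg for IADB using LOGPRIMARY 20
db2 update db cfg for IADB using LOGSECOND 40
# Optional: move the log files to a larger file system (path is a placeholder).
db2 update db cfg for IADB using NEWLOGPATH /largefs/iadb_logs
db2 terminate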
- In a Linux installation, change the default nproc parameter setting in the /etc/security/limits.d/90-nproc.conf file. This parameter limits the number of system processes; the default setting is 1024. Work with your system administrator to change the setting to unlimited or to the maximum value that your environment supports.
- Increase the value of the maxuproc kernel parameter if required. This parameter controls the maximum number of processes per user for a node; the default setting is 1000. For more information, see Setting the maximum number of processes for parallel jobs (AIX) in the IBM InfoSphere Information Server documentation. A sketch of both settings follows this item.
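The following sketch illustrates both changes. The user name dsadm and the value 4096 are assumptions; substitute the account that runs the engine and the limits that apply to your environment.
# Linux: raise the nproc limit for the engine user by appending to
# /etc/security/limits.d/90-nproc.conf, for example:
#   dsadm    soft    nproc    unlimited
#   dsadm    hard    nproc    unlimited

# AIX: check and raise the maxuproc kernel parameter.
lsattr -E -l sys0 -a maxuproc
chdev -l sys0 -a maxuproc=4096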
- Check the network latency. Measure the latency between the Information Analyzer Engine and the
source database, and also between the Information Analyzer Engine and the IADB repository. The
latency with the IADB repository should always be very low because this database should be located
in the same network as the Information Analyzer Engine (usually the same machine that contains the
meta repository). If the latency with the source database is high, consider using data
sampling or relocating a copy of the source database to a database in the local network (a simple latency check is sketched after this item).
For more information about network latency, see this technote: How does a high network latency can impact DataStage's connectivity with Databases
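For a quick first check, you can measure round-trip times from the engine host with ping; the host names below are placeholders for your own servers.
ping -c 10 source-db.example.com    # latency to the source database
ping -c 10 iadb-db.example.com      # latency to the IADB repository (should be very low)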
- Use data sampling. If you do not need to analyze the complete set of values in a column, data sampling is a good alternative. It generates results based on a fraction of your data, and analyzing a sample takes significantly less time.
- Break down the analysis by columns. Select a small group of columns (for example, 8 columns) and start the analysis. After this analysis finishes and you have a better idea of how long it takes to analyze this number of columns, create additional groups of columns and schedule them so that they do not all run at the same time. This approach gives you a better picture of the progress, lets you see results for a table as they become available instead of waiting for the entire table to be analyzed, and makes problems easier to isolate because of the smaller process scope.
- Check the performance of the databases involved outside of Information Analyzer. Column analysis is a process that runs queries against a source database to extract data and also against the Information Analyzer (IADB) repository to insert data. Check with your database administrator that these databases are not being overloaded with other tasks or applications when you are running a column analysis.
- Increase the swap space if required. A sketch for adding swap space on Linux follows this item.
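If you need to add swap space on Linux, a sequence like the following can be used; the 16 GB size is only an example, and you should work with your system administrator before changing swap settings.
fallocate -l 16G /swapfile          # or: dd if=/dev/zero of=/swapfile bs=1M count=16384
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
swapon --show                       # verify the new swap space
# Add "/swapfile none swap sw 0 0" to /etc/fstab to make the change persistent.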
- Increase the Java heap size for WebSphere® Application Server (a sketch for Liberty follows these items):
- For WebSphere Application Server ND, check the information in this technote: How to increase JVM heap size in WAS via server.xml
- For WebSphere Application Server Liberty, check the information in this topic: Setting the Java heap size on the JVM for WebSphere Application Server or WebSphere Application Server Liberty
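For Liberty, the heap is typically set in the jvm.options file of the server that hosts Information Server. The path and sizes below are assumptions for illustration only; locate the directory that contains your server.xml and verify the values against the topic referenced above.
# Placeholder path: point this at the Liberty server directory for Information Server.
LIBERTY_SERVER_DIR=/opt/IBM/WebSphere/Liberty/usr/servers/iis
cat >> "$LIBERTY_SERVER_DIR/jvm.options" <<'EOF'
-Xms2048m
-Xmx4096m
EOF
# Restart the application server for the new heap settings to take effect.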
- Increase the timeout settings for the Information Analyzer Console client as described in this technote: IBM InfoSphere Information Analyzer Console client hangs or is unresponsive
- Increase the JVM heap size for the Information Analyzer client if data quality projects take a very long time to load:
- Edit the /opt/IBM/InformationServer/ASBNode/conf/proxy.xml file in the conductor pod.
- Change the value specified for MaximumHeapSize in the application-specific
section:
<isc.JvmSettings>
  <add key="MaximumHeapSize" value="128" />
</isc.JvmSettings>
This is the default setting. Set the MaximumHeapSize value to 1024, for example with the sed command sketched after these steps.
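As a sketch, the change can be made with sed after taking a backup; verify the resulting file before restarting the client processes.
cd /opt/IBM/InformationServer/ASBNode/conf
cp proxy.xml proxy.xml.bak
sed -i 's/key="MaximumHeapSize" value="128"/key="MaximumHeapSize" value="1024"/' proxy.xml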
- Reduce the ODF and ODFEngine startup times. ODF startup can take a long time when it must process the large number of Kafka messages that are generated when multiple analysis requests run in parallel. By default, ODF processes all messages from the past 7 days. You can reduce the time frame so that only messages from the last few hours (for example, 6 hours) are processed:
- In the iis-services pod, set the corresponding iisAdmin property as
follows:
/opt/IBM/InformationServer/ASBServer/bin/iisAdmin.sh -s -k com.ibm.iis.odf.kafka.skipmessages.older.than.secs -value 21600
- Complete the following steps in the conductor pod:
- Edit the /opt/IBM/InformationServer/ASBNode/conf/odf.properties file and
add the following
line:
com.ibm.iis.odf.kafka.skipmessages.older.than.secs=21600
- Restart the ODF engine:
service ODFEngine stop
service ODFEngine start
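To confirm that the services-tier property is set, you can display it again with iisAdmin. The -display option shown here is an assumption based on common iisAdmin usage; check the command help in your release.
/opt/IBM/InformationServer/ASBServer/bin/iisAdmin.sh -display -key com.ibm.iis.odf.kafka.skipmessages.older.than.secs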