Workload management best practices

Find hints and tips for tuning workload management for analysis jobs to avoid common issues.

Common issues with workload management

The following list shows some common workload management issues that you might encounter when running analysis jobs:
  • Running column analysis on a large number of tables in parallel fails.
  • Running column analysis on data assets with more than a million rows fails.
  • Parallel DataStage jobs fail because required resources are not available.
  • Java out-of-memory and core dump issues occur during column analysis.
  • Column analysis jobs that are run on data assets with a large number of columns hang.
  • Information Analyzer client timeouts can occur when column analysis is run on data assets with a large number of columns.
  • Projects take a long time to load.

Tuning tips

The following tips and best practices can help you avoid or fix scalability issues and improve performance.
  • Modify the job count, CPU usage, memory usage, and job start settings in your workload management system policies. See Workload Management Server: Overview and best practices for detailed tuning information.
  • Tune parameters defined in the UVCONFIG file to support running and queuing a large number of jobs. See the topics Tuning the InfoSphere Information Server engine for large numbers of users or jobs (Windows Server) and Using tunable parameters in the UVCONFIG file in the IBM InfoSphere Information Server documentation. Modify the settings of the following parameters:
  • Follow the instructions in the technote How to solve java.lang.OutOfMemoryError, failed to create a thread, when IBM InfoSphere DataStage and QualityStage Operations Console and Workload Management Server are enabled to avoid Java out-of-memory errors.
  • To prevent Db2 Connector jobs from failing under high workloads during column analysis, follow the instructions in the technote InfoSphere DataStage DB2 Connector job fails with an error: SQL1476N The current transaction was rolled back because of error "-911". Turn on auto commit mode and tune the array size at the project level.

    For example, for an analysis run on 98 columns and 162 million rows, setting the array size to 20000 with auto commit mode enabled should yield reasonable performance.
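    The array size is the number of rows that the connector transfers in one batch, so raising it reduces the number of round trips to Db2. The following POSIX shell function is a minimal sketch of that arithmetic; the name batch_count is hypothetical, and the sketch assumes exactly one round trip per array of rows.

    ```shell
    # Hypothetical helper: number of batches needed to move ROWS rows when
    # each batch carries ARRAY_SIZE rows (integer ceiling division).
    batch_count() {
      rows=$1
      array_size=$2
      echo $(( (rows + array_size - 1) / array_size ))
    }
    ```

    For the example above, batch_count 162000000 20000 prints 8100, whereas an array size of 2000 would need 81000 batches.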

  • Increase the size of the Db2 transaction log for the analysis database (IADB). Modify the settings of the following parameters:
    • LOGFILSIZ
    • LOGPRIMARY
    • LOGSECOND
    • NEWLOGPATH

    For more information about tuning the transaction log size, see DB2 transaction log size.

    In addition, see the technote Space requirements for IBM Information Analyzer IADB repository and database cleanup guidelines.
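    The maximum log space that these settings allow can be estimated as (LOGPRIMARY + LOGSECOND) × LOGFILSIZ × 4 KB, because LOGFILSIZ is specified in 4 KB pages. A minimal sketch of that calculation as a shell function (log_space_mb is a hypothetical name; the values you pass in are your own):

    ```shell
    # Hypothetical helper: maximum Db2 log space in MB for the given settings.
    # LOGFILSIZ is in 4 KB pages. LOGSECOND = -1 (infinite logging) is not
    # handled by this sketch.
    log_space_mb() {
      logfilsiz=$1
      logprimary=$2
      logsecond=$3
      echo $(( (logprimary + logsecond) * logfilsiz * 4 / 1024 ))
    }
    ```

    For example, log_space_mb 65536 20 40 prints 15360 (about 15 GB). Illustrative values could then be applied with a Db2 command such as the following, where the database name IADB and the log path are assumptions about your environment. Some of these parameters take effect only after the database is deactivated and reactivated.
      db2 update db cfg for IADB using LOGFILSIZ 65536 LOGPRIMARY 20 LOGSECOND 40 NEWLOGPATH /db2/iadb_logs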

  • In a Linux installation, change the default nproc parameter setting in the /etc/security/limits.d/90-nproc.conf file. This parameter limits the number of processes that a user can run. The default setting is 1024.

    Work with your system administrator to change the setting to unlimited or to the maximum value that your environment supports.
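    On systems whose 90-nproc.conf contains a wildcard entry such as "*   soft   nproc   1024", the change can be scripted. A minimal sketch, assuming that entry format; set_nproc_limit is a hypothetical helper, and 65535 below is only an example value to agree on with your system administrator:

    ```shell
    # Hypothetical helper: rewrite the value of the wildcard ("*") soft nproc
    # entry in a limits file, keeping the original spacing and other lines.
    set_nproc_limit() {
      file=$1
      value=$2
      sed -E "s/^([*][[:space:]]+soft[[:space:]]+nproc[[:space:]]+)[^[:space:]]+/\1${value}/" \
        "$file" > "$file.tmp" && cat "$file.tmp" > "$file" && rm -f "$file.tmp"
    }
    ```

    Run it as root, for example set_nproc_limit /etc/security/limits.d/90-nproc.conf 65535. The new limit applies to new login sessions; processes that are already running keep their old limit. You can check the result with ulimit -u after logging in again.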

  • Increase the value of the maxuproc kernel parameter setting if required. This parameter controls the maximum number of processes per user for a node. The default setting is 1000. For more information, see Setting the maximum number of processes for parallel jobs (AIX) in the IBM InfoSphere Information Server documentation.
  • Check the network latency. Measure the latency between the Information Analyzer engine and the source database, and between the engine and the IADB repository. Latency to the IADB repository should always be very low, because this database should be in the same network as the Information Analyzer engine (usually on the same machine as the metadata repository). If latency to the source database is high, consider using data sampling or moving a copy of the source database to a database in the local network.

    For more information about network latency, see this technote: How does a high network latency can impact DataStage's connectivity with Databases
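    One quick way to compare the two latencies is to ping each database host from the engine host and read the average round-trip time. The following shell function is a sketch that extracts that average from the ping summary line. It assumes the Linux iputils or BSD/macOS output format; ping measures only network round-trip time, not query time, and does not work where ICMP is blocked.

    ```shell
    # Read ping output on stdin and print the average round-trip time in ms.
    # Matches summary lines such as
    #   "rtt min/avg/max/mdev = a/b/c/d ms"        (Linux iputils)
    #   "round-trip min/avg/max/stddev = a/b/c/d ms" (BSD/macOS)
    avg_rtt_ms() {
      awk -F'/' '/min\/avg\/max/ { print $5 }'
    }
    ```

    For example: ping -c 10 source-db.example.com | avg_rtt_ms, where the host name is a placeholder. Run the same command against the IADB host and compare the two values.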

  • Use data sampling. If you do not need to analyze every value in a column, data sampling is a good alternative: results are generated from a fraction of your data, so analyzing a sample takes significantly less time.
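    To illustrate the idea, the following shell function keeps each input row with a fixed probability (Bernoulli sampling). It is only a sketch of the concept and is not how Information Analyzer draws its samples; sample_rows and its seed argument are hypothetical:

    ```shell
    # Illustration only: keep each input line with probability FRACTION.
    # SEED (default 42) makes the sample repeatable.
    sample_rows() {
      fraction=$1
      seed=${2:-42}
      awk -v p="$fraction" -v seed="$seed" 'BEGIN { srand(seed) } rand() < p'
    }
    ```

    For example, seq 1 1000000 | sample_rows 0.01 | wc -l prints a count close to 10000, so the downstream work shrinks by roughly a hundredfold.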
  • Break down the analysis by columns. Select a small group of columns (for example, 8 columns) and start the analysis. After this analysis finishes and you know how long it takes to analyze that number of columns, create additional groups of columns and schedule them so that they do not all run at the same time. This approach has several advantages: it gives you a better picture of the progress; you can see the results for your table as they become available instead of waiting for the entire table to be analyzed; and if a problem occurs, it is easier to isolate because each run has a smaller scope.
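    A minimal sketch of how the column groups and their staggered start times could be planned, assuming you have already measured how long one group takes. plan_column_groups is a hypothetical helper; it only prints a plan and does not schedule anything in Information Analyzer:

    ```shell
    # Hypothetical helper: split the column names into groups of SIZE and
    # print each group with a start offset of GAP_MIN minutes per group.
    plan_column_groups() {
      size=$1
      gap_min=$2
      shift 2
      i=0; n=0; group=""
      for col in "$@"; do
        group="${group:+$group }$col"
        n=$((n + 1))
        if [ "$n" -eq "$size" ]; then
          echo "+$((i * gap_min))min: $group"
          i=$((i + 1)); n=0; group=""
        fi
      done
      # Print the last, partially filled group, if any.
      if [ -n "$group" ]; then
        echo "+$((i * gap_min))min: $group"
      fi
    }
    ```

    For example, plan_column_groups 8 45 followed by your column names prints one line per group of eight, each starting 45 minutes after the previous one, where 45 stands for the duration you measured for the first group.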
  • Check the performance of the databases involved outside of Information Analyzer. Column analysis runs queries against the source database to extract data and against the Information Analyzer (IADB) repository to insert data. Confirm with your database administrator that these databases are not overloaded by other tasks or applications while column analysis is running.
  • Increase the swap space if required.
  • Increase the Java heap size for WebSphere® Application Server:
  • Increase the timeout settings for the Information Analyzer Console client as described in this technote: IBM InfoSphere Information Analyzer Console client hangs or is unresponsive
  • Increase the JVM heap size for the Information Analyzer client if data quality projects take very long to load:
    1. Edit the /opt/IBM/InformationServer/ASBNode/conf/proxy.xml file in the conductor pod.
    2. Change the value specified for MaximumHeapSize in the application-specific section. The default setting is:
      <isc.JvmSettings>
      <add key="MaximumHeapSize" value="128" />
      </isc.JvmSettings>

      Set the MaximumHeapSize value to 1024.
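    If you prefer to script the edit, the following sketch backs up the file and rewrites the value with sed. set_max_heap is a hypothetical helper. It changes every MaximumHeapSize entry in the file, so first run grep -n MaximumHeapSize on proxy.xml, and edit the file by hand if anything other than the application-specific entry matches:

    ```shell
    # Hypothetical helper: save FILE as FILE.bak, then set every
    # <add key="MaximumHeapSize" value="..." /> entry to SIZE.
    set_max_heap() {
      file=$1
      size=$2
      cp -p "$file" "$file.bak"
      sed -E "s/(key=\"MaximumHeapSize\" value=\")[0-9]+\"/\1${size}\"/" "$file.bak" > "$file"
    }
    ```

    For example: set_max_heap /opt/IBM/InformationServer/ASBNode/conf/proxy.xml 1024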

  • Reduce the ODF and ODFEngine startup times. ODF startup can take a long time when it must process a large number of Kafka messages, which are generated when multiple analysis requests run in parallel. By default, ODF processes all messages from the past 7 days. You can reduce this time frame so that only messages from the last few hours are processed. For example, 6 hours is 21600 seconds, the value used in the following steps:
    1. In the iis-services pod, set the corresponding iisAdmin property as follows:
      /opt/IBM/InformationServer/ASBServer/bin/iisAdmin.sh -s -k com.ibm.iis.odf.kafka.skipmessages.older.than.secs -value 21600
    2. Complete the following steps in the conductor pod:
      1. Edit the /opt/IBM/InformationServer/ASBNode/conf/odf.properties file and add the following line:
        com.ibm.iis.odf.kafka.skipmessages.older.than.secs=21600
      2. Restart the ODF engine:
        service ODFEngine stop
        service ODFEngine start
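    The steps in the conductor pod can also be scripted. The following sketch converts the time window to seconds and sets the property idempotently, so running it twice does not add a duplicate line. hours_to_secs and set_property are hypothetical helpers; the iisAdmin step in the iis-services pod is still required:

    ```shell
    # Convert an hour window to the seconds value that the ODF property expects.
    hours_to_secs() { echo $(( $1 * 3600 )); }

    # Hypothetical helper: set KEY=VALUE in a .properties file, replacing an
    # existing entry or appending a new one.
    set_property() {
      file=$1
      key=$2
      value=$3
      if grep -q "^${key}=" "$file"; then
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "$file.tmp" &&
          cat "$file.tmp" > "$file" && rm -f "$file.tmp"
      else
        # Add a trailing newline first if the file does not end with one.
        if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then echo >> "$file"; fi
        printf '%s=%s\n' "$key" "$value" >> "$file"
      fi
    }
    ```

    Usage in the conductor pod, followed by the restart shown above:
      set_property /opt/IBM/InformationServer/ASBNode/conf/odf.properties com.ibm.iis.odf.kafka.skipmessages.older.than.secs "$(hours_to_secs 6)"
      service ODFEngine stop
      service ODFEngine start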