Configuring workload management policies for DataStage

You can configure workload management policies for DataStage either by editing the configuration file or by editing your DataStage instance settings in your project.

Policies

You can set threshold values for these system policies:
Job Count
Specify the maximum number of concurrent running jobs that are allowed on the system. The default value is 20 concurrent running jobs. The most suitable value for each runtime instance depends on the available resources, the number of partitions, and your SLA.
Note: Job count excludes jobs that run in a pipeline.
Queue-specific Job Count
Specify the maximum number of concurrent running jobs for each priority (low, medium, and high). For an illustrative configuration sketch, see the example after this list.
Job Start
Specify the maximum number of jobs that are allowed to start in the specified number of seconds. The default value is 100 jobs in 10 seconds.
CPU Usage
Specify the maximum CPU usage that is allowed on the system. If the current CPU usage exceeds this value, the job is not allowed to start. The default value is 80 percent CPU usage. If you have many small jobs, consider a higher value to avoid jobs being queued.
Memory Usage
Specify the maximum memory usage that is allowed on the system. If the memory usage exceeds this value, the job is not allowed to start. The default value is 80 percent memory usage. If you have many small jobs, consider a higher value to avoid jobs being queued.
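
The queue-specific job counts are defined per priority queue in wlm.config.xml. The sketch below is illustrative only: the queue names and the element layout are assumptions rather than documented syntax, so match them to the queue entries that already exist in your wlm.config.xml.

  <!-- Illustrative sketch only: the queue names and element layout are assumptions.
       Match them to the queue entries that already exist in wlm.config.xml. -->
  <Queues>
    <Queue name="HighPriorityJobQueue">
      <Resource name="JobCount" value="10" />
    </Queue>
    <Queue name="MediumPriorityJobQueue">
      <Resource name="JobCount" value="6" />
    </Queue>
    <Queue name="LowPriorityJobQueue">
      <Resource name="JobCount" value="4" />
    </Queue>
  </Queues>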

Editing the configuration file

To configure the policies by editing the configuration file wlm.config.xml, complete the following steps.

  1. Log in to your Red Hat® OpenShift® cluster as an instance administrator:
    oc login -u kubeadmin -p xxxxx-xxxx-xxxx
  2. Find the runtime instance that you want to update. For example, to update the default instance:
    oc get pods | grep ds-px-default-ibm-datastage-px-runtime
  3. Open a shell on the runtime pod:
    oc rsh ds-px-default-ibm-datastage-px-runtime-7b5b7975b8-jnzbg
  4. Change directory to the workload management home directory, for example:
    sh-4.4$ cd /opt/ibm/PXService/Server/DSWLM
  5. Stop workload management:
    sh-4.4$ ./stopwlm.sh
  6. Edit the configuration file (to edit it from your workstation instead, see the oc cp sketch after these steps):
    sh-4.4$ nano /px-storage/config/wlm/wlm.config.xml
  7. Add the following information to the configuration file according to the policies that you want to set.
    Job Count and Job Start
    Note: Job count excludes jobs that run in a pipeline.
      <!-- Declare a list of computing resources -->
      <Resources>
        <Resource name="JobCount" value="5" />
        <Resource name="StartJob" value="5" timeFrame="4" />
      </Resources>
    CPU Usage and Memory Usage
      <!-- parameter to indicate CPU cap (in percent) -->
      <Parameter name="CPUCap" value="95" />
      <!-- parameter to indicate memory cap (in percent) -->
      <Parameter name="MemoryCap" value="80" />
    Maximum number of queued jobs
    The default value of the MaxQueuedJobs parameter is 1000. If the value is set to 0, the MaxQueuedJobs check is skipped.
      <!-- parameter to configure maximum number of queued jobs -->
      <Parameter name="MaxQueuedJobs" value="1000" />
  8. Start workload management:
    sh-4.4$ nohup ./startwlm.sh &
    [1] 16034
    sh-4.4$ nohup: ignoring input and appending output to 'nohup.out'
  9. Confirm that workload management is running:
    sh-4.4$ ps -ef | grep WLM | grep -v grep
    1000630+   16034   15905  3 15:02 pts/0    00:00:00 ./../../jdk/bin/java -Xmx2048m -classpath ./dist/lib/commons-lang-2.6.jar:./dist/lib/commons-codec-1.15.jar:./../../ASBNode/lib/java/jsr311-api-1.1.1.jar:./../../ASBNode/lib/java/slf4j-api-1.6.1.jar:./../../ASBNode/lib/java/wink-1.2.1-incubating.jar:./../../ASBNode/lib/java/wink-client-1.2.1-incubating.jar:./../../ASBNode/lib/java/wink-common-1.2.1-incubating.jar:./../../ASBNode/lib/java/wink-server-1.2.1-incubating.jar:./dist/lib/wlm.jar:./dist/lib/wlmstart.jar com.ibm.iis.common.wlm.service.server.ds.DSWLMServer
    sh-4.4$ 
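
If you prefer to edit wlm.config.xml on your workstation instead of inside the pod, you can copy the file out and back with oc cp, as in the sketch below. This is not part of the documented procedure: the pod name reuses the example from step 3, the path comes from step 6, and oc cp assumes tar is available in the runtime container (add -c <container> if the pod runs more than one container).

  # Copy the current configuration to your workstation (pod name from step 3 is an example)
  oc cp ds-px-default-ibm-datastage-px-runtime-7b5b7975b8-jnzbg:/px-storage/config/wlm/wlm.config.xml ./wlm.config.xml
  # Edit the local copy, then copy it back before restarting workload management (step 8)
  oc cp ./wlm.config.xml ds-px-default-ibm-datastage-px-runtime-7b5b7975b8-jnzbg:/px-storage/config/wlm/wlm.config.xml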

Metrics

You can set the READ_COMPUTE_METRICS_FROM environment variable to control how workload management reads metrics. You configure the metrics read and retry behavior by editing the px-runtime deployment.

  private static final String _readMetricsFrom = System.getenv("READ_COMPUTE_METRICS_FROM");
  // Set this to "socket" to use socket calls to compute pod and get the metrics
  // By default the metrics are read from disk file
  // Disk metrics reads that fail to be read or are stale will be retried
  // with socket call unless "no_retry" is set
  // Disk metrics reads can also be retried from disk by setting "disk_retry"
  static
  {
    _readMetricsFromSocket = "socket".equalsIgnoreCase(_readMetricsFrom);
    _retryMetricsFromDisk = "disk_retry".equalsIgnoreCase(_readMetricsFrom);
    _retryMetricsNone = "no_retry".equalsIgnoreCase(_readMetricsFrom);
  }
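
One way to set the variable is with oc set env on the px-runtime deployment, as in the sketch below. The deployment name is an assumption based on the default instance used in the earlier steps, and the recognized values (socket, disk_retry, no_retry) come from the code above.

  # Read metrics through socket calls to the compute pods
  # (the deployment name assumes the default instance; adjust it for your instance)
  oc set env deployment/ds-px-default-ibm-datastage-px-runtime READ_COMPUTE_METRICS_FROM=socket

  # Other values recognized by the code above:
  #   disk_retry - retry failed or stale disk reads from disk
  #   no_retry   - do not retry failed or stale disk reads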