Configuring workload management policies for DataStage
You can configure workload management policies for DataStage by editing the configuration file, or from your project by editing your DataStage instance settings.
Policies
You can set threshold values for these system policies:
- Job Count
- Specify the maximum number of concurrent running jobs that are allowed on the system. The default value is 20 concurrent running jobs. The most suitable value for each runtime instance depends on the available resources, the number of partitions, and your SLA. Note: Job count excludes jobs that run in a pipeline.
- Queue-specific Job Count
- Specify the maximum number of concurrent running jobs for each priority (low, medium, and high).
- Job Start
- Specify the maximum number of jobs that are allowed to start in the specified number of seconds. The default value is 100 jobs in 10 seconds.
- CPU Usage
- Specify the maximum CPU usage that is allowed on the system. If the current CPU usage exceeds this value, the job is not allowed to start. The default value is 80 percent CPU usage. If you run many small jobs, consider setting a higher value to avoid jobs being queued.
- Memory Usage
- Specify the maximum memory usage that is allowed on the system. If the memory usage exceeds this value, the job is not allowed to start. The default value is 80 percent memory usage. If you run many small jobs, consider setting a higher value to avoid jobs being queued.
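Taken together, these policies amount to a simple admission check: a job is allowed to start only while every threshold is still satisfied; otherwise it is queued. The following Java sketch models only the capacity thresholds (Job Count, CPU Usage, Memory Usage) with their documented defaults; the class and method names are illustrative, not part of the product.

```java
// Illustrative model of the WLM admission thresholds described above.
// AdmissionPolicy is a made-up name, not the product's API.
public class AdmissionPolicy {
    int maxJobCount = 20;      // Job Count default
    int cpuCapPercent = 80;    // CPU Usage default
    int memoryCapPercent = 80; // Memory Usage default

    /** A job may start only if every threshold is still satisfied. */
    public boolean canStart(int runningJobs, int cpuPercent, int memPercent) {
        return runningJobs < maxJobCount
            && cpuPercent < cpuCapPercent
            && memPercent < memoryCapPercent;
    }

    public static void main(String[] args) {
        AdmissionPolicy p = new AdmissionPolicy();
        System.out.println(p.canStart(5, 40, 50));  // under all thresholds: true
        System.out.println(p.canStart(20, 40, 50)); // job count reached: false
        System.out.println(p.canStart(5, 85, 50));  // CPU cap exceeded: false
    }
}
```

The Job Start rate limit and MaxQueuedJobs check (described later) would add a time window and a queue-length test on top of this.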
Editing the configuration file
To configure the policies by editing the configuration file wlm.config.xml, complete the following steps.
- Log in to your Red Hat® OpenShift® cluster as an instance administrator:
  ```
  oc login -u kubeadmin -p xxxxx-xxxx-xxxx
  ```
- Find the runtime instance that you want to update. For example, to update the default instance:
  ```
  oc get pods | grep ds-px-default-ibm-datastage-px-runtime
  ```
- Open a shell on the runtime pod:
  ```
  oc rsh ds-px-default-ibm-datastage-px-runtime-7b5b7975b8-jnzbg
  ```
- Change directory to the workload management home directory, for example:
  ```
  sh-4.4$ cd /opt/ibm/PXService/Server/DSWLM
  ```
- Stop workload management:
  ```
  sh-4.4$ ./stopwlm.sh
  ```
- Edit the configuration file:
  ```
  bash-4.4$ nano /px-storage/config/wlm/wlm.config.xml
  ```
- Add the following information to the configuration file according to the policies that you want to set.
- Job Count and Job Start
  Note: Job count excludes jobs that run in a pipeline.
  ```
  <!-- Declare a list of computing resources -->
  <Resources>
    <Resource name="JobCount" value="5" />
    <Resource name="StartJob" value="5" timeFrame="4" />
  </Resources>
  ```
- CPU Usage and Memory Usage
  ```
  <!-- parameter to indicate CPU cap (in percent) -->
  <Parameter name="CPUCap" value="95" />
  <!-- parameter to indicate memory cap (in percent) -->
  <Parameter name="MemoryCap" value="80" />
  ```
- Maximum number of queued jobs
  The default value of the MaxQueuedJobs parameter is 1000. If the value is set to 0, the MaxQueuedJobs check is skipped.
  ```
  <!-- parameter to configure maximum number of queued jobs -->
  <Parameter name="MaxQueuedJobs" value="1000" />
  ```
- Start workload management:
  ```
  sh-4.4$ nohup ./startwlm.sh &
  [1] 16034
  sh-4.4$ nohup: ignoring input and appending output to 'nohup.out'
  ```
- Confirm that workload management is running:
  ```
  sh-4.4$ ps -ef | grep WLM | grep -v grep
  1000630+ 16034 15905  3 15:02 pts/0    00:00:00 ./../../jdk/bin/java -Xmx2048m -classpath ./dist/lib/commons-lang-2.6.jar:./dist/lib/commons-codec-1.15.jar:./../../ASBNode/lib/java/jsr311-api-1.1.1.jar:./../../ASBNode/lib/java/slf4j-api-1.6.1.jar:./../../ASBNode/lib/java/wink-1.2.1-incubating.jar:./../../ASBNode/lib/java/wink-client-1.2.1-incubating.jar:./../../ASBNode/lib/java/wink-common-1.2.1-incubating.jar:./../../ASBNode/lib/java/wink-server-1.2.1-incubating.jar:./dist/lib/wlm.jar:./dist/lib/wlmstart.jar com.ibm.iis.common.wlm.service.server.ds.DSWLMServer
  sh-4.4$
  ```
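For reference, the fragments from the editing step slot into wlm.config.xml roughly as follows. This is a sketch, not a complete file: the enclosing root element and any other entries already present in your installation's wlm.config.xml are omitted here, so merge these elements into your existing file rather than replacing it.

```
<!-- Sketch: relevant wlm.config.xml entries, combined from the fragments above -->
<Resources>
  <Resource name="JobCount" value="5" />
  <Resource name="StartJob" value="5" timeFrame="4" />
</Resources>
<Parameter name="CPUCap" value="95" />
<Parameter name="MemoryCap" value="80" />
<Parameter name="MaxQueuedJobs" value="1000" />
```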
Metrics
You can set the READ_COMPUTE_METRICS_FROM environment variable to control how workload management reads compute metrics. You can configure the metrics read and retry behavior by editing the px-runtime deployment.
```
private static final String _readMetricsFrom = System.getenv("READ_COMPUTE_METRICS_FROM");

// Set this to "socket" to use socket calls to the compute pod to get the metrics.
// By default, the metrics are read from a disk file.
// Disk metrics reads that fail or return stale data are retried
// with a socket call unless "no_retry" is set.
// Disk metrics reads can also be retried from disk by setting "disk_retry".
static
{
    _readMetricsFromSocket = "socket".equalsIgnoreCase(_readMetricsFrom);
    _retryMetricsFromDisk = "disk_retry".equalsIgnoreCase(_readMetricsFrom);
    _retryMetricsNone = "no_retry".equalsIgnoreCase(_readMetricsFrom);
}
```
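The static block above reduces to a case-insensitive, three-way mode switch on the environment value, with anything else falling through to the default behavior (read from disk, retry via socket). A self-contained Java sketch of that same decision, with made-up enum and method names that are not part of the product:

```java
public class MetricsModes {
    enum Mode { SOCKET, DISK_RETRY, NO_RETRY, DEFAULT }

    // Mirrors the static block: unset or unrecognized values fall through
    // to DEFAULT, i.e. read metrics from disk and retry with a socket call.
    static Mode fromEnv(String value) {
        if ("socket".equalsIgnoreCase(value)) return Mode.SOCKET;
        if ("disk_retry".equalsIgnoreCase(value)) return Mode.DISK_RETRY;
        if ("no_retry".equalsIgnoreCase(value)) return Mode.NO_RETRY;
        return Mode.DEFAULT;
    }

    public static void main(String[] args) {
        // Reads the same variable the product snippet uses.
        System.out.println(fromEnv(System.getenv("READ_COMPUTE_METRICS_FROM")));
    }
}
```

Because `equalsIgnoreCase` is called on the literal, a null (unset) environment value safely yields the default mode instead of throwing.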