Resource thresholds

6.4 and later

A StreamSets environment sets thresholds for the following Data Collector engine resources:
  • CPU load
  • Memory
  • Number of running jobs

Watsonx.data integration monitors these thresholds with Data Collector engine versions 6.4 and later. The thresholds are ignored with earlier engine versions.

Jobs start on an engine when the engine is online and has not reached any of these thresholds. If an engine equals or exceeds a threshold, new jobs fail to start on the engine.

If multiple engine instances are running for the same environment, jobs can start on any engine instance that is online and within resource thresholds. When more than one engine instance is available, the job is randomly assigned to one of the instances.
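
The assignment rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the product's implementation: the `Engine` class, its fields, and the `pick_engine` function are hypothetical names, and the threshold check is reduced to a single boolean.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Engine:
    # Hypothetical model of an engine instance, for illustration only.
    name: str
    online: bool
    within_thresholds: bool  # True if no resource threshold has been reached


def pick_engine(engines: List[Engine], rng: random.Random) -> Optional[Engine]:
    """Return a randomly chosen eligible engine, or None if none is eligible."""
    eligible = [e for e in engines if e.online and e.within_thresholds]
    return rng.choice(eligible) if eligible else None


engines = [
    Engine("engine-1", online=True, within_thresholds=True),
    Engine("engine-2", online=True, within_thresholds=False),
    Engine("engine-3", online=False, within_thresholds=True),
    Engine("engine-4", online=True, within_thresholds=True),
]

chosen = pick_engine(engines, random.Random())
print(chosen.name if chosen else "no eligible engine")
```

In this sketch only `engine-1` and `engine-4` are candidates, because `engine-2` has reached a threshold and `engine-3` is offline.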

CPU load

The CPU load threshold for an engine is calculated based on the available CPU in the container.

For example, when you configure the environment, you set the maximum CPU load to 80% and the VPCs allocated to the engine to 12. When the engine uses less than 80% of 12 CPU cores, or 9.6 CPU cores, new jobs can start on the engine. When the engine uses 9.6 or more CPU cores, new jobs do not start on the engine.

An engine can exceed the threshold as it runs jobs, because the threshold only controls whether new jobs start. For example, with the preceding configuration, the engine uses 9 CPU cores when a new job starts, and the new job brings the total to 11 CPU cores. All running jobs continue, but no new jobs start on the engine.
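
The CPU arithmetic in this example can be checked with a short Python sketch. The function names are hypothetical and chosen for illustration. The comparison assumes that new jobs start only while usage is strictly below the threshold, as described above.

```python
def cpu_threshold_cores(max_cpu_load_pct: float, allocated_cores: float) -> float:
    """Convert the percentage threshold into an absolute number of CPU cores."""
    return allocated_cores * max_cpu_load_pct / 100


def can_start_job_cpu(used_cores: float, max_cpu_load_pct: float,
                      allocated_cores: float) -> bool:
    """New jobs can start only while usage is strictly below the threshold."""
    return used_cores < cpu_threshold_cores(max_cpu_load_pct, allocated_cores)


# Values from the example: 80% maximum CPU load, 12 allocated cores.
print(cpu_threshold_cores(80, 12))     # 9.6
print(can_start_job_cpu(9, 80, 12))    # True: 9 cores is below 9.6
print(can_start_job_cpu(9.6, 80, 12))  # False: usage equals the threshold
print(can_start_job_cpu(11, 80, 12))   # False: usage exceeds the threshold
```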

Memory

The memory threshold for an engine is calculated based on the configured Java heap size for the container.

You configure both the memory threshold and the Java heap size in the Configure details section when you configure the environment:
  • To configure the memory threshold, define a percentage in the Max memory used property.
  • To configure the Java heap size, view the Advanced configuration options. In the JVM options section, add JVM options to configure the Java heap size as a percentage of available memory as follows:
    • -XX:InitialRAMPercentage=75
    • -XX:MaxRAMPercentage=75

For example, you configure the environment properties as follows:
  • Max memory used = 80%
  • Maximum Java heap size = 50%

The engine container has 4 GB of memory. Because the Java heap size is set to 50%, the engine can use a maximum of 2 GB of memory. When the engine uses less than 80% of 2 GB, or 1.6 GB of memory, new jobs can start on the engine. When the engine uses 1.6 GB or more of memory, new jobs do not start on the engine.

An engine can exceed the threshold as it runs jobs, because the threshold only controls whether new jobs start. For example, with the preceding configuration, the engine uses 1.5 GB of memory when a new job starts, and the new job brings the total to 2 GB of memory. All running jobs continue, but no new jobs can start on the engine.
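
The memory calculation has two steps: the heap size is a percentage of container memory, and the threshold is a percentage of that heap. The following Python sketch reproduces the numbers from the example. The function names are hypothetical, and the sketch assumes the heap is exactly the configured percentage of container memory, which is a simplification of how a JVM sizes its heap.

```python
def heap_size_gb(container_memory_gb: float, max_ram_percentage: float) -> float:
    """Maximum Java heap, as a percentage of the container memory."""
    return container_memory_gb * max_ram_percentage / 100


def memory_threshold_gb(container_memory_gb: float, max_ram_percentage: float,
                        max_memory_used_pct: float) -> float:
    """Heap usage at which new jobs stop starting on the engine."""
    heap = heap_size_gb(container_memory_gb, max_ram_percentage)
    return heap * max_memory_used_pct / 100


# Values from the example: 4 GB container, 50% heap, 80% Max memory used.
heap = heap_size_gb(4, 50)
threshold = memory_threshold_gb(4, 50, 80)
print(heap)       # 2.0
print(threshold)  # 1.6

# New jobs can start only while heap usage is strictly below the threshold.
print(1.5 < threshold)  # True: a new job can start
print(2.0 < threshold)  # False: no new jobs start
```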

Running jobs

The running jobs threshold for an engine determines the maximum number of jobs that the engine can run at the same time.

For example, you set the maximum running jobs to 10. When the engine is running 9 or fewer jobs, new jobs can start on the engine. When the engine is running 10 jobs, new jobs cannot start on the engine.
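
Taken together, the three thresholds act as independent checks: reaching any one of them is enough to stop new jobs from starting. The following Python sketch reports which thresholds an engine has reached. The class and function names are hypothetical, and the limits reuse the values from the examples in this topic (9.6 CPU cores, 1.6 GB of heap, 10 jobs).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EngineLimits:
    # Absolute limits derived from the environment configuration (illustrative).
    cpu_cores: float
    heap_gb: float
    max_running_jobs: int


@dataclass
class EngineUsage:
    # Current resource usage of one engine (illustrative).
    cpu_cores: float
    heap_gb: float
    running_jobs: int


def reached_thresholds(usage: EngineUsage, limits: EngineLimits) -> List[str]:
    """Return the names of all thresholds that the engine equals or exceeds."""
    reached = []
    if usage.cpu_cores >= limits.cpu_cores:
        reached.append("CPU load")
    if usage.heap_gb >= limits.heap_gb:
        reached.append("Memory")
    if usage.running_jobs >= limits.max_running_jobs:
        reached.append("Running jobs")
    return reached


limits = EngineLimits(cpu_cores=9.6, heap_gb=1.6, max_running_jobs=10)

# Nine jobs running and low resource usage: no threshold reached.
print(reached_thresholds(EngineUsage(4.0, 1.0, 9), limits))   # []

# Ten jobs running: the running jobs threshold blocks new jobs.
print(reached_thresholds(EngineUsage(4.0, 1.0, 10), limits))  # ['Running jobs']
```

An empty list means that new jobs can start on the engine, provided that the engine is online.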

Defining resource thresholds

You can define the resource thresholds for an engine when you edit the StreamSets environment.

Procedure

  1. On the Manage tab of your project, click the StreamSets tool.
  2. For the environment, click Options > Edit environment.
  3. In the Configure details section, modify the following thresholds:
    • Max CPU load: Maximum percentage of CPU in the container that an engine can use. When an engine equals or exceeds this threshold, new jobs do not start on the engine. Default is 80.
    • Max memory used: Maximum percentage of the configured Java heap size that an engine can use. When an engine equals or exceeds this threshold, new jobs do not start on the engine. Default is 100.
    • Max jobs running: Maximum number of jobs that can run on an engine at the same time. When an engine equals this threshold, new jobs do not start on the engine. Default is 10.

  4. Optionally, modify the default Java heap size.
    1. View the Advanced configuration options.
    2. In the JVM options section, enter the following values:
      • -XX:InitialRAMPercentage=<percentage>
      • -XX:MaxRAMPercentage=<percentage>
    3. Click Save.
  5. Click Save.