Customizing worker behavior

Workers are containerized applications that run workflows. Each worker belongs to a worker group. When you run a workflow, you can select which worker group runs it.

The default worker group contains local workers, that is, workers that are preinstalled in the Concert Workflows namespace. By default, this group is used to run workflows.

Alternatively, you can run workflows by using remote workers, that is, workers that you install on remote hosts. You create a worker group, install remote workers, and attach them to the new group. In the workflow editor, you can then select the new group to run the workflow.

You can customize how local and remote workers behave when they run workflows. For example, you can make the following customizations:

- Route worker traffic through proxy servers.
- Configure remote workers to scale their worker pods up or down based on workload.
- Configure a workflow allowlist to control which workflows workers can run.
- Specify how long workers can pause workflows before the workflows are terminated.
- Prevent workers from pausing workflows.
- Adjust the validity periods of workers’ user tokens.

Routing worker traffic through proxy servers

Proxy servers provide a way to monitor and control network traffic, enabling you to enforce security policies, monitor network activity for regulatory compliance, and protect internal networks from external threats. You can route network traffic for Concert Workflows workers through proxy servers.

Network traffic for workers falls into two categories:

- Traffic from local and remote workers to the internet, for example, requests that are sent to external services via integration blocks when a workflow is run.
- Traffic from remote workers to the Concert Workflows host.

You can route each traffic type through its own proxy server. For example, you can route the worker-to-internet traffic through one proxy server and the remote worker-to-Concert Workflows traffic through another.

Proxy settings apply only to the traffic type for which you configure them. For example, if you route the worker-to-internet traffic through a proxy, those settings do not apply to traffic from remote workers to your Concert Workflows host.

To route the worker-to-internet traffic, you must provide the connection settings for a proxy server in the Concert Workflows UI. You can configure proxy settings that apply at a global level or at a worker group level.

- To configure global proxy settings, in the Concert Workflows sidebar, click System configuration, then click the Proxy settings tab and enter the connection details for the proxy server. For more information, see Proxy settings.
- To configure worker group-level proxy settings, in the Concert Workflows sidebar, click Worker groups. Edit a group, open the Proxy settings section, and enter the connection details for the proxy server. For more information, see Worker groups.

To route the remote worker-to-Concert Workflows traffic, you must provide the proxy server's uniform resource identifier (URI) in the proxy parameter value when you install the remote worker, as shown in this command:

```
./bin/install_rw.sh \
--license-acceptance=y \
--server-hostname=<CW_server_hostname> \
--worker-group-name=<worker_group_name> \
--worker-group-secret-key=<worker_group_secret> \
--proxy=<URI>
```

- Replace <CW_server_hostname> with the Concert Workflows server hostname. To see this hostname, open the Concert Workflows UI, click Worker groups in the sidebar, and open any group.
- Replace <worker_group_name> with the name of the remote worker's worker group.
- Replace <worker_group_secret> with the value of the worker group’s secret.
- Replace <URI> with the URI of the proxy server, specified in the format http://[username:password@]proxy.example.com:8080 (for example, http://proxy.example.com:8080 or http://my_username:my_password123@proxy.example.com:8080).

Note: To route remote worker-to-Concert Workflows traffic, you must use a tunneling proxy server, that is, a proxy server that accepts incoming connections on a standard HTTP port and uses the HTTP CONNECT method to establish tunnels for the exchange of encrypted HTTPS traffic.
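
One way to sanity-check a candidate proxy before you install a remote worker is to ask curl to reach the Concert Workflows host through the proxy and report the status code that the proxy returned for the CONNECT request. The following is a minimal sketch and is not part of the product: the proxy_connect_code function and the CURL override variable are hypothetical names, and the sketch assumes that curl is installed on the remote host and that the Concert Workflows host serves HTTPS on the default port.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with Concert Workflows.
# CURL can be overridden, for example to point at a specific curl binary.
CURL=${CURL:-curl}

# Usage: proxy_connect_code <proxy_URI> <CW_server_hostname>
# Prints the HTTP status code that the proxy returned for the CONNECT request.
# Assumption: the Concert Workflows host answers HTTPS on the default port (443).
proxy_connect_code() {
    "$CURL" --silent --output /dev/null --max-time 15 \
        --proxy "$1" \
        --write-out '%{http_connect}' \
        "https://$2/"
}
```

For example, proxy_connect_code http://proxy.example.com:8080 <CW_server_hostname> would typically print 200 when the proxy accepts the tunnel. Any other value suggests that the proxy refused the CONNECT request or could not be reached.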

For more information about installing remote workers, see Installing a remote worker.

Automatically scaling remote worker pods

To maximize a remote worker’s efficiency, you can configure it to automatically scale its worker pods up or down based on its workload.

A worker’s workload is indicated by the amount of resources its pods use. When you enable autoscaling for a worker, the Kubernetes horizontal pod autoscaler (HPA) will monitor the worker pods’ CPU and memory usage and scale the pod replicas appropriately.

The HPA compares the pods’ CPU and memory utilization levels with predefined targets (80% utilization for CPU, 75% for memory). It then determines how many replica worker pods are needed to meet these targets and scales the number of pods up or down as required.
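
The replica calculation can be illustrated with the formula that the Kubernetes documentation gives for the HPA: desired replicas = ceil(current replicas × current utilization / target utilization), evaluated for each metric, with the largest result used. The sketch below applies that formula to the targets stated above and clamps the result to a minimum and maximum replica count. The function names are hypothetical, and the sketch ignores the HPA's tolerance and stabilization behavior, so treat it as an approximation rather than a description of how Concert Workflows configures the autoscaler.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# ceil(replicas * utilization / target) using integer arithmetic.
replicas_for_metric() {
    echo $(( ($1 * $2 + $3 - 1) / $3 ))
}

# Usage: desired_replicas <current_replicas> <cpu_percent> <memory_percent> [min] [max]
# min and max default to 1 and 6 (illustrative defaults for this sketch).
desired_replicas() {
    min=${4:-1}
    max=${5:-6}
    by_cpu=$(replicas_for_metric "$1" "$2" 80)   # CPU target: 80%
    by_mem=$(replicas_for_metric "$1" "$3" 75)   # memory target: 75%

    # With several metrics, the HPA uses the largest proposed replica count.
    desired=$by_cpu
    if [ "$by_mem" -gt "$desired" ]; then desired=$by_mem; fi

    # Clamp to the configured minimum and maximum.
    if [ "$desired" -lt "$min" ]; then desired=$min; fi
    if [ "$desired" -gt "$max" ]; then desired=$max; fi
    echo "$desired"
}
```

For example, with 2 pods running at 160% CPU and 60% memory utilization, desired_replicas 2 160 60 prints 4: the CPU metric proposes 4 pods, the memory metric proposes 2, and the larger value is within the 1-6 limits.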

Note:

Autoscaling is supported only for workers that are installed on K3s, Red Hat OpenShift Container Platform (OCP), or Amazon Elastic Kubernetes Service (EKS) clusters where the metrics.k8s.io API group is available.
- The metrics.k8s.io group exposes the Kubernetes resource metrics API, which provides CPU and memory usage metrics for cluster pods.
- In K3s clusters, metrics.k8s.io is available by default via the metrics-server aggregator.
- In OCP clusters, metrics.k8s.io is available by default via the metrics server component in the cluster monitoring stack.
- In EKS clusters, you can make metrics.k8s.io available by installing the metrics-server aggregator as an add-on.
Autoscaling is not supported for workers that are installed via Docker Compose or Podman Compose.
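
Before you enable autoscaling, it can be worth confirming that the resource metrics API is registered in the target cluster. The following is a minimal sketch, assuming that kubectl is configured for that cluster (on OCP, oc accepts the same arguments). The function name and the KUBECTL override variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with Concert Workflows.
# KUBECTL can be set to "oc" or to a full path if needed.
KUBECTL=${KUBECTL:-kubectl}

# Succeeds when the APIService that backs metrics.k8s.io is registered.
metrics_api_registered() {
    "$KUBECTL" get apiservice v1beta1.metrics.k8s.io >/dev/null 2>&1
}

# Example use in an installation wrapper:
#   metrics_api_registered || echo "metrics.k8s.io was not found in this cluster"
```

If the check succeeds, running kubectl top pods in the worker's namespace is a further way to confirm that CPU and memory figures are actually being reported.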

You can enable autoscaling for a remote worker by including one of these parameters when you install it:

--enable-autoscaling

Enables autoscaling of worker pods based on the following default values:

- Minimum number of worker pods (minReplicas): 1
- Maximum number of worker pods (maxReplicas): 6
- Autoscaling metric type used for scaling decisions (autoscaling-type): resource

For example, you might use this command to install a worker and enable autoscaling with default values:

```
./bin/install_rw.sh \
--license-acceptance=y \
--target-platform=k3s \
--server-hostname=<CW_server_hostname> \
--worker-group-name=<worker_group_name> \
--worker-group-secret-key=<worker_group_secret> \
--enable-autoscaling
```

--max-replicas=<max_no_of_pod_replicas>

Enables autoscaling based on the following values:

- The default minimum number of worker pods (1).
- The maximum number of worker pods that you specify. This value must be in the range 1-16.
- The default autoscaling metric type (resource).

For example, you might use this command to install a worker and enable autoscaling up to a maximum of 10 worker pods:

```
./bin/install_rw.sh \
--license-acceptance=y \
--target-platform=k3s \
--server-hostname=<CW_server_hostname> \
--worker-group-name=<worker_group_name> \
--worker-group-secret-key=<worker_group_secret> \
--max-replicas=10
```

Important: The --max-replicas parameter value represents the maximum number of worker pods that can be created, but does not guarantee that the maximum number of pods will be started. If insufficient cluster resources are available, not all of the requested pods might start and some might remain in a Pending state.
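
Because the --max-replicas value must be in the range 1-16, a wrapper script might check the value before it calls install_rw.sh. The following is a minimal sketch; the function name is hypothetical, and how the installer itself reports an out-of-range value is not covered here.

```shell
#!/bin/sh
# Hypothetical pre-flight check for a wrapper script.
# Succeeds only when the argument is a whole number from 1 to 16.
valid_max_replicas() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
    esac
    [ "$1" -ge 1 ] && [ "$1" -le 16 ]
}

# Example use:
#   valid_max_replicas "$MAX" || { echo "max replicas must be 1-16" >&2; exit 1; }
```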

If you upgrade a remote worker for which autoscaling is enabled, and you do not specify autoscaling values when you upgrade, the original autoscaling settings remain unchanged. If you specify autoscaling values when you upgrade, only those settings are updated. For example, if you upgrade a worker for which autoscaling is enabled up to a maximum of 10 replicas, and you specify --max-replicas=12, only the maxReplicas setting is updated.

Configuring a workflow allowlist

By default, after Concert Workflows is installed, workers can run any workflow. However, you might want to explicitly control which workflows workers can run. To implement this behavior, you can configure a workflow allowlist (a list that contains only the workflows that you allow workers to run).

You can configure separate allowlists for local and remote workers.

- On your Concert Workflows installation host, you can configure an allowlist to control which workflows local workers can run.
- On each remote host where a remote worker is installed, you can configure an allowlist to control which workflows the remote worker can run.

After you enable an allowlist for local or remote workers, those workers can run only the workflows that you specified in the allowlist. If you disable the allowlist, the workers can run any workflow.

Creating and enabling a workflow allowlist

Go to the appropriate installation folder.
- For local workers, on the Concert Workflows installation host:
  - If your instance is installed on a VM, go to the ibm-concert-std-workflows folder.
  - If your instance is installed on a Kubernetes cluster, go to the ibm-concert-k8s-workflows folder.
- For remote workers, go to the folder from where you installed the remote worker (ibm-concert-k8s-workflows, ibm-concert-std-workflows, or ibm-concert-workflows-rw).

Copy the sample allowlist file to a file that is called flow-allowlist.json.

For local workers, run this command:

```
cp charts/rna-core/files/flow-allowlist-sample.json \
   charts/rna-core/files/flow-allowlist.json
```

For remote workers, run this command:

```
cp charts/rna-remote-worker/files/flow-allowlist-sample.json \
   charts/rna-remote-worker/files/flow-allowlist.json
```

Create the allowlist.
- In the flow-allowlist.json file, define a JSON array of workflow entries, one entry for each workflow that workers are allowed to run.
- Each entry must contain these two key-value pairs:
  - "flow": "<CW workflow path>"
  - "hash": "<CW workflow unique identifier>"
- In each entry, replace the placeholders as follows:
  - Replace <CW workflow path> with the folder path of the workflow, for example: dave/User/my_test_workflow
  - Replace <CW workflow unique identifier> with the unique base64-encoded identifier of the workflow.
  - To find the workflow identifier, in the Concert Workflows UI, go to the Workflows page, find the workflow, click the About workflow option in the three-dot menu, and copy the value that is shown in the Hash value field (for example, 6AA2jYRllyqjzIfi8ShjR8KV460zvdvcOdNfBAUDYps=).
- At least one entry must contain this key-value pair: "main": true
- For example, this allowlist specifies that workers can run only the alices_flow and bobs_flow workflows:
```
[{
    "flow": "alice/User/alices_flow",
    "hash": "SGVsbG8gV29ybGQ=",
    "main": true
},{
    "flow": "bob/User/bobs_flow",
    "hash": "eyJuYW1lIjoiSm9obiIsICJhZ2UiOjMwfQ=="
}]
```
Note:
- If the file contains one or more workflow entries, only those workflows can be run.
- If the file is empty, all workflows can be run.
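
A malformed allowlist file is easy to produce by hand (a stray typographic quotation mark is enough), so it can help to check the file before you enable it. The sketch below assumes that the jq command-line JSON processor is installed. The function name is hypothetical, and the check covers only the rules listed in this step, not everything that Concert Workflows might verify. It also assumes that "empty" can mean either a zero-byte file or an empty JSON array.

```shell
#!/bin/sh
# Hypothetical helper; requires jq (assumed to be installed separately).
# Succeeds when the file is empty, or is a JSON array in which every entry
# has string "flow" and "hash" values and at least one entry has "main": true.
validate_allowlist() {
    [ -f "$1" ] || return 1
    [ -s "$1" ] || return 0    # assumption: a zero-byte file counts as empty
    jq -e '
        type == "array" and (
            length == 0 or (
                all(.[]; type == "object"
                         and (.flow | type == "string")
                         and (.hash | type == "string"))
                and any(.[]; .main == true)
            )
        )
    ' "$1" >/dev/null 2>&1
}

# Example use, run from the installation folder for local workers:
#   validate_allowlist charts/rna-core/files/flow-allowlist.json || echo "allowlist is not valid"
```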
Stop all running workflows in your Concert Workflows instance.
Enable the allowlist.
1. For local workers, run this command:
```
./bin/adm/cw-allowlist-configure.sh apply
```
2. For remote workers that run in a K3s or Kubernetes cluster, run this command:
```
./bin/adm/cw-allowlist-configure.sh apply
```
3. For remote workers that use Docker or Podman as their container runtime, reinstall the worker by running this command:
```
./bin/install_rw.sh \
--license-acceptance=y \
--server-hostname=<CW_server_hostname> \
--worker-group-name=<worker_group_name> \
--worker-group-secret-key=<worker_group_secret> \
--target-platform=<podman|docker>
```
  - Replace <CW_server_hostname> with the Concert Workflows server hostname. To see this hostname, open the Concert Workflows UI, click Worker groups in the sidebar, and open any group.
  - Replace <worker_group_name> with the name of the remote worker's worker group.
  - Replace <worker_group_secret> with the value of the worker group’s secret.
  - If your container runtime is Docker, specify docker as the target-platform parameter value. If you are using Podman, specify podman.

Disabling a workflow allowlist

Go to the appropriate installation folder.
- For local workers, on the Concert Workflows installation host:
  - If your instance is installed on a VM, go to the ibm-concert-std-workflows folder.
  - If your instance is installed on a Kubernetes cluster, go to the ibm-concert-k8s-workflows folder.
- For remote workers, go to the folder from where you installed the remote worker (ibm-concert-k8s-workflows, ibm-concert-std-workflows, or ibm-concert-workflows-rw).
Stop all running workflows in your Concert Workflows instance.
Disable the allowlist.
1. For local workers, run this command:
```
./bin/adm/cw-allowlist-configure.sh disable
```
2. For remote workers that run in a K3s or Kubernetes cluster, run this command:
```
./bin/adm/cw-allowlist-configure.sh disable
```
3. For remote workers that use Docker or Podman as their container runtime, reinstall the worker by running this command:
```
./bin/install_rw.sh \
--license-acceptance=y \
--server-hostname=<CW_server_hostname> \
--worker-group-name=<worker_group_name> \
--worker-group-secret-key=<worker_group_secret> \
--target-platform=<podman|docker>
```
  - Replace <CW_server_hostname> with the Concert Workflows server hostname. To see this hostname, open the Concert Workflows UI, click Worker groups in the sidebar, and open any group.
  - Replace <worker_group_name> with the name of the remote worker's worker group.
  - Replace <worker_group_secret> with the value of the worker group’s secret.
  - If your container runtime is Docker, specify docker as the target-platform parameter value. If you are using Podman, specify podman.

Configuring workflow pausing behavior

Occasionally, local workers are automatically terminated. This action might occur if, for example, a rolling restart is triggered when new integrations are made available. By default, before a worker is terminated, it pauses the workflows that it is running and saves the workflow state to the database. The execution of the workflows can then be resumed on another worker.

To prevent workflows from being paused indefinitely, a default timeout period of 1800 seconds (30 minutes) is applied. If a workflow is still paused when this timeout is reached, the workflow is terminated. You can customize this timeout period for local and remote workers.

To customize the timeout period for local workers, before you install or upgrade Concert Workflows, edit the params.ini file and set the WORKFLOWS_TERMINATION_TIMEOUT parameter to your preferred period in seconds. For more information, see Configuring the params.ini file.
To customize the timeout period for remote workers, when you run the remote worker installation script (install_rw.sh), set the termination-timeout parameter to your preferred period in seconds, for example:
```
./bin/install_rw.sh \
--license-acceptance=y \
--server-hostname=<CW_server_hostname> \
--worker-group-name=<worker_group_name> \
--worker-group-secret-key=<worker_group_secret> \
--target-platform=k3s \
--termination-timeout=3600
```
For more information about installing remote workers, see Installing a remote worker as a root user.

You can also prevent workers from pausing workflows. Taking this action ensures that before a worker is terminated, it completes the workflows that it is running and does not save workflow state information to the database.

To prevent local workers from pausing workflows, before you install or upgrade Concert Workflows, edit the params.ini file and set the WORKFLOWS_FLOW_PAUSE_ENABLED parameter to FALSE. For more information, see Configuring the params.ini file.
To prevent a remote worker from pausing workflows, when you run the remote worker installation script (install_rw.sh), include the disable-flow-pausing parameter, for example:
```
./bin/install_rw.sh \
--license-acceptance=y \
--server-hostname=<CW_server_hostname> \
--worker-group-name=<worker_group_name> \
--worker-group-secret-key=<worker_group_secret> \
--target-platform=k3s \
--disable-flow-pausing
```
For more information about installing remote workers, see Installing a remote worker as a root user.

Adjusting validity periods for worker user tokens

When a worker runs a workflow, it generates a user-scoped token to impersonate the user that triggered the workflow. The worker uses this token to fetch the workflow definition and retrieve any authentications that it needs to connect to external services when running the workflow.

The default period for which the token is valid is 3600 seconds (1 hour). You might want to reduce this period to minimize token exposure, or increase it to enable the completion of long-running workflows.

For local workers, you can adjust this period when you install or upgrade Concert Workflows. Edit the params.ini file and update the value of the WORKFLOWS_USER_WORKER_TOKEN_DURATION parameter to your preferred duration in seconds. After you complete the installation or upgrade, the updated validity period will be applied for local worker tokens.
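
The params.ini edits described in this topic can also be scripted. The sketch below assumes that params.ini stores each setting on its own line as KEY=value with no spaces around the equals sign. That layout and the set_param function name are assumptions for illustration, so check the file in your installation before relying on this approach.

```shell
#!/bin/sh
# Hypothetical helper. Assumes params.ini uses one KEY=value pair per line.
# Usage: set_param <file> <key> <value>
# Replaces the value of an existing key in place, or appends the key if absent.
set_param() {
    [ -f "$1" ] || return 1
    tmp=$(mktemp) || return 1
    awk -F= -v key="$2" -v value="$3" '
        $1 == key { print key "=" value; found = 1; next }
        { print }
        END { if (!found) print key "=" value }
    ' "$1" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Write back through the original file so that its permissions are preserved.
    cat "$tmp" > "$1"
    rm -f "$tmp"
}

# Example: set the worker token validity period to 2 hours (7200 seconds).
#   set_param params.ini WORKFLOWS_USER_WORKER_TOKEN_DURATION 7200
```

The same function could set WORKFLOWS_TERMINATION_TIMEOUT or WORKFLOWS_FLOW_PAUSE_ENABLED. In each case the change takes effect only after the installation or upgrade is run, as described above.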