Setting up before-job and after-job subroutines in DataStage

Use before-job and after-job subroutines to run built-in subroutines before a job starts or after it finishes.

Typical uses of before-job and after-job subroutines include running a script before the job runs or generating a report after the job completes successfully. A return code of 0 from the subroutine indicates success. Any other code indicates failure and causes an unrecoverable error when the job runs.
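As an illustration of the return-code convention, the following minimal sketch shows how a Python script that is called from a subroutine might map its outcome to an exit code. The function names and the validation rule are hypothetical; only the meaning of 0 and nonzero codes comes from the description above.

```python
import sys


def rows_are_valid(rows):
    """Hypothetical check: every row must carry a non-empty 'id' value."""
    return all(row.get("id") for row in rows)


def main(rows):
    """Return 0 on success and 1 on failure, matching the subroutine convention."""
    if rows_are_valid(rows):
        return 0
    print("validation failed", file=sys.stderr)
    return 1


# In a real script, pass the result to sys.exit() so that the shell
# sees it as the return code, for example: sys.exit(main(rows))
exit_code = main([{"id": "a1"}, {"id": "b2"}])
print(exit_code)
```

Returning the code from main and exiting in one place keeps the success and failure paths easy to test separately.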

To set up a before-job or after-job subroutine, complete the following steps.
  1. Open a DataStage® flow, then click the Settings icon.
  2. On the Settings page, click Before/after-job subroutines.
  3. Specify a before-job subroutine, an after-job subroutine, or both. Then, click Save.

Using custom Python code in subroutines

You can install Python packages to run scripts in before-job and after-job subroutines.
  1. Open a DataStage flow, click the Settings icon, and click Before/after-job subroutines.
  2. Under Built-in subroutine to execute, choose Execute shell command.
  3. In the Shell command text box, enter a command to create a directory for your modules under /px-storage and a command to install the desired modules. This example command installs modules in the directory pip_modules. The --target option installs the modules into that directory.
    mkdir -p /px-storage/pip_modules && pip3 install <modulename1> <modulename2> --target /px-storage/pip_modules
  4. Save and execute the flow.
  5. To enable non-root users to execute your script, append the path of the module directory to sys.path at the top of your Python script. Following the previous example:
    import sys
    sys.path.append("/px-storage/pip_modules")
  6. Replace the command in the Shell command text box with a command that calls the Python script by its file path. This example calls the script test_data.py from /ds-storage/ing_py_env.
    python3 /ds-storage/ing_py_env/test_data.py
  7. Save and execute the flow.
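
The top of a script such as test_data.py could look like the following minimal sketch. The helper name add_module_dir is hypothetical, and the import of an installed module is left as a comment because the module names depend on what you installed.

```python
import sys

# Directory used in the example install command above.
MODULE_DIR = "/px-storage/pip_modules"


def add_module_dir(path=MODULE_DIR):
    """Append the module directory to sys.path once, so imports can find it."""
    if path not in sys.path:
        sys.path.append(path)
    return path


add_module_dir()

# Imports of the installed modules go after this point, for example:
# import <modulename1>
```

Checking for the path before appending keeps sys.path free of duplicates if the helper is called more than once.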

Using cpdctl in before-job and after-job subroutines

You can run cpdctl commands in before-job and after-job subroutines.

You can find the cpdctl binary in the /px-storage/tools/cpdctl directory. To update cpdctl to the latest version, download the version that you want from https://github.com/IBM/cpdctl/releases. Then use the following command to copy the cpdctl binary to the px-runtime pod, replacing the example pod name with the name of your pod:
    oc cp cpdctl ds-px-default-ibm-datastage-px-runtime-7d77747cfc-sjngt:/px-storage/tools/cpdctl/cpdctl

To execute cpdctl commands, complete the following steps.

  1. Open a DataStage flow, click the Settings icon, and click Before/after-job subroutines.
  2. Under Built-in subroutine to execute, choose Execute shell command.
  3. In the Shell command text box, enter the cpdctl command that you want to run, for example:
    cpdctl project list
  4. If you want to run the flow from the canvas, create a local parameter on the job canvas.
    1. On the canvas, click Add parameters, and then Create parameter.
    2. In the Name field, specify the parameter name as $ENABLE_CPDCTL.
    3. Choose the String type and enter the value 1 in the Default value field.
  5. If you want to use the cpdctl command line to run your job, use the following command to run the job with the environment variable ENABLE_CPDCTL set. Replace the placeholders with your project and job names.
    cpdctl dsjob run --project-name <project-name> --job <job-name> --env ENABLE_CPDCTL=1
  6. Save and run the job with the specified environment option ENABLE_CPDCTL=1.
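
If you start jobs from a script, the command from step 5 can be assembled programmatically. The following sketch only builds the command line and does not run it. The flag names are copied from step 5 as written, and the function name, project name, and job name are hypothetical placeholders.

```python
import shlex


def build_run_command(project_name, job_name):
    """Build the argument list for running a job with ENABLE_CPDCTL=1."""
    return [
        "cpdctl", "dsjob", "run",
        "--project-name", project_name,  # flag as written in step 5
        "--job", job_name,               # flag as written in step 5
        "--env", "ENABLE_CPDCTL=1",
    ]


# Hypothetical project and job names, for illustration only.
argv = build_run_command("my_project", "my_job")
print(shlex.join(argv))
```

Keeping the command as a list of arguments avoids quoting problems when a project or job name contains spaces. To run it, the list could be passed to subprocess.run(argv, check=True).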