DataStage jobs in pipeline flows example

With Orchestration Pipelines, you can create a pipeline to run and connect DataStage® jobs.

Example

This example is a step-by-step procedure that uses DataStage flows to copy data from IBM Netezza Performance Server for DataStage to an Oracle Database for DataStage table. You create a pipeline with two DataStage jobs and one bash script that retrieves the maximum value from the database: one flow reads the value and stores it in a text file, and a second flow uses the stored value to select the most current maximum value.

Creating a storage volume connection
  1. Get familiar with Sharing storage volumes between Orchestration Pipelines and DataStage.
  2. Open IBM Cloud Pak for Data. Click Navigation menu>Administration>Storage volumes and create a new storage volume.
  3. In the Storage volume overview section, choose a Namespace and type a Volume name.
  4. In the Storage volume details section, choose a Volume type and a Storage class, and type a path in Mount path.
  5. Go to your storage volume details. In Configuration details, find the PVC to use and copy it.
  6. Go to your cluster. Add the copied information to your code, for example:
    
    spec:
      additional_storage:
      - mount_path: /mnts/tutorial
        pvc_name: volumes-tutorial-sample-pvc
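After the cluster picks up the spec, you can confirm that the volume is available from a bash node. The following sketch is illustrative (the path matches the mount_path above; the function name is hypothetical):

```shell
# Hypothetical check, e.g. from a Run Bash script node, that a storage
# volume is mounted at the expected path (path is illustrative).
check_mount() {
  if [ -d "$1" ]; then
    echo "volume mounted at $1"
  else
    echo "volume not mounted at $1" >&2
    return 1
  fi
}

check_mount /mnts/tutorial || true   # on the cluster this should succeed
```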
Creating a three-node pipeline flow by using DataStage flows, available connectors, and a bash script
Figure 1. Three-node pipeline flow
  1. Create or open a project.
  2. Create the first node, which takes data from IBM Netezza Performance Server for DataStage and sends it to a sequential file.
    • In the Run node ribbon, drag a Run DataStage job node onto your canvas.
    • In Input, click the icon next to DataStage job to select the DataStage job resource. Click Save.
    • For this example, type 50000 in MAX_NUMBER. In the filename field, params.filename is an environment variable.
  3. Create a second node.
    • In the Run node ribbon, find Run Bash script and drag it onto your canvas.
    • For this example, in Script code, type cat $e1_filename, which reads the file and prints its contents to standard output. Pass the filename parameter (e1_filename) to the bash node and add it as an environment variable (params.filename).
  4. Create a third node.
    • In the Run node ribbon, find Run DataStage job and drag it onto your canvas.
    • In Input, click the icon next to DataStage job to select the DataStage job resource.
    • In Input>Edit local parameters, find the MAX_NUMBER field and type int(ds.Ereplace(ds.GetCommandOutput(tasks.run_bash_script), ds.FM(), "")). This expression captures the bash node's output, removes DataStage field marks, and converts the result to an integer, so the node gets the most recent value in the database that meets the requirements.
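The three nodes above can be sketched end to end in plain bash. This is an illustrative analogue only: the temp file and sample value stand in for the sequential file written by the first DataStage job, and the last step mimics what the ds.GetCommandOutput/ds.Ereplace expression does in the pipeline:

```shell
# A temp file stands in for the sequential file written by node 1;
# in the pipeline its name comes from params.filename.
e1_filename=$(mktemp)

# Node 1: the first DataStage job stores the current maximum (sample value).
echo "50000" > "$e1_filename"

# Node 2: the bash node's script - read the file and print it to stdout.
cat "$e1_filename"

# Node 3: rough analogue of
# int(ds.Ereplace(ds.GetCommandOutput(tasks.run_bash_script), ds.FM(), "")):
# capture the printed output, strip delimiter characters, cast to an integer.
raw=$(cat "$e1_filename")
max_number=$(( $(printf '%s' "$raw" | tr -d '\n\r') ))
echo "MAX_NUMBER=$max_number"

rm -f "$e1_filename"
```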