DataStage jobs in pipeline flows example
With Orchestration Pipelines, you can create a pipeline to run and connect DataStage® jobs.
Example
In this example, you find a step-by-step procedure that uses DataStage flows to copy data from IBM Netezza Performance Server for DataStage to an Oracle Database for DataStage table. You create a pipeline with two DataStage jobs and one Bash script that gets the maximum value from the database: the first flow retrieves the value and stores it in a text file, and the second flow then uses the stored value to choose the most current maximum value.
- Creating a storage volume connection
- Get familiar with Sharing storage volumes between Orchestration Pipelines and DataStage.
- Open IBM Cloud Pak for Data. Click Navigation menu > Administration > Storage volumes and create a new storage volume.
- In the Storage volume overview section, choose a Namespace and type a Volume name.
- In the Storage volume details section, choose a Volume type and a Storage class, and type a path in Mount path.
- Go to your storage volume details. In Configuration details, find the PVC to use and copy it.
- Go to your cluster. Add the copied information to your configuration, for example:
spec:
  additional_storage:
    - mount_path: /mnts/tutorial
      pvc_name: volumes-tutorial-sample-pvc
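Before you build the pipeline, you can confirm that the volume is reachable from a pipeline node. This is a minimal sketch that you could run in a Run Bash script node, assuming the /mnts/tutorial mount path from the example above:

  # List the mounted volume and check that it is writable
  # before the DataStage jobs read from or write to it.
  ls -ld /mnts/tutorial
  touch /mnts/tutorial/.write-test && echo "volume is writable"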
- Creating a three-node pipeline flow by using DataStage flows, available connectors, and a Bash script
Figure 1. Three-node pipeline flow
- Create or open a project.
- Create the first node, which takes data from IBM Netezza Performance Server for DataStage and sends it to a sequential file.
- Click the Run node ribbon and drag a Run DataStage job node onto your canvas.
- In Input, find the icon next to DataStage job and click it to select the DataStage job resource. Click Save.
- For this example, type 50000 in MAX_NUMBER. In the filename field, params.filename is an environment variable.
- Create the second node.
- In the Run node ribbon, find Run Bash script and drag it onto your canvas.
- For this example, in Script code, type cat $e1_filename, which reads the file sequentially and prints it to the standard output. Pass the filename parameter (e1_filename) to the Bash node and add it as an environment variable (params.filename). This step is sketched after the procedure.
- Create the third node.
- In the Run node ribbon, find Run DataStage job and drag it onto your canvas.
- In Input, find the icon next to DataStage job and click it to select the DataStage job resource.
- In Input > Edit local parameters, find the MAX_NUMBER field and type int(ds.Ereplace(ds.GetCommandOutput(tasks.run_bash_script),ds.FM(),"")) to get the most recent value in the database that meets the requirements. A bash approximation of this expression also follows the procedure.
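The second node's script is a plain cat. A minimal sketch of that step, assuming the first job wrote its result to the file named by the e1_filename environment variable:

  # Read the sequential file and print it to standard output,
  # which the pipeline captures as the node's command output.
  cat "$e1_filename"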
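The third node's expression can be approximated in bash to show what each function contributes. This is illustrative only, not pipeline code, and it assumes that ds.FM() returns the DataStage field-mark character (commonly CHAR(254)):

  raw=$(cat "$e1_filename")       # stands in for ds.GetCommandOutput(tasks.run_bash_script)
  fm=$'\xfe'                      # assumed field-mark character returned by ds.FM()
  max_number=${raw//"$fm"/}       # stands in for ds.Ereplace(..., ds.FM(), "")
  echo "MAX_NUMBER=$max_number"   # int(...) then converts the cleaned string to an integer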