Technical tip: Automate IBM InfoSphere DataStage jobs using CLI options in shell/batch scripts
Although it is well known as a market-leading ETL platform, InfoSphere DataStage is at its core a transformation engine. It lets you design data flow logic that can be deployed either as traditional batch ETL processes or as real-time operations used within processes and transactions. You use InfoSphere DataStage and QualityStage Designer to design, create, and manage DataStage jobs, and InfoSphere DataStage and QualityStage Director Client to validate, schedule, run, and monitor the jobs that the InfoSphere Information Server engine runs.
This tip assumes that you have a basic knowledge of DataStage. It is intended both for technical users who want to reuse the approach in their own scripts and for end users who want to run and monitor DataStage jobs directly, without installing the InfoSphere DataStage and QualityStage® Director Client.
DataStage command line interface (CLI)
Apart from the Director Client interface, InfoSphere DataStage also provides a command line interface with options to invoke DataStage jobs deployed on IBM InfoSphere Information Server. The interface consists of a single command, dsjob, with a large range of options that allow you to:
- Start and stop a job
- List projects, jobs, stages, links, and parameters
- Set an alias for a job
- Retrieve information about job runs
- Access log files
- Generate reports
For a detailed description of the dsjob command options, refer to the IBM Information Center.
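To make the list above concrete, here is a minimal Python sketch that maps several of those tasks to dsjob argument lists suitable for subprocess.run. The helper name dsjob_args and its task keys are hypothetical; the option spellings (-lprojects, -ljobs, -lstages, -llinks, -lparams, -run, -stop, -jobinfo, -logsum) follow the dsjob syntax, but check them against the Information Center for your release. Connection and authentication options needed for remote access are omitted.

```python
from typing import List


def dsjob_args(task: str, project: str = "", job: str = "", stage: str = "") -> List[str]:
    """Return the dsjob argument list for a task (task names are hypothetical)."""
    # Options come first, followed by the project, job, and stage names.
    table = {
        "list_projects": ["-lprojects"],
        "list_jobs": ["-ljobs", project],
        "list_stages": ["-lstages", project, job],
        "list_links": ["-llinks", project, job, stage],
        "list_params": ["-lparams", project, job],
        "run": ["-run", project, job],
        "stop": ["-stop", project, job],
        "job_info": ["-jobinfo", project, job],
        "log_summary": ["-logsum", project, job],
    }
    if task not in table:
        raise ValueError(f"unsupported task: {task}")
    args = table[task]
    # An empty string means a required project, job, or stage name is missing.
    if "" in args:
        raise ValueError(f"task {task!r} needs a project, job, or stage name")
    return ["dsjob"] + args
```

For example, subprocess.run(dsjob_args("list_jobs", "DEV"), check=True) would list the jobs in a project named DEV (a placeholder name). Building an argument list rather than a single shell string avoids quoting problems with job names and parameter values.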
All output from the dsjob command is plain text, with no column headings on lists and no other descriptive text. This makes the output easy to consume in shell or batch scripts.
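Because the output carries no headings, a script can treat each non-blank line as one item. The following Python sketch assumes exactly that one-item-per-line layout; parse_dsjob_list and list_jobs are hypothetical helper names, and list_jobs requires dsjob to be reachable on PATH.

```python
import subprocess
from typing import List


def parse_dsjob_list(output: str) -> List[str]:
    """Split plain-text dsjob list output into items, one per non-blank line."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def list_jobs(project: str) -> List[str]:
    """Run `dsjob -ljobs <project>` and return the job names it prints."""
    result = subprocess.run(
        ["dsjob", "-ljobs", project],
        capture_output=True,  # collect stdout for parsing
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return parse_dsjob_list(result.stdout)
```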
A well-written script can automate these tasks and serve as a simple alternative to performing them in Director Client. Because the script needs access to the InfoSphere Information Server engine to execute the dsjob command, it must run either on the system where InfoSphere DataStage is installed or, for remote access, on a system where the Designer/Director Client is installed.
About the sample script
The sample script is delivered in the attached zip file, dstagejob_script.zip (approximately 2.5 KB), which contains two files: execute_dstagejob.sh and dstage_script.properties.
The shell script, execute_dstagejob.sh, is written generically and can be used as is to start or stop any DataStage job on UNIX or Linux systems, or to retrieve its log details.
The job information, any input parameters, and the output folder are specified in the properties file, dstage_script.properties, which serves as input to the script. The script then uses the options of the dsjob command to:
- Start and stop the job
- Retrieve job status
- Reset job status
- Access the log file
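The contents of dstage_script.properties are not reproduced here, so the following Python sketch assumes hypothetical keys (project, job, output_dir for the output folder, and one param.NAME entry per job parameter) and shows how such a file could drive the operations listed above. It uses -run with -jobstatus and -param NAME=VALUE to start the job and wait for its status, -stop, -jobinfo, -run -mode RESET, and -logsum. The output_dir key is parsed but not used when building commands, and the parser is deliberately simplified: it does not handle line continuations or escape sequences.

```python
from typing import Dict, List


def load_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines; lines starting with '#' or '!' are comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def build_commands(props: Dict[str, str]) -> Dict[str, List[str]]:
    """Build dsjob argument lists from the hypothetical project/job/param.* keys."""
    project, job = props["project"], props["job"]
    params: List[str] = []
    for key in sorted(props):
        if key.startswith("param."):
            # param.RUN_DATE=... becomes: -param RUN_DATE=...
            params += ["-param", f"{key[len('param.'):]}={props[key]}"]
    return {
        "start": ["dsjob", "-run", "-jobstatus", *params, project, job],
        "stop": ["dsjob", "-stop", project, job],
        "status": ["dsjob", "-jobinfo", project, job],
        "reset": ["dsjob", "-run", "-mode", "RESET", project, job],
        "log": ["dsjob", "-logsum", project, job],
    }
```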
When the script finishes, it sends an email with the job completion status and, if the job failed, attaches the job log. You can modify the script as needed to suit your environment.
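The notification step can be sketched in Python with the standard library's email package. This sketch assumes the job was started with -jobstatus, so that the dsjob exit code reflects the job status: 1 (finished OK) and 2 (finished with warnings) are treated as success, and any other value as failure. build_notification and the addresses are hypothetical; confirm the status codes for your release.

```python
from email.message import EmailMessage

# Assumed job status codes returned by `dsjob -run -jobstatus`
# (DSJS_RUNOK=1, DSJS_RUNWARN=2, DSJS_RUNFAILED=3).
STATUS_TEXT = {1: "Finished OK", 2: "Finished with warnings", 3: "Aborted"}


def build_notification(job: str, exit_code: int, log_text: str,
                       sender: str, recipient: str) -> EmailMessage:
    """Compose the completion email; attach the job log only on failure."""
    status = STATUS_TEXT.get(exit_code, f"Unexpected status {exit_code}")
    msg = EmailMessage()
    msg["Subject"] = f"DataStage job {job}: {status}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"Job {job} completed with status: {status}")
    if exit_code not in (1, 2):
        # Turns the message into multipart/mixed with a text/plain attachment.
        msg.add_attachment(log_text, filename=f"{job}_log.txt")
    return msg
```

The message could then be sent with smtplib, for example smtplib.SMTP("mailhost").send_message(msg), where mailhost is a placeholder for your SMTP relay.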
- For detailed information on InfoSphere DataStage jobs, see the IBM Information Center.
- For documentation on InfoSphere Information Server V8.5 and its components, see the IBM Information Center or refer to the PDF documentation.
- Visit the developerWorks InfoSphere resource page to read articles and tutorials on the InfoSphere family of products and connect to other resources to expand your skills.