Technical tip: Automate IBM InfoSphere DataStage jobs using CLI options in shell/batch scripts

Although it is well known as a market-leading ETL platform, InfoSphere DataStage is really a transformation engine: you design data flow logic that can be deployed either as traditional batch ETL processes or as real-time operations that are used in processes and transactions. You use the InfoSphere DataStage and QualityStage Designer client to create, design, and manage DataStage jobs, and the InfoSphere DataStage and QualityStage Director Client to validate, schedule, run, and monitor those jobs, which are run by the InfoSphere Information Server engine.

This tip assumes that you have basic knowledge of DataStage. It is intended for technical users who want to reuse the sample script in their own scripts, and for end users who want to run the script directly or who do not want to install the InfoSphere DataStage and QualityStage® Director Client to run and monitor DataStage jobs.

DataStage command line interface (CLI)

Apart from the Director Client interface, InfoSphere DataStage also provides a command line interface for invoking DataStage jobs deployed on IBM InfoSphere Information Server. There is a single command, dsjob, with a large range of options that allow you to:

  • Start and stop a job
  • List projects, jobs, stages, links, and parameters
  • Set an alias for a job
  • Retrieve information about job runs
  • Access log files
  • Generate a report

Please refer to the IBM Information Center for detailed dsjob command options.

All output from the dsjob command is in plain text without column headings on lists, or any other sort of description. This enables the command to be used in shell or batch scripts.
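Because the lists come back one item per line, they can be fed straight into a loop. The following is a minimal sketch, not part of the attached script: the function name list_job_status is made up for illustration, and it assumes that dsjob is on the PATH and that -jobinfo prints a line beginning with "Job Status". The exact output text can differ between versions, so check it on your own system first.

```shell
#!/bin/sh
# Sketch only: list_job_status is an illustrative name, not part of dsjob.
# Prints "job<TAB>status line" for every job in the given project.
list_job_status() {
    ljs_project=$1
    # dsjob -ljobs prints one job name per line with no header row.
    # If a trailing "Status code = N" line shows up on stdout, drop it.
    dsjob -ljobs "$ljs_project" | grep -v '^Status code' |
    while IFS= read -r ljs_job; do
        [ -n "$ljs_job" ] || continue
        # Keep only the status line of the job information.
        # </dev/null stops dsjob from consuming the loop's input.
        ljs_status=$(dsjob -jobinfo "$ljs_project" "$ljs_job" </dev/null 2>/dev/null |
            grep '^Job Status' | head -n 1)
        printf '%s\t%s\n' "$ljs_job" "${ljs_status:-unknown}"
    done
}
```

A call such as list_job_status myproject (with a placeholder project name) would then print one line per job.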

A well-written script can automate these tasks and act as a simple alternative to using Director Client. Because the script calls the dsjob command, it requires access to the InfoSphere Information Server engine. Run it either on the system where the InfoSphere DataStage product is installed or, for remote access, on a system where the Designer/Director Client is installed.
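When dsjob has to identify itself to a server, the connection details go in front of the actual option. The wrapper below is a rough sketch under stated assumptions: -domain, -user, -password and -server are dsjob connection options, but the function name dsjob_remote and the DS_* environment variables are invented for this example. A password passed on the command line can be visible in process listings; dsjob also documents a -file option for reading credentials from a file, so check the command reference for its exact syntax before relying on either form.

```shell
#!/bin/sh
# Sketch only: dsjob_remote and the DS_* variables are hypothetical names.
# Prepends the connection options to whatever dsjob arguments follow.
dsjob_remote() {
    # Fail early with a message if any connection setting is missing.
    : "${DS_DOMAIN:?set DS_DOMAIN to the services tier host and port}"
    : "${DS_USER:?set DS_USER}"
    : "${DS_PASSWORD:?set DS_PASSWORD}"
    : "${DS_SERVER:?set DS_SERVER to the engine host name}"
    dsjob -domain "$DS_DOMAIN" -user "$DS_USER" \
          -password "$DS_PASSWORD" -server "$DS_SERVER" "$@"
}
```

With the variables exported, dsjob_remote -lprojects would list the projects on that engine.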

About the sample script

The sample script is delivered in the attached zip file, dstagejob_script.zip, approximately 2.5KB in size. The zip includes two files, execute_dstagejob.sh and dstage_script.properties.

The shell script, execute_dstagejob.sh, is written generically and can be used as is to start, stop, or retrieve log details of any DataStage job running on UNIX or Linux systems. The job information, any input parameters, and the output folder are specified in the properties file, dstage_script.properties, which acts as input to the script. The script then uses the options provided by the dsjob command to:

  • Start and stop the job
  • Retrieve job status
  • Reset job status
  • Access log file

Once the script finishes, it sends an email with the job completion status and, if the job failed, attaches the job log. If necessary, you can easily modify the script to meet your individual needs.
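The notification step could look roughly like the following. This is a simplified sketch rather than the attached script's code: the function name notify_job_result and its arguments are made up, it assumes a working mailx command on the system, and it places the log in the message body instead of attaching it, because the option for attachments differs between mailx implementations.

```shell
#!/bin/sh
# Sketch only: notify_job_result and its arguments are illustrative names.
# Usage: notify_job_result RECIPIENT JOB STATUS [LOGFILE]
notify_job_result() {
    nj_to=$1 nj_job=$2 nj_status=$3 nj_log=${4:-}
    nj_subject="DataStage job $nj_job: $nj_status"
    {
        echo "Job $nj_job finished with status $nj_status."
        # Include the saved log only for failures, and only if it is readable.
        if [ "$nj_status" = "FAILED" ] && [ -r "$nj_log" ]; then
            echo
            echo "Job log:"
            cat "$nj_log"
        fi
    } | mailx -s "$nj_subject" "$nj_to"
}
```

For example, notify_job_result ops@example.com myjob FAILED /tmp/myjob.log would mail the status line followed by the saved log; the address, job name and path are placeholders.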


Published: March 15, 2012