Technical tip: Automate IBM InfoSphere DataStage jobs using CLI options in shell/batch scripts

IBM® InfoSphere® DataStage® integrates data across multiple high-volume data sources and target applications. It integrates data on demand across many systems through a high-performance parallel framework, extended metadata management, and enterprise connectivity. This technical tip introduces the dsjob command, which IBM InfoSphere DataStage provides to invoke and monitor jobs manually, and shows how to use it in automation scripts. A sample automation script shows how to use the dsjob command to run a job, view or reset the job status, stop the job, and retrieve the log for the latest job run.

Mohammed Yacoob (moyacoob@in.ibm.com), Staff Software Engineer, IBM

Mohammed Yacoob is a Staff Software Engineer with over eight years of experience at IBM and has more than ten years of IT industry experience. During the initial years, he worked on a variety of C++ based applications on multiple platforms including Windows, Linux, UNIX, and z/OS. For the past three years, he has been with the IBM Platform Technology Centre and has gained broad knowledge by working on products such as Tivoli Workload Scheduler, InfoSphere DataStage and QualityStage, and Unica Campaign. He is also focused on assessment and porting of IBM products to Linux on System Z and other environments.



15 March 2012

Introduction

While well known as a market-leading ETL platform, InfoSphere DataStage is designed as a transformation engine: you design data flow logic that can be deployed as traditional batch ETL processes or as real-time operations used in processes and transactions. You can use InfoSphere DataStage and QualityStage Designer to create, manage, and design DataStage jobs, and InfoSphere DataStage and QualityStage Director Client to validate, schedule, run, and monitor those jobs, which are run by the InfoSphere Information Server engine.

This tip assumes that you have basic knowledge of DataStage. It is intended for technical users who want to reuse the sample in their own scripts, and for end users who want to run it as is or who prefer not to install InfoSphere DataStage and QualityStage® Director Client just to run and monitor DataStage jobs.

DataStage command line interface (CLI)

Apart from the Director Client interface, InfoSphere DataStage also provides a command line interface for invoking DataStage jobs deployed on IBM InfoSphere Information Server. There is a single command, dsjob, with a wide range of options that allow you to:

  • Start and stop a job
  • List projects, jobs, stages, links, and parameters
  • Set an alias for a job
  • Retrieve information about job runs
  • Access log files
  • Generate reports

Please refer to the IBM Information Center for detailed dsjob command options.
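
As a rough illustration of the tasks listed above, the following sketch wraps a few common dsjob invocations in shell functions. The project name dstage_proj and the job name sample_job are hypothetical placeholders, and the function names are invented for this example. Check each option against the dsjob documentation for your release before relying on it.

```shell
#!/bin/sh
# Hypothetical project and job names; replace them with your own.
PROJECT="dstage_proj"
JOB="sample_job"

# Start the job. -jobstatus waits for completion and derives the
# exit code of dsjob from the finishing status of the job.
start_job() { dsjob -run -jobstatus "$PROJECT" "$JOB"; }

# Ask a running job to stop.
stop_job() { dsjob -stop "$PROJECT" "$JOB"; }

# List the jobs in the project, one name per line.
list_jobs() { dsjob -ljobs "$PROJECT"; }

# Show status and timing information for the job.
job_info() { dsjob -jobinfo "$PROJECT" "$JOB"; }

# Show a summary of the most recent log entries (here: up to 20).
log_summary() { dsjob -logsum -max 20 "$PROJECT" "$JOB"; }
```

After sourcing the file, a call such as start_job followed by log_summary covers the basic run-and-inspect cycle.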

All output from the dsjob command is in plain text without column headings on lists, or any other sort of description. This enables the command to be used in shell or batch scripts.
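
Because the output is plain text, standard tools such as grep and sed are enough to consume it. The sketch below assumes that dsjob -ljobs prints one job name per line and that dsjob -jobinfo prints a line of the form "Job Status : RUN OK (1)"; verify both against the output of your installation. The function names are hypothetical.

```shell
#!/bin/sh
# Succeed if job $2 appears in the job listing of project $1.
# -F: fixed string, -x: match the whole line, -q: no output.
job_exists() {
    dsjob -ljobs "$1" | grep -Fxq "$2"
}

# Print the text after the colon on the "Job Status" line of
# the -jobinfo output for project $1, job $2.
# Assumption: the line looks like "Job Status<whitespace>: RUN OK (1)".
job_status() {
    dsjob -jobinfo "$1" "$2" 2>/dev/null |
        sed -n 's/^Job Status[[:space:]]*:[[:space:]]*//p'
}
```

For example, a script could call job_exists before attempting a run and log the value of job_status afterwards.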

A well-written script can automate these tasks and act as a simple alternative to using Director Client. The script requires access to the InfoSphere Information Server engine to execute the dsjob command. Therefore, run it either on the system where InfoSphere DataStage is installed or, for remote access, on a system where the Designer/Director Client is installed.
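
When the engine is not on the local system, dsjob accepts connection options ahead of the action option. The sketch below is one possible wrapper; the host names, the user name, and the function name remote_dsjob are hypothetical, and the password is assumed to be supplied through the DS_PASSWORD environment variable. Note that a password passed on the command line may be visible to other users in the process list.

```shell
#!/bin/sh
# Hypothetical connection settings; adjust for your environment.
DS_DOMAIN="isdomain.example.com:9080"
DS_SERVER="dsengine.example.com"
DS_USER="dsadm"
# DS_PASSWORD is expected to be set in the environment.

# Run any dsjob action against the remote engine, for example:
#   remote_dsjob -ljobs dstage_proj
remote_dsjob() {
    dsjob -domain "$DS_DOMAIN" -user "$DS_USER" \
          -password "$DS_PASSWORD" -server "$DS_SERVER" "$@"
}
```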

About the sample script

The sample script is delivered in the attached zip file, dstagejob_script.zip, approximately 2.5KB in size. The zip includes two files, execute_dstagejob.sh and dstage_script.properties.

The shell script, execute_dstagejob.sh, is written generically and can be used as is to start, stop, or retrieve log details of any DataStage job running on UNIX or Linux systems. The job information, any input parameters, and the output folder are specified in the properties file, dstage_script.properties, which acts as input to the script. The script then uses the options provided by the dsjob command to:

  • Start and stop the job
  • Retrieve job status
  • Reset job status
  • Access log file
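
The following is a minimal sketch of how a properties-driven wrapper of this kind might look. It is not the script from the zip file: the property keys PROJECT, JOB, and PARAMS and the function names are assumptions made for this example, and parameter values containing spaces are not handled.

```shell
#!/bin/sh
# Print the value of key $2 from the KEY=VALUE file $1.
get_prop() {
    sed -n "s/^$2=//p" "$1" | head -n 1
}

# Run the job described in properties file $1 and return the
# exit code of dsjob (with -jobstatus, derived from the job status).
run_job() {
    props=$1
    project=$(get_prop "$props" PROJECT)
    job=$(get_prop "$props" JOB)
    # PARAMS is a space-separated list such as "inFile=/tmp/a.csv runDate=2012-03-15"
    params=$(get_prop "$props" PARAMS)
    set --                       # reuse "$@" to collect the -param options
    for p in $params; do
        set -- "$@" -param "$p"
    done
    dsjob -run -jobstatus "$@" "$project" "$job"
}

# Reset job $2 in project $1 so that it can be run again,
# for example after it has aborted.
reset_job() {
    dsjob -run -mode RESET -wait "$1" "$2"
}
```

A caller could invoke run_job dstage_script.properties, inspect $?, and call reset_job when the job ended in a failed state.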

Once the script finishes, it sends an email with the job completion status and attaches the job log if the job failed. If necessary, you can easily modify the script to meet your individual needs.
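
A notification step along these lines could be sketched as follows. This is an illustration, not the code from the zip file. It assumes that dsjob was run with -jobstatus, so that exit code 1 means the job finished OK, 2 means it finished with warnings, and 3 means it failed. It also assumes a mailx implementation in which -a attaches a file; that flag differs between mailx variants, so check your system's manual page. The function names and the recipient address are hypothetical.

```shell
#!/bin/sh
# Translate a dsjob -jobstatus exit code into readable text.
status_text() {
    case "$1" in
        1) echo "FINISHED OK" ;;
        2) echo "FINISHED WITH WARNINGS" ;;
        3) echo "FAILED" ;;
        *) echo "UNKNOWN ($1)" ;;
    esac
}

# notify PROJECT JOB EXIT_CODE RECIPIENT
# Sends the status by mail; on anything other than OK or warnings,
# a log summary is attached.
notify() {
    subject="DataStage job $2: $(status_text "$3")"
    if [ "$3" -eq 1 ] || [ "$3" -eq 2 ]; then
        echo "$subject" | mailx -s "$subject" "$4"
    else
        log=$(mktemp)
        dsjob -logsum -max 50 "$1" "$2" > "$log" 2>&1
        # Assumption: -a attaches a file in this mailx variant.
        echo "$subject. Log summary attached." | mailx -s "$subject" -a "$log" "$4"
        rm -f "$log"
    fi
}
```

A typical call after a run would be notify dstage_proj sample_job "$rc" ops@example.com, where rc holds the exit code of the dsjob -run command.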


Download

Description                                        Name                    Size
Sample shell script for monitoring DataStage job   dstagejob_script.zip    2.49KB
