Configuring and using IBM InfoSphere DataStage and QualityStage Operations Console in a multiple-node or grid environment

A detailed description of how to configure and view multiple node information in the Operations Console

The IBM® InfoSphere® DataStage® and QualityStage® Operations Console is a web application that allows the DataStage engine components of an Information Server installation to be monitored in real time. It provides a complete view of all DataStage job runs on that system, both current and historical. It also monitors key operating system metrics, such as CPU usage, free memory, and disk space, and these metrics can be gathered for all nodes in a multiple-node or grid environment. This article describes how to configure the Operations Console to gather the metrics for all nodes and how to view them within the console UI.


Geoff McClean (geoff.mcclean@uk.ibm.com), Senior Software Developer, IBM

Geoff McClean was on the original DataStage development team at its inception, and is currently a senior software developer for core components of the IBM InfoSphere DataStage and QualityStage development and production tools, part of the IBM InfoSphere Information Server suite. He oversaw the implementation of the database management, event handling, and resource tracking services of the IBM InfoSphere DataStage and QualityStage Operations Console.



Len Greenwood (len.greenwood@uk.ibm.com), DataStage Core Architect, IBM

Len Greenwood was a member of the small development team that produced the first version of DataStage in 1996, prior to its acquisition by IBM from Ascential Software in 2005. It now forms a mainstay of the IBM InfoSphere Information Server suite. He has worked in the related areas of data and metadata integration for the past 15 years and is currently the main product architect for the core components of the DataStage and QualityStage development and production tools. He recently designed the database schema that underlies the Information Server Operations Console, used to monitor activity at the DataStage engine level.



Arron Harden (arron.harden@uk.ibm.com), Senior Software Engineer, IBM

Arron Harden is a senior software engineer for IBM InfoSphere DataStage and QualityStage. Staying with the DataStage product after several mergers and acquisitions, he has worked on DataStage for more than 12 years, joining IBM through the acquisition of Ascential Software Inc in 2005. Having spent a year working in Boston, he is currently based in the United Kingdom, working at the IBM Milton Keynes office. In his most recent role, he was the lead developer for the web application component of the DataStage and QualityStage Operations Console, written using the Dojo toolkit.



Eric Jacobson (ejacobso@us.ibm.com), Senior Software Engineer, IBM

Eric Jacobson is a senior software engineer for the Parallel Engine group within the IBM InfoSphere DataStage and QualityStage product. He has been a major contributor to the Parallel Engine since 2003 and joined IBM through its acquisition of Ascential Software in 2005. He has made major contributions to the Parallel Engine framework in areas including lookup, transform, and import, while also focusing on performance. Currently, he is working on the integration of DataStage and Hadoop, recently delivering the first phase of this, which enables reading and writing of files on the Hadoop Distributed File System through the new Big Data File Stage.



09 August 2012

Also available in Portuguese

Configuration overview

By default, operating system metrics are gathered for the DataStage engine node, on which the DataStage jobs run. These results can then be viewed through the Operations Console UI. On a system where jobs run on a single DataStage engine node, this default case covers all the relevant system metrics for a job run. In some environments, however, jobs run across multiple nodes — in a grid environment, for example. In this case, a user may also want to see system metrics for each remote node on which jobs run. For these metrics to be available, the Operations Console must be configured so that system metrics for the other nodes are gathered and stored in the operations database.

All the configuration parameters for gathering system metrics are specified in the DSODBConfig.cfg file, which is installed in the main DSODB directory on the DataStage engine system (.../IBM/InformationServer/Server/DSODB).

Online reference material for DSODB as of V8.7, including descriptions of the current tables and columns, can be found in the schema document referenced in the Resources section.

Controlling system metrics gathering

Before looking at the individual configuration parameters for system metrics, note that there is a controlling parameter that allows a user to turn off all system metrics gathering. This can be used, for example, if a user wants to gather system metrics with some system other than the Operations Console. Note, however, that system metrics cannot then be viewed through the Operations Console. By default, this parameter is set to collect system metrics.

Listing 1. Enabling resource monitoring
# System Resource Monitor - enable/disable
# ========================================
# The following switches on the collection of system resource data if set to 1
# (the default), or switches it off if 0. If set to 0, all options below related to 
# resource tracking are ignored.
# ResourceMonitor=1
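As a minimal sketch of how this setting behaves (the helper below is purely illustrative and not part of the product), the following parses DSODB-style Key=Value lines, skipping # comment lines, and reports whether ResourceMonitor is enabled. Note that the commented-out entry in the listing above leaves the default of 1 in effect.

```python
def resource_monitor_enabled(config_text):
    """Return True if ResourceMonitor is enabled in DSODB-style config text.

    Lines beginning with '#' are comments; if no active ResourceMonitor
    entry is present, the default of 1 (enabled) applies.
    """
    enabled = True  # ResourceMonitor defaults to 1
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        key, _, value = line.partition("=")
        if key.strip() == "ResourceMonitor":
            enabled = value.strip() == "1"
    return enabled

# A commented-out entry leaves the default (enabled) in effect.
print(resource_monitor_enabled("# ResourceMonitor=1"))  # True
# An active entry of 0 switches collection off.
print(resource_monitor_enabled("ResourceMonitor=0"))  # False
```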

Remote nodes

For system metrics to be gathered for any node other than the DataStage engine node, they must be specified in this configuration file.

Listing 2. Specifying node names
# The following specifies the name of a remote node whose resources are to be monitored.
# (The local system is always monitored if the resource tracker is running.)
# The name given for each node should match that used in Parallel Job config files.
# This property can be repeated any number of times to include multiple remote nodes.
# ResourceNode=xxxxxx

For each remote node to be monitored, a new line must be added, specifying the host name of the node. That name must match the fastname used in the parallel job configuration files for that node. The property can be specified any number of times, once for each remote node. For example: To gather system metrics for three remote nodes (node1, node2, node3), the following properties should be specified.

Listing 3. Specifying multiple node names
ResourceNode=node1 
ResourceNode=node2 
ResourceNode=node3

Note that the node name specified is case-insensitive.
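Because the ResourceNode property is simply repeated once per node, collecting the configured node list amounts to scanning for every active entry with that key. The sketch below (a hypothetical helper, not product code) illustrates this:

```python
def resource_nodes(config_text):
    """Collect the values of every active (uncommented) ResourceNode entry.

    The property is repeatable, so all occurrences are gathered in order.
    Comparisons against these names elsewhere are case-insensitive.
    """
    nodes = []
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments and non-property lines
        key, _, value = line.partition("=")
        if key.strip() == "ResourceNode":
            nodes.append(value.strip())
    return nodes

cfg = "ResourceNode=node1\nResourceNode=node2\nResourceNode=node3\n"
print(resource_nodes(cfg))  # ['node1', 'node2', 'node3']
```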

Local and remote file systems

By default, the system metrics gathered do not include any disk space information. A user can, however, specify any local or remote disks for which to check for disk space. With these options set, disk space is included in the Operations Console display. The property used to specify local file systems to monitor is as follows.

Listing 4. Specifying local file systems
# The following specifies a locally mounted file system to be monitored.
# This property can be repeated any number of times to specify multiple file systems.
# ResourceLocalFS=/localfilesystemA

Any number of local file systems can be specified, with the property repeated for each. An example to monitor the local file system containing /tmp would be: ResourceLocalFS=/tmp.

For a Windows system, the file system pathname can be specified with forward or backward slashes, and can optionally contain a disk prefix ('C:\tmp', for example).

Similar to local file systems, remote file systems can be monitored.

Listing 5. Specifying remote file systems
# The following specifies a file system mounted on a remote node to be monitored.
# The remote node name must match that specified in the corresponding ResourceNode
# entry above.
# This property can be repeated any number of times to specify multiple file systems.
# ResourceRemoteFS=node1+/remotefilesystem

This property is specified in two parts, separated by a +. The first part specifies the node on which the file system exists; that node name must have a corresponding ResourceNode property entry with the same name. The second part is the path of the file system to monitor on that remote node. The property can be specified any number of times, once for each file system to monitor. For example, the following requests monitoring of two file systems on remote node node1 and one on remote node node2.

Listing 6. Specifying multiple remote file systems
ResourceRemoteFS=node1+/usr
ResourceRemoteFS=node1+/tmp
ResourceRemoteFS=node2+/tmp

As with local file system specifications, a Windows path can be specified with forward or backward slashes and an optional disk prefix, such as: ResourceRemoteFS=node1+C:\tmp.
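The two-part node+path format, and its dependency on a matching ResourceNode entry, can be sketched as follows (an illustrative helper only, under the assumption that the node part must correspond exactly to a configured ResourceNode value, as the troubleshooting section notes):

```python
def parse_remote_fs(entry, known_nodes):
    """Split a ResourceRemoteFS value such as 'node1+/usr' into (node, path).

    Raises ValueError if the value is malformed or if the node part has no
    matching ResourceNode entry. Windows paths may use either slash style,
    so backslashes are normalized to forward slashes.
    """
    node, sep, path = entry.partition("+")
    if not sep or not node or not path:
        raise ValueError("expected <node>+<path>, got %r" % entry)
    if node not in known_nodes:
        raise ValueError("no matching ResourceNode entry for %r" % node)
    return node, path.replace("\\", "/")

print(parse_remote_fs("node1+/usr", ["node1", "node2"]))  # ('node1', '/usr')
```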

Remote node setup

It is not possible to specify just any remote node from which to collect system resource information. A remote node must have been set up with the Information Server parallel engine such that parallel jobs can be run on that node. As with running parallel jobs on multiple nodes, all nodes should be of a similar platform type.

For remote node resource information to be tied to its corresponding parallel job run, the node name specified in the DSODBConfig.cfg file for a remote node must be the same as the name used in a parallel job configuration file for that run.
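Since the node name is case-insensitive (as noted earlier), the correspondence check between the DSODBConfig.cfg entry and the fastname in the parallel job configuration file amounts to a case-insensitive string comparison, as this illustrative sketch shows:

```python
def node_names_match(dsodb_name, fastname):
    """Compare a ResourceNode entry with the fastname from a parallel job
    configuration file, case-insensitively. A mismatch means a job run's
    resource data cannot be tied to that node in the Operations Console.
    """
    return dsodb_name.strip().lower() == fastname.strip().lower()

print(node_names_match("Node1", "node1"))  # True
```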

ResTrackApp port numbers

System resource information is gathered by a process called ResTrackApp. ResTrackApp runs on the DataStage engine node and listens on a port, waiting for Operations Console components to request information from it. When metrics for remote nodes are requested, it is this ResTrackApp process that connects to the remote nodes to obtain their system resource information, connecting to a copy of itself on each remote node via a port number.

Both the local and remote port numbers used by ResTrackApp have the same default value of 13450. However, a user can specify different port numbers for the ResTrackApp running on the DataStage engine node or the connections to the remote nodes. These properties in the configuration file are as follows.

Listing 7. Specifying resource tracker port numbers
# Resource Tracking - connections
# ===============================
# The following specifies the port number that the resource tracking application
# (ResTrackApp) will use on the local system. The default is 13450.
# ResourcePortNum=13450
                
# The following specifies the port number that the resource tracking application 
# (ResTrackApp) will use on all remote nodes. The default is 13450.
# ResourceRemotePortNum=13450

Specifying a port number of 13800 for the DataStage engine and 13801 for remote connections would be done as shown below.

Listing 8. Specifying non-default resource tracker port numbers
ResourcePortNum=13800
                
ResourceRemotePortNum=13801
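When choosing non-default port numbers, it can help to verify that nothing is already listening on the candidate port (the troubleshooting section below notes that a remote port already in use prevents resource collection). The following sketch, using a plain TCP connection attempt, is one way to probe this; it is an illustrative check, not part of the product:

```python
import socket

def port_in_use(host, port, timeout=2.0):
    """Return True if something is already listening on host:port.

    A successful TCP connection means the port is taken; a refused or
    timed-out connection suggests it is free for ResTrackApp to use.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: probe the default ResTrackApp port on the local machine.
print(port_in_use("127.0.0.1", 13450))
```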

UI elements

System resource information can be viewed in several places within the Operations Console UI. In some places, such as on the home page, resource information is shown for one node at a time, showing the DataStage engine node information by default. In other places, such as the activity resource page, resource information for multiple nodes can be displayed simultaneously.

On a system where resource information is only being collected for the DataStage engine node, the UI does not offer any choice of other nodes. If information for more than one node is being collected, the UI will show the node being displayed and include drop-down lists that allow the user to select the node for which to show information.

Home page operating system resources

By default, the Operations Console home page shows operating system resources for the DataStage engine node. If information is being collected for multiple nodes, the Operating System Resources section will show the current node being viewed in a drop-down control in the top right-hand corner of that area. The following example shows a case where the node being shown is MK-Engine, which is also the DataStage engine machine, as denoted in the parentheses (Figure 1).

Figure 1. Home page compute node selector
Image shows the Operating System Resources graphs with Compute Node button (labelled 'MK-Engine (Engine)') and drop-down icon

Selecting the drop-down will offer a list of nodes to choose from.

Figure 2. Home page compute node selector drop-down
Image shows Compute Node button pressed to display a drop-down list with 'MK-Engine (Engine)' above 'IBM-Node1'

Job run performance

Viewing the details of a specific job run shows system resource details during that run. If a job has been configured to run on multiple nodes, the user can choose which node to show details for when viewing the job run details; the graph can only show details for one node at a time. In the section where the user selects which system resources to show, a drop-down shows the selected node and allows the user to change the selection. The example below shows the node selection box with system resources being shown for the remote node IBM-Node1.

Figure 3. Job performance compute node selector
Image shows Job Run Details pane, Performance tab, with Compute Node button

Job activity resources

The top-level Activity tab includes a Resources tab that can be used to show selected system resources during the specified activity timeframe. The user can choose to show graphs of CPU, disk space, job runs, memory, and processes. For these, if data has been collected for more than just the DataStage engine node, the selection menu for each category will allow the user to select from the list of nodes that have data. For example, selecting the CPU graph for the engine node and another node will display the CPU graphs for both. The example below shows the menu selection to show CPU data for both the engine node MK-Engine and the remote node IBM-Node1, with both graphs being displayed.

Figure 4. Activity resources compute node selector
Image shows Activity pane, Resources tab

Troubleshooting

The UI does not show the node selection drop-down and currently selected node.

The drop-down or remote node selection menus will only appear if data is being collected for more than just the DataStage engine node. If a remote node has been configured, but the selector is still not shown, then there is a problem with the configuration of that node. See the section below.

I added a ResourceNode entry to the config file, but the node does not appear in the resource node selection drop-down or menus in the UI.

For a node to appear in the list, it must be configured correctly. You should check that:

  • The DSODBConfig.cfg file ResourceNode=xxx entry for the node is specified correctly (case-insensitive).
  • The DataStage PXEngine (Information Server 8.7 or higher) has been set up correctly on that remote node.
  • The remote node is the same platform type as the engine node.
  • The remote port number (ResourceRemotePortNum entry in the DSODBConfig.cfg file) is not already in use on that node.

I added a ResourceRemoteFS entry to the config file, but the remote file system does not appear on the resource menus in the UI.

For a remote file system to appear in the resource menus, it must be configured correctly. You should check that:

  • The DSODBConfig.cfg file ResourceRemoteFS entry has a corresponding ResourceNode entry, where the node name is specified exactly the same in both entries.
  • Resource information is being collected for the remote node (see troubleshooting entry above). If resource information is not being collected, disk information will not be collected.
  • The specified path actually exists on the remote node.

I configured a remote node in the DSODBConfig.cfg file and ran a job configured to run on that remote node. When viewing the job run details, the Performance tab drop-down offers that remote node, but when I select it to display resource information, the UI says "No data available."

Verify that the remote node name specified in the DSODBConfig.cfg matches the remote node name in the parallel job configuration file the job was run with. The node names in these two configuration files must match to tie a job run to its corresponding remote system resources.


Conclusion

In this article, we have described the configuration parameters that must be set in order to collect local and remote node system resource information. We have also described how this system information can be selected and displayed in the Operations Console UI.

Resources

Learn

