Configuring and using IBM InfoSphere DataStage and QualityStage Operations Console in a multiple-node or grid environment
A detailed description of how to configure and view multiple node information in the Operations Console
By default, operating system metrics are gathered for the DataStage engine node, on which the DataStage jobs run. These results can then be viewed through the Operations Console UI. On a system where jobs run on a single DataStage engine node, this default covers all the relevant system metrics for a job run. In some environments, such as a grid environment, jobs run across multiple nodes. In this case, a user may also want to see system metrics for each remote node on which jobs run. For these to be available in the Operations Console, the system must be configured so that metrics for the other nodes are gathered and stored in the operations database.
All the configuration parameters for gathering system metrics are specified in the DSODBConfig.cfg file, which is installed in the main DSODB directory on the DataStage engine system (.../IBM/InformationServer/Server/DSODB).
Online reference material for DSODB as of V8.7, including descriptions of current tables and columns, can be found in the schema document reference in the Related topics section.
Controlling system metrics gathering
Before the individual configuration parameters for system metrics are described, note that a controlling configuration parameter allows a user to turn off all system metrics gathering. This can be used, for example, if a user wants to use some system other than Operations Console to gather system metrics. However, with gathering turned off, system metrics cannot be viewed through Operations Console. By default, this parameter is set to collect system metrics.
Listing 1. Enabling resource monitoring
# System Resource Monitor - enable/disable
# ========================================
# The following switches on the collection of system resource data if set to 1
# (the default), or switches it off if 0. If set to 0, all options below related to
# resource tracking are ignored.
# ResourceMonitor=1
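As a rough illustration of how these properties can be read, the following Python sketch parses text in the style of DSODBConfig.cfg. It assumes the file consists of simple key=value lines with # comments and that a property may be repeated; the function names parse_dsodb_config and resource_monitor_enabled are hypothetical, and the rules "last value wins" and "only 0 disables" are assumptions for this sketch rather than documented behavior.

```python
from collections import defaultdict


def parse_dsodb_config(text):
    """Collect key=value settings; repeated keys accumulate in file order."""
    settings = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()].append(value.strip())
    return dict(settings)


def resource_monitor_enabled(settings):
    """Assumption: the default is 1 (on) and only an explicit 0 turns it off."""
    values = settings.get("ResourceMonitor", ["1"])
    return values[-1] != "0"


if __name__ == "__main__":
    sample = "# comment\nResourceMonitor=1\nResourceNode=node1\nResourceNode=node2\n"
    cfg = parse_dsodb_config(sample)
    print(resource_monitor_enabled(cfg), cfg.get("ResourceNode"))
```

Keeping every value of a repeated key in a list matters here because several properties in this file are designed to be specified more than once.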
For system metrics to be gathered for any node other than the DataStage engine node, each such node must be specified in this configuration file.
Listing 2. Specifying node names
# The following specifies the name of a remote node whose resources are to be monitored.
# (The local system is always monitored if the resource tracker is running.)
# The name given for each node should match that used in Parallel Job config files.
# This property can be repeated any number of times to include multiple remote nodes.
# ResourceNode=xxxxxx
For each remote node to be monitored, a new line must be added, specifying the host name of the node. That name must match the fastname used in the parallel job configuration files for that node. The property can be specified any number of times, once for each remote node. For example, to gather system metrics for three remote nodes (node1, node2, node3), the following properties should be specified.
Listing 3. Specifying multiple node names
ResourceNode=node1
ResourceNode=node2
ResourceNode=node3
Note that the node name specified is case-insensitive.
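Because each ResourceNode name has to match a fastname in the parallel job configuration files, it can help to derive the entries from such a file instead of typing them by hand. The Python sketch below is one possible way to do that. It assumes the usual parallel configuration file syntax, in which each node block contains a line of the form fastname "host"; the function name resource_node_lines and the engine_host parameter are hypothetical.

```python
import re

# Matches fastname "some-host" inside a parallel job configuration file.
FASTNAME_RE = re.compile(r'fastname\s+"([^"]+)"')


def resource_node_lines(apt_config_text, engine_host):
    """Return ResourceNode=... lines for every remote fastname, once each.

    The engine host is skipped because the local system is always monitored.
    Names are compared case-insensitively, matching the note above.
    """
    seen = set()
    lines = []
    for name in FASTNAME_RE.findall(apt_config_text):
        key = name.lower()
        if key == engine_host.lower() or key in seen:
            continue
        seen.add(key)
        lines.append(f"ResourceNode={name}")
    return lines


if __name__ == "__main__":
    sample = '''
    {
      node "n0" { fastname "MK-Engine" pools "" }
      node "n1" { fastname "node1" pools "" }
      node "n2" { fastname "node1" pools "" }
      node "n3" { fastname "node2" pools "" }
    }
    '''
    print("\n".join(resource_node_lines(sample, "mk-engine")))
```

Several logical nodes often share one fastname, so the sketch emits each host only once.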
Local and remote file systems
By default, the system metrics gathered do not include any disk space information. A user can, however, specify local or remote file systems whose disk space should be checked. With these options set, disk space is included in the Operations Console display. The property used to specify local file systems to monitor is as follows.
Listing 4. Specifying local file systems
# The following specifies a locally mounted file system to be monitored.
# This property can be repeated any number of times to specify multiple file systems.
# ResourceLocalFS=/localfilesystemA
Any number of local file systems can be specified, with the property repeated for each. For example, to monitor the local file system containing /tmp, specify ResourceLocalFS=/tmp.
For a Windows system, the file system pathname can be specified with forward or backward slashes, and can optionally contain a disk prefix ('C:\tmp', for example).
Similar to local file systems, remote file systems can be monitored.
Listing 5. Specifying remote file systems
# The following specifies a file system mounted on a remote node to be monitored.
# The remote node name must match that specified in the corresponding ResourceNode
# entry above.
# This property can be repeated any number of times to specify multiple file systems.
# ResourceRemoteFS=node1+/remotefilesystem
This property is specified as two parts, separated by a +. The first part specifies the node on which the file system exists; the node name must have a corresponding ResourceNode property entry with the same name. The second part is the path of the file system to monitor on that remote node. The property can be specified any number of times, once for each file system to monitor.
For example, the following requests monitoring of two file systems on remote node node1 and one on remote node node2.
Listing 6. Specifying multiple remote file systems
ResourceRemoteFS=node1+/usr
ResourceRemoteFS=node1+/tmp
ResourceRemoteFS=node2+/tmp
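The relationship between ResourceRemoteFS and ResourceNode entries can be checked mechanically before the file is deployed. The following Python sketch is a minimal example of such a check, assuming the node and path are separated by the first + in the value and that node names must be written identically in both properties. The function name check_remote_fs is hypothetical and not part of any IBM tool.

```python
def check_remote_fs(resource_nodes, remote_fs_entries):
    """Return a list of problems found in ResourceRemoteFS values.

    resource_nodes: values of the ResourceNode properties.
    remote_fs_entries: values of the ResourceRemoteFS properties (node+path).
    """
    problems = []
    known = set(resource_nodes)
    for entry in remote_fs_entries:
        # Split on the first + only, so the path part is kept whole.
        node, sep, path = entry.partition("+")
        if not sep or not node or not path:
            problems.append(f"{entry}: expected node+path")
        elif node not in known:
            problems.append(f"{entry}: no ResourceNode entry for {node}")
    return problems


if __name__ == "__main__":
    nodes = ["node1", "node2"]
    entries = ["node1+/usr", "node1+/tmp", "node2+/tmp", "node3+/tmp"]
    for problem in check_remote_fs(nodes, entries):
        print(problem)
```

An empty result means every remote file system entry refers to a node that is also listed for monitoring; it does not confirm that the path exists on that node.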
As with local file system specifications, a Windows path can be specified with forward or backward slashes and an optional disk prefix.
Remote node setup
It is not possible to specify just any remote node from which to collect system resource information. A remote node must have been set up with the Information Server parallel engine such that parallel jobs can be run on that node. As with running parallel jobs on multiple nodes, all nodes should be of a similar platform type.
For remote node resource information to be tied to its corresponding parallel job run, the node name specified in the DSODBConfig.cfg file for a remote node must be the same as the name used in a parallel job configuration file for that run.
ResTrackApp port numbers
System resource information is gathered by a process called ResTrackApp. ResTrackApp runs on the DataStage engine node and listens on a port waiting for Operations Console components to request information from it. When remote nodes are requested, it is this ResTrackApp process that connects to the remote nodes to obtain system resource information. To do so, it connects to a version of itself on each remote node, connecting via a port number.
Both the local and remote port numbers used by ResTrackApp have the same default value of 13450. However, a user can specify different port numbers for the ResTrackApp running on the DataStage engine node or the connections to the remote nodes. These properties in the configuration file are as follows.
Listing 7. Specifying resource tracker port numbers
# Resource Tracking - connections
# ===============================
# The following specifies the port number that the resource tracking application
# (ResTrackApp) will use on the local system. The default is 13450.
# ResourcePortNum=13450
# The following specifies the port number that the resource tracking application
# (ResTrackApp) will use on all remote nodes. The default is 13450.
# ResourceRemotePortNum=13450
Specifying a port number of 13800 for the DataStage engine and 13801 for remote connections would be done as shown below.
Listing 8. Specifying non-default resource tracker port numbers
ResourcePortNum=13800
ResourceRemotePortNum=13801
System resource information can be viewed in several places within the Operations Console UI. In some places, such as on the home page, resource information is shown for one node at a time, showing the DataStage engine node information by default. In other places, such as the activity resource page, resource information for multiple nodes can be displayed simultaneously.
On a system where resource information is only being collected for the DataStage engine node, the UI does not offer any choice of other nodes. If information for more than one node is being collected, the UI will show the node being displayed and include drop-down lists that allow the user to select the node for which to show information.
Home page operating system resources
By default, the Operations Console home page shows operating system resources for the DataStage engine node. If node information is being collected for multiple nodes, the Operating System Resources section will show the current node being viewed in a drop-down control in the top right-hand corner of that area. The following example shows a case where the node being shown is MK-Engine, which is also the DataStage engine machine, as denoted in the parentheses (Figure 1).
Figure 1. Home page compute node selector
Selecting the drop-down will offer a list of nodes to choose from.
Figure 2. Home page compute node selector drop-down
Job run performance
Viewing the details of a specific job run shows system resource details during that run. If a job has been configured to run on multiple nodes, when showing job run details, the user has an option to choose the appropriate node to show details for. The graph can only show details for one node at any one time. In the section where the user can select which system resources to show, there will be a drop-down showing the selected node and allowing the user to change the selection. The example below shows the node selection box with system resources being shown for the remote node IBM-Node1.
Figure 3. Job performance compute node selector
Job activity resources
The top-level Activity tab includes a Resources tab that can be used to show selected system resources during the specified activity timeframe. The user can choose to show graphs of CPU, disk space, job runs, memory, and processes. For these, if data has been collected for more than just the DataStage engine node, the selection menu for each category will allow the user to select from the list of nodes that have data. For example, selecting the CPU graph for the engine node and another node will display the CPU graphs for both. The example below shows the menu selection to show CPU data for both the engine node MK-Engine and the remote node IBM-Node1, with both graphs being displayed.
Figure 4. Activity resources compute node selector
The UI does not show the node selection drop-down and currently selected node.
The drop-down or remote node selection menus will only appear if data is being collected for more than just the DataStage engine node. If a remote node has been configured, but the selector is still not shown, then there is a problem with the configuration of that node. See the section below.
I added a ResourceNode entry in the config file, but the node does not appear in the resource node selection drop-down or menus in the UI.
For a node to appear in the list, it must be configured correctly. You should check that:
- The DSODBConfig.cfg file ResourceNode=xxx entry for the node is specified correctly (case-insensitive).
- The DataStage PXEngine (Information Server 8.7 or higher) has been set up correctly on that remote node.
- The remote node is the same platform type as the engine node.
- The remote port number (ResourceRemotePortNum entry in the DSODBConfig.cfg file) is not already in use on that node.
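For the port check in the last item, a quick probe from the engine node can show whether anything is listening on the configured port of a remote node. The Python sketch below is a minimal example using only the standard socket module; the function name port_accepts_connections is hypothetical. It can only tell whether some process accepts TCP connections on that port, not whether that process is ResTrackApp, so the result needs interpreting: a listener that is present before the resource tracker has been started suggests a port conflict, while no listener after startup suggests the remote tracker is not reachable.

```python
import socket


def port_accepts_connections(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # node1 and 13450 are example values taken from the listings in this article.
    for node in ["node1"]:
        state = "listening" if port_accepts_connections(node, 13450) else "not reachable"
        print(f"{node}:13450 {state}")
```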
I added a ResourceRemoteFS entry in the config file, but the remote file system does not appear on the resource menus in the UI.
For a remote file system to appear in the resource menus, it must be configured correctly. You should check that:
- The DSODBConfig.cfg file ResourceRemoteFS entry has a corresponding ResourceNode entry, where the node name is specified exactly the same in both entries.
- Resource information is being collected for the remote node (see troubleshooting entry above). If resource information is not being collected, disk information will not be collected.
- The specified path actually exists on the remote node.
I configured a remote node in the DSODBConfig.cfg file and ran a job configured to run on that remote node. When I view the job run details, the Performance tab drop-down offers that remote node, but when I select it to display resource information, the UI says "No data available."
Verify that the remote node name specified in the DSODBConfig.cfg matches the remote node name in the parallel job configuration file the job was run with. The node names in these two configuration files must match to tie a job run to its corresponding remote system resources.
In this article, we have described the configuration parameters that must be set in order to collect local and remote node system resource information. We have also described how this system information can be selected and displayed in the Operations Console UI.
- The Operations Console functionality is publicly documented in the IBM InfoCenter site.
- Now you can use DB2 for free. Download DB2 Express-C, a no-charge version of DB2 Express Edition for the community that offers the same core data features as DB2 Express Edition and provides a solid base to build and deploy applications.