Interpret IBM InfoSphere DataStage and QualityStage Operations Console metrics

A deep dive into the metrics and statistics presented in the Operations Console

This article looks into the metrics and statistics presented in the IBM® InfoSphere® DataStage® and QualityStage® Operations Console. This includes descriptions of how the values are generated (or where they are read from) and how the calculations are made. Where appropriate, example SQL queries are shown to demonstrate how the information is read from the operations database.

Arron Harden (arron.harden@uk.ibm.com), Senior Software Engineer, IBM

Arron HardenArron Harden is a senior software engineer for IBM InfoSphere DataStage and QualityStage. Staying with the DataStage product after several mergers and acquisitions, he has worked on DataStage for more than 12 years, joining IBM through the acquisition of Ascential Software Inc in 2005. Having spent a year working in Boston, he is currently based in the United Kingdom, working at the IBM Milton Keynes office. In his most recent role, he was the lead developer for the web application component of the DataStage and QualityStage Operations Console, written using the Dojo toolkit.



Len Greenwood (len.greenwood@uk.ibm.com), DataStage Core Architect, IBM

Len GreenwoodLen Greenwood was a member of the small development team that produced the first version of DataStage in 1996, prior to it being acquired from Ascential Software by IBM in 2005. It now forms a mainstay of the IBM InfoSphere Information Server suite. He has worked in the related areas of data and metadata integration for the past 15 years and is currently the main product architect for the core components of the DataStage and QualityStage development and production tools. He recently designed the database schema that underlies the Information Server Operations Console, used to monitor activity at the DataStage engine level.



Geoff McClean (geoff.mcclean@uk.ibm.com), Senior Software Developer, IBM

Geoff McCleanGeoff McClean was on the original DataStage development team at its inception and is currently a senior software developer for core components of the IBM InfoSphere DataStage and QualityStage development and production tools, part of the IBM InfoSphere Information Server suite. He oversaw the implementation of the database management, event handling, and resource tracking services of the IBM InfoSphere DataStage and QualityStage Operations Console.



Julian Vizor (julian.vizor@uk.ibm.com), Senior Development Manager, IBM

Author Photo of Julian VizorJulian Vizor has been managing the development of the core IBM InfoSphere DataStage product components in the UK for over 10 years. Julian joined the DataStage team soon after its inception, initially as a developer, then taking responsibility for managing the development activities, through all evolutions of the product. As senior manager in the Milton Keynes lab, Julian continues to lead the development team and was recently responsible for the delivery of the DataStage & QualityStage Operations Console project, as part of the InfoSphere Information Server 8.7 release.



Eric Jacobson (ejacobso@us.ibm.com), Senior Software Engineer, IBM

Author Photo of Eric JacobsonEric Jacobson is a senior software engineer for the Parallel Engine group within the IBM InfoSphere DataStage and Quality Stage product. He has been a major contributor to the Parallel Engine since 2003 and eventually joined IBM through its acquisition of Ascential Software in 2005. He has made major contributions to the Parallel Engine framework in areas, including lookup, transform, and import while also focusing on performance. Currently, he is working on the integration of DataStage and Hadoop, recently delivering the first phase of this, which enables reading and writing of files on the Hadoop Distributed File System through the new Big Data File Stage.



13 September 2012

Also available in Chinese

Introduction

The IBM InfoSphere DataStage and QualityStage Operations Console is a web application that allows the DataStage engine components of an InfoSphere Information Server installation to be monitored in real time. This provides a complete view of all DataStage job runs on that system, both present time and historic. It also includes the monitoring of key operating system metrics such as CPU usage, free memory and disk space.

The following article describes in detail the metrics and properties displayed in the Operations Console, and provides information about where the data is collected from. It assumes the reader is familiar with the Operations Console and DataStage in general. The information presented here is related to InfoSphere Information Server version 8.7.

Data sources

The information displayed in the Operations Console is read from the following sources.

  • Metadata repository (XMETA) – This is the design time metadata repository, used primarily by the DataStage Designer to store table definitions, job designs, and so on.
  • User registry – Depending on how the system is configured, this can be an internal registry where all the user information is stored in the metadata repository, or it can be an external registry such as LDAP, where the user information is stored in an external directory system.
  • DSEngine – This is the run time engine and repository, used primarily by the DataStage Director and jobs when they are run. It is also used to store the executable version of a job design when it is compiled by the DataStage Designer.
  • Operations database (DSODB) – This is the operational repository that stores information about job runs. Once enabled, whenever a job is run, data about that run is stored here. The operations database uses a publicly documented open schema, and since it is an open schema, the relevant SQL queries that extracts the information is shown in this article (where applicable). Note that for clarity, only DB2 SQL queries have been shown, without pagination or sorting.

Dates and times

All dates and times displayed in the Operations Console are converted and displayed in the time zone of the browser. For example, if the engine tier system time zone is UTC-0500, and the browser is UTC+0000, an event that happens on 3rd November 2011 at 4:00pm local time on the engine is shown in the browser as occurring on 3rd November 2011 at 9:00pm. Similarly, if a different browser is used and is UTC+0500, that same event is shown on 4th November 2011 at 2:00am on that browser.

The home page of the Operations Console shows the time zone, including the UTC offset, for both the engine tier system and the browser, shown as local in Table 1.

Table 1. Time zones
General properties
Engine started:Oct 25, 2011 15:46:05 (local time zone). Engine is UTC+0000 (GMT), local is UTC+0000
Operating System:Linux: 2.6.32.45-0.3-pae
CPU:2x Intel(R) Pentium(R)4 CPU 3.20GHz
Memory:2.8GB Physical - 2.01GB Virtual

Within the operations database itself all timestamps are stored in UTC, regardless of the time zone of the engine system.

It is important to note that some timestamps are created using the clock on the engine tier system and some are created using the clock on the services tier system. If these are different physical systems and the clocks are significantly out of synchronization, there may be discrepancies when examining and comparing timestamps.


Operating system monitoring

The Operations Console displays information about various operating system metrics such as CPU usage, and free memory. The attributes associated with each of these metrics, and where the information is ultimately read from, is explained later in this article.

Note that in the following SQL query examples, if the engine has been configured with remote node information (via the ResourceNode property in DSODBConfig.cfg), the information read from the operations database can either be the engine itself (that is <remoteNodeName> = <engineName>), or for a particular remote node.

CPU – total usage chart

The CPU total usage chart is displayed on the home page, and can also be optionally displayed on the job run details window and the activity page. For example, total CPU usage from the activity page is shown in Figure 1.

Figure 1. CPU total usage chart
Chart showing total CPU usage

The base information for total CPU usage is read from the operations database using the SQL query shown in Listing 1.

Listing 1. SQL example: CPU total usage
SELECT
    X.CPUPCTIDLEAVG, X.STARTTIMESTAMP, X.ENDTIMESTAMP
FROM
    DSODB.RESOURCEUSAGESYSTEM X,
    DSODB.HOST Y1,
    DSODB.HOST Y2
WHERE
    (X.HOSTID = Y1.HOSTID AND Y1.HOSTNAME = <remoteNodeName>) AND
    (X.HEAD_HOSTID = Y2.HOSTID AND Y2.HOSTNAME = <engineName>) AND
    (
        (X.STARTTIMESTAMP >= <startTimeStampUTC> AND
            X.STARTTIMESTAMP <= <endTimeStampUTC>) OR
        (X.ENDTIMESTAMP >= <startTimeStampUTC> AND
            X.ENDTIMESTAMP <= <endTimeStampUTC>) OR
        (X.STARTTIMESTAMP < <startTimeStampUTC> AND 
            X.ENDTIMESTAMP > <endTimeStampUTC>)
    )

The CPU idle percentage returned by the previous SQL query is converted to CPU usage percentage by using the calculation 100% - CPU idle time. For this field, 100% represents the total possible across all processing units on the system. So for example if a system has 4 CPUs, and 1 of those CPUs was completely in use, the idle time would be 75% (and therefore the total usage would be 25%).

The information in the operations database returned by the previous query is written by the ResMonApp and ResTrackApp services, which obtain it using the following methods.

  • Linux - The CPU idle percentage is calculated using the idle entry in the CPU field in /proc/stat. This idle value is subtracted from the previously recorded value, divided by the sum of all the other entries in the CPU field and multiplied by 100 to obtain the percentage value.
  • AIX - The CPU idle percentage is calculated using the sum of the idle entries returned by the perfstat_cpu() system call. This idle value is subtracted from the previously recorded value, divided by the sum of all the other entries in the CPU field and multiplied by 100 to obtain the percentage value.
  • Solaris - The CPU idle percentage is calculated using the sum of all the idle entries in the cpu_stat modules in the kstat control structure returned by the kstat_*() system calls. This idle value is subtracted from the previously recorded value, divided by the sum of all the other entries in the CPU field and multiplied by 100 to obtain the percentage value.
  • Windows - The CPU idle percentage is calculated using the idle value returned by the Windows PerformanceCounter class with a counter class of Processor and counter name of % Idle Time.

CPU – system usage chart

The CPU system usage chart can optionally be displayed on the job run details window, as shown in Figure 2.

Figure 2. CPU system usage chart
Chart showing system CPU usage

The base information for system or privileged CPU usage is read from the operations database using the SQL query shown in Listing 2.

Listing 2. SQL example: CPU system usage
SELECT
    X.CPUPCTPRIVILEGEDAVG, X.STARTTIMESTAMP, X.ENDTIMESTAMP
FROM
    DSODB.RESOURCEUSAGESYSTEM X,
    DSODB.HOST Y1,
    DSODB.HOST Y2
WHERE
    (X.HOSTID = Y1.HOSTID AND Y1.HOSTNAME = <remoteNodeName>) AND
    (X.HEAD_HOSTID = Y2.HOSTID AND Y2.HOSTNAME = <engineName>) AND
    (
        (X.STARTTIMESTAMP >= <startTimeStampUTC> AND
            X.STARTTIMESTAMP <= <endTimeStampUTC>) OR
        (X.ENDTIMESTAMP >= <startTimeStampUTC> AND
            X.ENDTIMESTAMP <= <endTimeStampUTC>) OR
        (X.STARTTIMESTAMP < <startTimeStampUTC> AND 
            X.ENDTIMESTAMP > <endTimeStampUTC>)
    )

For CPU system usage percentage, 100% represents the total possible across all processing units on the system, so for example if a system has 4 CPUs, and 1 of those CPUs was completely in use by the operating system kernel, the system usage would be 25%.

The information in the operations database returned by the above query is written by the ResMonApp and ResTrackApp services, which obtain it using the following methods.

  • Linux - The CPU privileged percentage is calculated using the privileged entry in the CPU field in /proc/stat. This privileged value is subtracted from the previously recorded value, divided by the sum of all the other entries in the CPU field and multiplied by 100 to obtain the percentage value.
  • IX - The CPU privileged percentage is calculated using the sum of the privileged entries returned by the perfstat_cpu() system call. This privileged value is subtracted from the previously recorded value, divided by the sum of all the other entries in the CPU field and multiplied by 100 to obtain the percentage value.
  • Solaris - The CPU privileged percentage is calculated using the sum of all the privileged entries in the cpu_stat modules in the kstat control structure returned by the kstat_*() system calls. This privileged value is subtracted from the previously recorded value, divided by the sum of all the other entries in the CPU field and multiplied by 100 to obtain the percentage value.
  • Windows - The CPU privileged percentage is calculated using the value returned by the Windows PerformanceCounter class with a counter class of Processor, and counter name of % Privileged Time.

Memory - physical free chart

The free physical memory chart is displayed on the home page, and can also be optionally displayed on the job run details window and the activity page. For example, from the activity page, physical memory is the blue dashed line, as shown in Figure 3.

Figure 3. Free physical memory chart
Chart showing free physical memory

The base information for free physical memory is read from the operations database using the SQL query shown in Listing 3.

Listing 3. SQL example: Free physical memory
SELECT
    X.MEMFREEKBPHYSICALAVG, X.STARTTIMESTAMP, X.ENDTIMESTAMP
FROM
    DSODB.RESOURCEUSAGESYSTEM X,
    DSODB.HOST Y1,
    DSODB.HOST Y2
WHERE
    (X.HOSTID = Y1.HOSTID AND Y1.HOSTNAME = <remoteNodeName>) AND
    (X.HEAD_HOSTID = Y2.HOSTID AND Y2.HOSTNAME = <engineName>) AND
    (
        (X.STARTTIMESTAMP >= <startTimeStampUTC> AND
            X.STARTTIMESTAMP <= <endTimeStampUTC>) OR
        (X.ENDTIMESTAMP >= <startTimeStampUTC> AND
            X.ENDTIMESTAMP <= <endTimeStampUTC>) OR
        (X.STARTTIMESTAMP < <startTimeStampUTC> AND 
            X.ENDTIMESTAMP > <endTimeStampUTC>)
    )

The information in the operations database returned by the above query is written by the ResMonApp and ResTrackApp services, which obtain it using the following methods.

  • Linux - Physical free memory is calculated using PhysicalFree = MemFree + Cached, where MemFree and Cached are fields from /proc/meminfo. This represents the physical memory that could be made available without requiring anything to be swapped out to disk (the contents of the cache component would be discarded).
  • AIX - Physical free memory is calculated using PhysicalFree = (MemFree + Cached) * 4k, where MemFree is the real_free value returned by the perfstat_memory_total call, and Cached is the numperm value returned by the same call. This represents the physical memory that could be made available without requiring anything to be swapped out to disk (the contents of the cache component would be discarded).
  • Solaris - Physical free memory is calculated using the value of the freemem entry in the kstat control structure module vminfo returned by the kstat_*() system calls.
  • Windows - Physical free memory is calculated using PhysicalFree = (MemFree / 1024), where MemFree is the value returned by the Windows PerformanceCounter class with a counter class of Memory, and counter name of Available Bytes.

Memory - virtual free chart

The free virtual memory chart is displayed on the home page, and can also be optionally displayed on the job run details window and the activity page. For example, from the activity page (virtual memory is the green solid line) as shown in Figure 4.

Figure 4. Free virtual memory chart
Chart showing free virtual memory

The base information for free virtual memory is read from the operations database using the SQL query shown in Listing 4.

Listing 4. SQL example: Free virtual memory
SELECT
    X.MEMFREEKBVIRTUALAVG, X.STARTTIMESTAMP, X.ENDTIMESTAMP
FROM
    DSODB.RESOURCEUSAGESYSTEM X,
    DSODB.HOST Y1,
    DSODB.HOST Y2
WHERE
    (X.HOSTID = Y1.HOSTID AND Y1.HOSTNAME = <remoteNodeName>) AND
    (X.HEAD_HOSTID = Y2.HOSTID AND Y2.HOSTNAME = <engineName>) AND
    (
        (X.STARTTIMESTAMP >= <startTimeStampUTC> AND
            X.STARTTIMESTAMP <= <endTimeStampUTC>) OR
        (X.ENDTIMESTAMP >= <startTimeStampUTC> AND
            X.ENDTIMESTAMP <= <endTimeStampUTC>) OR
        (X.STARTTIMESTAMP < <startTimeStampUTC> AND 
            X.ENDTIMESTAMP > <endTimeStampUTC>)
    )

The information in the operations database returned by the above query is written by the ResMonApp and ResTrackApp services, which obtain it using the following methods.

  • Linux - Virtual free memory is read from the SwapFree field in /proc/meminfo.
  • AIX - Virtual free memory is calculated using VirtualFree = PagesFree * 4k, where PagesFree is the pgsp_free value returned by the perfstat_memory_total call.
  • Solaris - Virtual free memory is calculated using the value of the swap_free entry in the kstat control structure module vminfo returned by the kstat_*() system calls.
  • Windows - Virtual free memory is calculated using VirtualFree = ((CommitLimit - Committed) / 1024), where CommitLimit is the value returned by the Windows PerformanceCounter class with a counter class of Memory, and counter name of Commit Limit. Committed is the value returned by the Windows PerformanceCounter class with a counter class of Memory and counter name of Committed Bytes.

Processes - count chart

The process count chart can optionally be displayed on the job run details window and the activity page. For example, from the activity page, as shown in Figure 5.

Figure 5. Free virtual memory chart
Chart showing process count

The base information for the number of processes (across the entire operating system - not just DataStage related) is read from the operations database using the SQL query shown in Listing 5.

Listing 5. SQL example: Processes count
SELECT
    X.PROCNUMWAITINGAVG , X.PROCNUMSLEEPINGAVG , X.PROCNUMZOMBIEDAVG , 
    X.PROCNUMBLOCKEDAVG , X.PROCNUMRUNNINGAVG , X.STARTTIMESTAMP, 
    X.ENDTIMESTAMP 
FROM
    DSODB.RESOURCEUSAGESYSTEM X,
    DSODB.HOST Y1,
    DSODB.HOST Y2
WHERE
    (X.HOSTID = Y1.HOSTID AND Y1.HOSTNAME = <remoteNodeName>) AND
    (X.HEAD_HOSTID = Y2.HOSTID AND Y2.HOSTNAME = <engineName>) AND
    (
        (X.STARTTIMESTAMP >= <startTimeStampUTC> AND
            X.STARTTIMESTAMP <= <endTimeStampUTC>) OR
        (X.ENDTIMESTAMP >= <startTimeStampUTC> AND
            X.ENDTIMESTAMP <= <endTimeStampUTC>) OR
        (X.STARTTIMESTAMP < <startTimeStampUTC> AND 
            X.ENDTIMESTAMP > <endTimeStampUTC>)
    )

The total process count is then calculated by summing the fields together. The information in the operations database returned by the above query is written by the ResMonApp and ResTrackApp services, which obtain it using the following methods.

  • Linux - Process counts are computed by looping through all the process ID directories under /proc. Within each process ID directory, the stat file is processed to find the state of the process. The total process count is calculated using ProcessCount = ProcRunning + ProcSleeping + ProcZombied + ProcBlocked, where ProcRunning is the sum of stat files with a state of "R", ProcSleeping is the sum of stat files with a state of "S" or "D", ProcZombied is the sum of stat files with a state of "Z", and ProcBlocked is the sum of stat files with a state of "T" or "X".
  • AIX - Process counts are computed by looping through all the process ID directories under /proc. Within each process ID directory the psinfo file is processed to find the state of the process. The total process count is calculated using ProcessCount = ProcRunning + ProcSleeping + ProcZombied + ProcBlocked, where ProcRunning is the sum of psinfo files with a state of "R" or "O", ProcSleeping is the sum of psinfo files with a state of "S", ProcZombied is the sum of psinfo files with a state of "Z", and ProcBlocked is the sum of psinfo files with a state of "T".
  • Solaris - Root access is required to read /proc, so instead process counts are computed by issuing the command /usr/bin/ps -eo s and parsing the result. The total process count is calculated using ProcessCount = ProcRunning + ProcSleeping + ProcZombied + ProcBlocked, where ProcRunning is the sum of processes with a state of "R" or "O", ProcSleeping is the sum of processes with a state of "S", ProcZombied is the sum of processes with a state of "Z", and ProcBlocked is the sum of processes with a state of "T".
  • Windows - The process count is computed using the value returned by the Windows PerformanceCounter class with a counter class of System, and counter name of Processes.

File system - free space chart

The file system free space chart can optionally be displayed on the job run details window and the activity page. For example, from the activity page as shown in Figure 6.

Figure 6. Free disk space
Chart showing free disk space

Free disk space information is only recorded if the DSODBConfig.cfg file has been set up with the relevant ResourceLocalFS and ResourceRemoveFS properties. The base information for the free disk space is read from the operations database using the SQL query shown in Listing 6.

Listing 6. SQL example: Free disk space count
SELECT
    X.DISKPATHMONITORED, X.DISKTOTALKB, X.DISKFREEKBAVG, X.STARTTIMESTAMP, X.ENDTIMESTAMP
FROM
    DSODB.RESOURCEUSAGEDISKS X,
    DSODB.HOST Y1,
    DSODB.HOST Y2
WHERE
    (X.HOSTID = Y1.HOSTID AND Y1.HOSTNAME = <remoteNodeName>) AND
    (X.HEAD_HOSTID = Y2.HOSTID AND Y2.HOSTNAME = <engineName>) AND
    (
        (X.STARTTIMESTAMP >= <startTimeStampUTC> AND
            X.STARTTIMESTAMP <= <endTimeStampUTC>) OR
        (X.ENDTIMESTAMP >= <startTimeStampUTC> AND
            X.ENDTIMESTAMP <= <endTimeStampUTC>) OR
        (X.STARTTIMESTAMP < <startTimeStampUTC> AND 
            X.ENDTIMESTAMP > <endTimeStampUTC>)
    )

The information in the operations database returned by the previous query is written by the ResMonApp and ResTrackApp services, which obtain it using the following methods.

  • Linux, AIX and Solaris - Free disk space information is computed using the disk block size and available block count returned by the statvfs64() system call. Specifically the block size is read from the f_frsize element and available block count is read from the f_bavail element.
  • Windows - Free disk space information is computed using the disk block size and available block count returned by the GetDiskFreeSpace() system call. Specifically the block size is read from the lpBytesPerSector return argument and available block count is calculated by multiplying the lpNumberOfFreeClusters return argument by the lpSectorsPerCluster return argument.

General properties

The general properties on the home page show the latest recorded information about the currently selected engine system, as shown in Table 2.

Table 2. Time zones
General properties
Engine started:Oct 24, 2011 15:46:05 (local time zone). Engine is UTC+0100 (BST), local is UTC+0100
Operating System:Linux: 2.6.32.45-0.3-pae
CPU:2x Intel(R) Pentium(R)4 CPU 3.20GHz
Memory:2.8GB Physical - 2.01GB Virtual

The engine start time is read by calling the engine system through ASBAgent and using a UniVerse Basic system call in DSEngine itself, for example:

engineStartTime = SYSTEM(12001)

The local UTC offset is read by running the following JavaScript on the local web browser, for example:

offset = new Date().getTimezoneOffset();

The remainder of this information is read from the operations database using the following SQL query. The sub-select is required because the operations database records all historical engine system information, but only requires the latest, hence the MAX(LASTCHECKEDTIMESTAMP), as shown in Listing 7.

Listing 7. SQL Example: Engine properties
SELECT
    HD.CREATEDTIMESTAMP, HD.LASTCHECKEDTIMESTAMP, HD.CPUMODEL, HD.NUMCPUS,
    HD.PHYSICALMEMORYKB, HD.PLATFORMNAME, HD.PLATFORMVERSION, 
    HD.UTCOFFSETMINS, HD.VIRTUALMEMORYKB, HD.TIMEZONENAME 
FROM 
    DSODB.HOSTDETAIL HD, 
    DSODB.HOST H1, 
    DSODB.HOST H2, 
    (
        SELECT 
            MAX(LASTCHECKEDTIMESTAMP) AS MAXTIMESTAMP  
        FROM 
            DSODB.HOSTDETAIL HD, DSODB.HOST H1, DSODB.HOST H2
        WHERE 
            HD.HOSTID = H1.HOSTID AND 
            H1.HOSTNAME = <engineName> AND
            HD.HEAD_HOSTID = H2.HOSTID AND 
            H2.HOSTNAME = <engineName>
    ) M 
WHERE 
    HD.LASTCHECKEDTIMESTAMP = M.MAXTIMESTAMP AND 
    HD.HOSTID = H1.HOSTID AND 
    H1.HOSTNAME = <engineName> AND 
    HD.HEAD_HOSTID = H2.HOSTID AND 
    H2.HOSTNAME = <engineName>

The information in the operations database returned by the previous query is written by the ResMonApp and ResTrackApp services, which obtain it using the following methods.

Linux

  • CPUMODEL and NUMCPUS ultimately come from the model name attribute in /proc/cpuinfo.
  • PHYSICALMEMORYKB is read from the MemTotal field in /proc/meminfo.
  • VIRTUALMEMORYKB is read from the SwapTotal field in /proc/meminfo.
  • PLATFORMNAME and PLATFORMVERSION are read from /proc/version.
  • TIMEZONENAME and UTCOFFSETMINS are read from the tm_zone and tm_gmtoff respectively returned by the localtime() system call.

AIX

  • CPUMODEL and NUMCPUS ultimately come from the processor information returned by the perfstat_cpu_total () system call.
  • PHYSICALMEMORYKB is calculated by multiplying the real_total value returned by the perfstat_memory_total() system call by the page size of 4.
  • VIRTUALMEMORYKB is calculated by multiplying the pgsp_total value returned by the perfstat_memory_total() system call by the page size of 4.
  • PLATFORMNAME and PLATFORMVERSION are read using the sysname and version values respectively returned by the uname() system call.
  • TIMEZONENAME is read from the "TZ" environment variable.
  • UTCOFFSETMINS is calculated using the tz_minuteswest value returned by the gettimeofday() system call.

Solaris

  • CPUMODEL is read by parsing the information returned by the /usr/sbin/psrinfo -v command.
  • NUMCPUS is read from the cpu_count value returned by the sysconf(_SC_NPROCESSORS_CONF) system call.
  • PHYSICALMEMORYKB is calculated by multiplying the Memory value returned by the /usr/sbin/prtconf command by 1024.
  • VIRTUALMEMORYKB is calculated by multiplying the ani_max value returned by the swapctl() system call by the system page size.
  • PLATFORMNAME and PLATFORMVERSION are read using the values returned by the sysinfo(SI_SYSNAME,...) and sysinfo(SI_RELEASE,...) system calls.
  • TIMEZONENAME is read from the "TZ" environment variable.
  • UTCOFFSETMINS is calculated by computing the difference of the times returned by the localtime() and gmtime() system calls.

Windows

  • CPUMODEL is read from the HKEY_LOCAL_MACHINE\\HARDWARE\\DESCRIPTION\\System\CentralProcessor\\0\ \ProcessorNameString registry key.
  • NUMCPUS is read from the value of the dwNumberOfProcessors element returned by the GetSystemInfo() system call.
  • PHYSICALMEMORYKB is calculated by dividing the ullTotalPhys value returned by the GlobalMemoryStatusEx() system call by 1024.
  • VIRTUALMEMORYKB is calculated by dividing the value returned by the Windows PerformanceCounter class with a counter class of Memory and counter name of Commit Limit by 1024.
  • PLATFORMNAME is always set to Windows.
  • PLATFORMVERSION is calculated by looking at the values for dwMajorVersion, dwMinorVersion, and wProductType returned by the GetVersionEx() system call.
  • TIMEZONENAME is read from the value returned by the _get_tzname() system call.
  • UTCOFFSETMINS is calculated by using the value returned by the _get_timezone() system call and adding the value returned by the _get_dstbias() system call.

DataStage engine monitoring

The Operations Console displays information about various DataStage engine metrics such as job run counts and service status. The attributes associated with each of these metrics, and where the information is read from, is explained as follows.

Job runs chart

The job runs chart is displayed on the home page, and can also be optionally displayed the activity page as well. For example, from the home page, as shown in Figure 7.

Figure 7. Job runs chart
Chart showing job runs

It shows the number of job runs that were active over a given period of time for the currently selected engine system. This is defined as any job run that satisfies one of the following criteria.

  • Started in the time period
  • Ended in the time period
  • Running throughout the time period

The base information is read from the operations database using the SQL query (to keep the SQL query simple, project filtering is not included here) as shown in Listing 8.

Listing 8. SQL example: Job runs activity
SELECT 
    X.RUNMAJORSTATUS, X.RUNENDTIMESTAMP, X.RUNSTARTTIMESTAMP 
FROM 
    DSODB.JOBRUN X, DSODB.JOBEXEC Y, DSODB.HOST Z 
WHERE 
    X.JOBID = Y.JOBID AND 
    Y.HOSTID = Z.HOSTID AND 
    Z.HOSTNAME = <engineName> AND 
    (
        (X.CREATIONTIMESTAMP >= <startTimeStampUTC> AND 
            X.CREATIONTIMESTAMP <= <endTimeStampUTC>) OR 
        (X.RUNENDTIMESTAMP >= <startTimeStampUTC> AND 
            X.RUNENDTIMESTAMP <= <endTimeStampUTC>) OR 
        (X.RUNENDTIMESTAMP IS NULL AND 
            X.CREATIONTIMESTAMP <= <endTimeStampUTC>)
    )

This information is then aggregated by date into data that can be plotted on a chart.

Finished job runs chart

The finished job runs chart is displayed on the home page, and shows the number of job runs in a finished state over a period of time for the currently selected engine system, as shown in Figure 8.

Figure 8. Finished job runs chart
Chart showing finished job runs

For the purposes of this visualization, a finished state is defined as any job run that has its major status set to FIN (finished).

The base information is read from the operations database using the SQL query (to keep the SQL query simple, project filtering is not included here) shown in Listing 9.

Listing 9. SQL example: Finished job runs
SELECT
    X.RUNMAJORSTATUS, X.RUNMINORSTATUS 
FROM 
    DSODB.JOBRUN X, 
    DSODB.JOBEXEC Y, 
    DSODB.HOST Z
WHERE 
    X.JOBID = Y.JOBID AND 
    Y.HOSTID = Z.HOSTID AND 
    Z.HOSTNAME = <engineName> AND 
    X.RUNMAJORSTATUS IN ('FIN') AND 	
    (
        (X.CREATIONTIMESTAMP >= <startTimeStampUTC> AND 
            X.CREATIONTIMESTAMP <= <endTimeStampUTC>) OR 
        (X.RUNENDTIMESTAMP >= <startTimeStampUTC> AND 
            X.RUNENDTIMESTAMP <= <endTimeStampUTC>) OR 
        (X.RUNENDTIMESTAMP IS NULL AND 
            X.CREATIONTIMESTAMP <= <endTimeStampUTC>)
    )

The information returned is then aggregated by the run minor status to group the job run counts into the following categories which are displayed in the Operations Console.

  • Failed – run minor status is one of: FWF (finished with fatal), CRA (crashed), STP (stopped by user), SYN (marked as finished because process running the job unexpectedly disappeared).
  • Finished with warnings – run minor status is FWW (finished with warnings).
  • Finished – run minor status is FOK (finished OK).
  • Any other run minor status is ignored.

Each of these categories can be clicked by the user to show the list of job runs that match, using the SQL queries shown in Listing 10, 11, and 12.

Listing 10. SQL example: Finished job runs - failed
SELECT 
    X.TOTALPHYSICALNODES, X.TOTALLOGICALNODES, X.RUNID, X.JOBID, X.CONDUCTORPID, 
        X.CONFIGFILENAME, X.CUSTOMBATCHID, X.CUSTOMCONTACT, X.CUSTOMFIELD1, 
        X.CUSTOMFIELD2, X.CUSTOMJOBTYPE, X.CUSTOMSEQUENCE, X.DSUSERNAME, 
        X.ELAPSEDRUNSECS, X.ISUSERNAME, X.LASTUPDATETIMESTAMP, X.RUNMAJORSTATUS, 
        X.MASTERPID, X.RUNMINORSTATUS, X.NUMMESSAGESFATAL, X.NUMMESSAGESTOTAL, 
        X.NUMMESSAGESWARNING, X.RUNENDTIMESTAMP, X.RUNSTARTTIMESTAMP, X.RUNTYPE, 
        X.TOTALROWSCONSUMED, X.TOTALROWSPRODUCED, X.USERSTATUS, X.INVOCATIONID, 
        X.CREATIONTIMESTAMP, X.CONTROLLING_RUNID, X.TOTALCPU, 
    Y.PROJECTNAME, Y.JOBNAME, Y.COMPILATIONTIMESTAMP, Y.JOBTYPE, Y.FOLDERPATH, 
        Y.WEBSERVICESENABLED 
FROM 
    DSODB.JOBRUN X, 
    DSODB.JOBEXEC Y, 
    DSODB.HOST Z  
WHERE 
    X.JOBID = Y.JOBID AND 
    Y.HOSTID = Z.HOSTID AND 
    Z.HOSTNAME = <engineName> AND 
    X.RUNMAJORSTATUS IN ('FIN') AND 
    X.RUNMINORSTATUS IN ('FWF','STP','CRA','SYN') AND 
    (
        (X.CREATIONTIMESTAMP >= <startTimeStampUTC> AND 
            X.CREATIONTIMESTAMP <= <endTimeStampUTC>) OR
        (X.RUNENDTIMESTAMP >= <startTimeStampUTC> AND 
            X.RUNENDTIMESTAMP <= <endTimeStampUTC>) OR 
        (X.RUNENDTIMESTAMP IS NULL AND 
            X.CREATIONTIMESTAMP <= <endTimeStampUTC>) OR
        (X.CREATIONTIMESTAMP < <startTimeStampUTC> AND 
            X.RUNENDTIMESTAMP > <endTimeStampUTC>)
    )
Listing 11. SQL example: Finished job runs - with warnings
SELECT 
    X.TOTALPHYSICALNODES, X.TOTALLOGICALNODES, X.RUNID, X.JOBID, X.CONDUCTORPID, 
        X.CONFIGFILENAME, X.CUSTOMBATCHID, X.CUSTOMCONTACT, X.CUSTOMFIELD1, 
        X.CUSTOMFIELD2, X.CUSTOMJOBTYPE, X.CUSTOMSEQUENCE, X.DSUSERNAME, 
        X.ELAPSEDRUNSECS, X.ISUSERNAME, X.LASTUPDATETIMESTAMP, X.RUNMAJORSTATUS, 
        X.MASTERPID, X.RUNMINORSTATUS, X.NUMMESSAGESFATAL, X.NUMMESSAGESTOTAL, 
        X.NUMMESSAGESWARNING, X.RUNENDTIMESTAMP, X.RUNSTARTTIMESTAMP, X.RUNTYPE, 
        X.TOTALROWSCONSUMED, X.TOTALROWSPRODUCED, X.USERSTATUS, X.INVOCATIONID, 
        X.CREATIONTIMESTAMP, X.CONTROLLING_RUNID, X.TOTALCPU, 
    Y.PROJECTNAME, Y.JOBNAME, Y.COMPILATIONTIMESTAMP, Y.JOBTYPE, Y.FOLDERPATH, 
        Y.WEBSERVICESENABLED 
FROM 
    DSODB.JOBRUN X, 
    DSODB.JOBEXEC Y, 
    DSODB.HOST Z  
WHERE 
    X.JOBID = Y.JOBID AND 
    Y.HOSTID = Z.HOSTID AND 
    Z.HOSTNAME = <engineName> AND 
    X.RUNMAJORSTATUS IN ('FIN') AND 
    X.RUNMINORSTATUS IN ('FWW') AND 
    (
        (X.CREATIONTIMESTAMP >= <startTimeStampUTC> AND 
            X.CREATIONTIMESTAMP <= <endTimeStampUTC>) OR
        (X.RUNENDTIMESTAMP >= <startTimeStampUTC> AND 
            X.RUNENDTIMESTAMP <= <endTimeStampUTC>) OR 
        (X.RUNENDTIMESTAMP IS NULL AND 
            X.CREATIONTIMESTAMP <= <endTimeStampUTC>) OR
        (X.CREATIONTIMESTAMP < <startTimeStampUTC> AND 
            X.RUNENDTIMESTAMP > <endTimeStampUTC>)
    )
Listing 12. SQL example: Finished job runs - OK
SELECT 
    X.TOTALPHYSICALNODES, X.TOTALLOGICALNODES, X.RUNID, X.JOBID, X.CONDUCTORPID, 
        X.CONFIGFILENAME, X.CUSTOMBATCHID, X.CUSTOMCONTACT, X.CUSTOMFIELD1, 
        X.CUSTOMFIELD2, X.CUSTOMJOBTYPE, X.CUSTOMSEQUENCE, X.DSUSERNAME, 
        X.ELAPSEDRUNSECS, X.ISUSERNAME, X.LASTUPDATETIMESTAMP, X.RUNMAJORSTATUS, 
        X.MASTERPID, X.RUNMINORSTATUS, X.NUMMESSAGESFATAL, X.NUMMESSAGESTOTAL, 
        X.NUMMESSAGESWARNING, X.RUNENDTIMESTAMP, X.RUNSTARTTIMESTAMP, X.RUNTYPE, 
        X.TOTALROWSCONSUMED, X.TOTALROWSPRODUCED, X.USERSTATUS, X.INVOCATIONID, 
        X.CREATIONTIMESTAMP, X.CONTROLLING_RUNID, X.TOTALCPU, 
    Y.PROJECTNAME, Y.JOBNAME, Y.COMPILATIONTIMESTAMP, Y.JOBTYPE, Y.FOLDERPATH, 
        Y.WEBSERVICESENABLED 
FROM 
    DSODB.JOBRUN X, 
    DSODB.JOBEXEC Y, 
    DSODB.HOST Z  
WHERE 
    X.JOBID = Y.JOBID AND 
    Y.HOSTID = Z.HOSTID AND 
    Z.HOSTNAME = <engineName> AND 
    X.RUNMAJORSTATUS IN ('FIN') AND 
    X.RUNMINORSTATUS IN ('FOK') AND 
    (
        (X.CREATIONTIMESTAMP >= <startTimeStampUTC> AND 
            X.CREATIONTIMESTAMP <= <endTimeStampUTC>) OR
        (X.RUNENDTIMESTAMP >= <startTimeStampUTC> AND 
            X.RUNENDTIMESTAMP <= <endTimeStampUTC>) OR 
        (X.RUNENDTIMESTAMP IS NULL AND 
            X.CREATIONTIMESTAMP <= <endTimeStampUTC>) OR
        (X.CREATIONTIMESTAMP < <startTimeStampUTC> AND 
            X.RUNENDTIMESTAMP > <endTimeStampUTC>)
    )

Service state

The service state is shown on the home page and displays the results of checks to ensure each listed service is running. As shown in Figure 9, where applicable it also checks that the service is functioning as expected for the currently selected engine system.

Figure 9. Service status
Chart showing service status

For each service the following check or checks are made.

  • AppWatcher
    • The DSODBON setting in InformationServer/Server/DSODB/DSODBConfig.cfg must be set to 1. If not the service is marked as Disabled.
    • There must be a process called DSAppWatcherStart.sh running on the engine which is running the appropriate instance (identified by a command line parameter). If this is not present, then it shows as Not Running.
  • ASB Agent
    • If a remote procedure call can be successfully made from WebSphere to the ASBAgent running on the engine system, then this service is marked as running OK.
    • If the remote procedure call fails because ASBAgent is not listening on the configured socket (31531 by default, configured in InformationServer/ASBNode/conf/agent.properties), or it has not been correctly configured, then this service is marked as Not Responding, and all other services are marked as Engine unreachable (since ASBAgent is the gateway between WebSphere and the engine system).
  • DataStage RPC daemon (dsrpcd)
    • There must be a process called dsrpcd running on the engine. If this is not present, then Not Running is shown.
    • There must be a valid engine credential available (configured via the InfoSphere Information Server administration web console). If not, then No DataStage credentials mapped is shown.
    • A connection is made to the dsrpcd service via ds4j. If this connection is successful the service is marked as OK. If the connection fails, then the service is either marked as Invalid username or password or Not responding, as appropriate.
  • Engine Monitoring Application (engmonapp)
    • The DSODBON setting in InformationServer/Server/DSODB/DSODBConfig.cfg must be set to 1. If not the service is marked as Disabled.
    • There must be a process called java running on the engine which is running the appropriate engmonapp JAR (identified by the java command line parameter -DPSMARKER). If this is not present Not Running is shown.
  • Job Monitoring Application (jobmonapp)
    • There must be a process called JobMonApp running on the engine. If this is not present Not Running is shown.
    • A connection is made to the sockets that jobmonapp is configured to listen on (13400 and 13401 by default, configured in InformationServer/Server/PXEngine/etc/jobmon_ports). If the connection fails the service is marked as Not responding.
  • Logging Agent
    • For Windows: There must be a process called LoggingAgent running on the engine. If this is not present Not Running is shown.
    • For UNIX: A file called InformationServer/ASBNode/bin/LoggingAgent.pid is read to find the PI of the logging agent process. If this fails or the PID corresponds to a process that no longer exists, then Not Running is returned.
    • A connection is made to the socket the logging agent is configured to listen on (31533 by default). If the connection fails, then the service is marked as Not responding.
  • ODB Query Application (odbqueryapp)
    • The DSODBON setting in InformationServer/Server/DSODB/DSODBConfig.cfg must be set to 1. If not the service is marked as Disabled.
    • There must be a process called java running on the engine which is running the appropriate odbqueryapp JAR (identified by the java command line parameter -DPSMARKER). If this is not present, then Not Running is shown.
    • A connection is made to the socket odbqueryapp is configured to listen on (13451 by default, configured by the ODBQAPPPORTNUM property in InformationServer/Server/DSODB/DSODBConfig.cfg). If the connection fails, then the service is marked as Not responding.
  • Resource Monitoring Application (resmonapp)
    • The DSODBON setting in InformationServer/Server/DSODB/DSODBConfig.cfg must be set to 1. If not, then the service is marked as Disabled.
    • There must be a process called java running on the engine which is running the appropriate resmonapp JAR (identified by the java command line parameter -DPSMARKER). If this is not present, then Not Running is shown.

Object properties

The Operations Console displays information about projects, folders, job designs, and job runs. The set of properties associated with each of these objects, and where the information is read from, is explained as follows.

Projects

Project properties are shown in the projects page, for example, as shown in Figure 10.

Figure 10. Project properties
Project properties

The following properties are displayed.

  • Name (read from the metadata repository) shows the name of the project set when it was created.
  • Created (read from the metadata repository):
    • The user name displayed for a project is always DataStage System User, regardless of the user that actually created the project.
    • The timestamp shown is the timestamp the project was created as set by the clock on the services tier system.
  • Protected (read from DSEngine) shows whether the project has been protected or not via the DataStage Administrator client.
  • Path (read from DSEngine) shows the project path, which is typically where the project's DSEngine files are stored.
  • Number of folders (read from the metadata repository) shows the number of folders (including nested folders) in the entire project, whether empty or not.
  • Number of jobs (read from the metadata repository) shows the number of jobs in the entire project.
  • Environment variables (read from DSEngine). This shows the same information as seen in the DataStage Administrator client, such as the current name, prompt, and default value, as read from the DSParams file in the project path.

Folders (all read from the metadata repository)

Folder properties are shown in the projects page, for example, as shown in Figure 11.

Figure 11. Folder properties
Folder properties

The properties displayed are as follows.

  • Name shows the current name of the folder.
  • Created:
    • The user name displayed for folders created when the project was created (such as \Jobs, \Routines, and so on.) is always DataStage System User, regardless of the user that actually created the project. For other folders, the user name displayed is made by concatenating the first name and last name fields of the information from the user registry.
    • Where allowed, clicking the user name shows more information such as telephone numbers and email address.
    • The timestamp shown is the timestamp the folder was created as set by the clock on the services tier system.
  • Number of sub-folders shows the total number of sub-folders (including nested folders) under the selected folder, whether empty or not.
  • Number of jobs shows the number of jobs in the selected folder (does not include jobs in nested sub-folders).

Job designs (all read from the metadata repository unless otherwise stated)

Job design properties are shown in the projects page, for example, as shown in Figure 12.

Figure 12. Job design properties
Job design properties

The properties displayed are as follows.

  • Name shows the current name of the job design.
  • Short description shows the current short description of the job design.
  • Created and last modified:
    • The user name displayed is made by concatenating the first name and last name fields of the information from the user registry.
    • Where allowed, clicking the user name shows more information such as telephone numbers and email address.
    • The timestamp shown is the timestamp the job design was created as set by the clock on the services tier system.
  • Last compiled (read from DSEngine) shows the timestamp when the job executable was created, indicating when the job design was last compiled (if at all). Set from the clock on the engine tier system.
  • Job type can be one of Server, Parallel or Sequence. Mainframe job designs are not shown in the Operations Console.
  • Multi-instance shows whether or not the job design has been multi-instance enabled via the job properties window in the DataStage Designer.
  • Information services enabled shows whether or not the job design has been enabled for information services (previously called WISD) via the job properties windows in the DataStage Designer.
  • Used by job(s):
    • Lists the number of other job designs (typically job sequences) that directly use the job design.
    • If non-zero, clicking the number shows the full list of job designs. This is equivalent to the Where Used operation in the DataStage Designer.
  • Depends on job(s):
    • Lists the number of other job designs that are used directly by the job design.
    • If non-zero, clicking the number shows the full list of job designs. This is equivalent to the Depends On operation in the DataStage Designer.

Job runs (read from operations database)

Job run properties are shown in various places in the application, for example the job run details dialog as shown in Figure 13.

Figure 13. Job run properties
Job run properties

The properties displayed are as follows.

  • Name is read from the JOBEXEC.JOBNAME column and shows the name of the job when the job run occurred.
  • Project is read from the JOBEXEC.PROJECTNAME column and shows the name of the containing project when the job run occurred.
  • Executable compiled on is read from the JOBEXEC.COMPILATIONTIMESTAMP column and shows the timestamp when the job executable used for the job run was created, indicating when the job design was compiled. Set from the clock on the engine tier system.
  • Job type is read from the JOBEXEC.JOBTYPE column and can be one of Server (=SRV), Parallel (=PAR) or Sequence (=SEQ). Mainframe job designs are not shown in the Operations Console.
  • Run type is read from the JOBRUN.RUNTYPE column and can be one of Run (=RUN), Validate (=VAL), or Reset (=RES), corresponding to the operation performed in the DataStage Director or dsjob etc. Job resets and validations are both counted as job runs for the purposes of the Operations Console.
  • Started by job run is derived from the JOBRUN.CONTROLLING_RUNID column which is used as a parameter for another SQL query to read the information for the controlling job run (if any).
  • Started job runs is derived from the JOBRUN.RUNID column which is used as a parameter for another SQL query to read the information for controlled job runs (if any).
  • Configuration file is read from the JOBRUN.CONFIGFILENAME column and shows the configuration file used when the job run occurred. Where allowed, clicking the configuration file shows the contents of the configuration file at the time the job run occurred (ignoring any subsequent changes).
  • Conductor PID is read from the JOBRUN.CONDUCTORPID column and shows the PID (process identifier) of the conductor process when the job was run. This is only applicable to parallel jobs.
  • Master PID is read from the JOBRUN.MASTERPID column and shows the PID (process identifier) of the master job run process, which is the root of the process tree for the job run.
  • Started:
    • The InfoSphere Information Server user ID is read from the JOBRUN.ISUSERNAME column, and the Operating System user ID is read from the JOBRUN.DSUSERNAME column. The InfoSphere Information Server user ID is empty if the job was run as a scheduled job or from dsjob using the local option.
    • The user name displayed is made by concatenating the first name and last name fields of the information from the user registry.
    • Where allowed, clicking the user name shows more of the information such as telephone numbers and email address.
    • The timestamp is read from JOBRUN.RUNSTARTTIMESTAMP and shows the timestamp when the job run started as set by the clock on the engine tier system.
  • Invocation ID is read from the JOBRUN.INVOCATIONID column and shows any invocation ID used when the job was run. Note that if no invocation ID was set, the data stored in the operations database is set to "-", which the Operations Console maps to "", which means it is not set.
  • Status is derived from the JOBRUN.RUNMINORSTATUS column and shows the job run status according to the following value of the run minor status:
    • QUE – status is queued
    • STR – status is starting
    • RUN, RNW or RNF – status is running
    • RNS – status is stopping
    • STP - status is stopped. Error icon and associated run end date might also be shown.
    • FOK – status is finished. Success icon and associated run end date might also be shown.
    • FWW – status is finished. Warning icon and associated run end date might also be shown.
    • FWF, CRA, or SYN – failed. Error icon and associated run end date might also be shown.
    • Any other run minor status - unknown. Warning icon shown might also be shown.
    • Where applicable any associated end timestamp is read from JOBRUN.RUNENDTIMESTAMP and shows the timestamp when the job run finished as set by the clock on the engine tier system.
  • Elapsed shows the difference between the job run end timestamp (JOBRUN.RUNENDTIMESTAMP) and the job run start timestamp (JOBRUN.RUNSTARTTIMESTAMP), or if the job run has not yet finished, the difference between the job run start timestamp (JOBRUN.RUNENDTIMESTAMP) and the current time according to the clock on the engine tier system. The format of this property is HH:MM:SS. Note the JOBRUN.ELAPSEDRUNSECS column is not used as it is only periodically updating when a job is running.
  • CPU time is read from the JOBRUN.TOTALCPU column and shows the amount of processing time the job run has consumed so far. This is the total across all processing units on which the job run was using. For example on a system with 4 hyper-threaded CPUs, there are 8 processing units available, and if a job run lasted 1 hour, totally consuming all 8 processing units for that hour, the CPU time is 8 hours. The format of this property is HH:MM:SS.
  • Rows in is read from the JOBRUN.TOTALROWSCONSUMED column and shows the total number of rows so far read from the outside world by all source stages in the job run. Specifically, this is based on summing the rows for all links that read from a source stage (including row generator stages and lookups). For Parallel jobs these are links from stages that only have one link, leading out of the stage. For server jobs it is those links that connect a passive stage to an active stage.
  • Rows in per second is derived by dividing the rows in value by the elapsed time, as calculated previously.
  • Rows out is read from the JOBRUN.TOTALROWSPRODUCED column and shows the total number of rows so far written to the outside world by all target stages in the job run. Specifically, this is based on summing the rows for all links that write to a target stage (including peek stages). For Parallel jobs these are links to stages that only have one link, leading into the stage. For server jobs it is those links that connect an active stage to a passive stage.
  • Rows out per second is derived by dividing the rows out value by the elapsed time, as calculated above.
  • Messages is read from the JOBRUN.NUMMESSAGESTOTAL column and shows the total number of log messages produced by the job run so far. The bracketed figures show the breakdown of warning (JOBRUN.NUMMESSAGESWARNING) and fatal messages (JOBRUN.NUMMESSAGESFATAL). Note that the count of messages shown does not take account of any filtering set in the DSODBConfig.cfg file, meaning that even though it might say 20 messages have been produced, only two may have been captured for the operations database.
  • Parameters shows the list of all job parameters associated with the job run.
    • Name is read from the JOBRUNPARAMSVIEW.PARAMNAME column and shows the name of the job parameters.
    • Value is read from the JOBRUNPARAMSVIEW.PARAMVALUE column and shows the value of the job parameters. Encrypted job parameters values are stored in the operations database as "********", and this is what is displayed in the Operations Console.
  • Logs shows the list of all captured log messages (according to the filter settings in DSODBConfig.cfg).
    • Elapsed shows the difference between the log message timestamp (JOBRUNLOG.LOGTIMESTAMP) and the start timestamp of the job run (JOBRUN.RUNSTARTTIMESTAMP). The format of this property is HH:MM:SS.
    • Message is read from the JOBRUNPARAMSVIEW.MESSAGETEXT column and shows the full text of the log message.

User interface elements

Parts of the user interface of the Operations Console that do not fall into any of the previous categories are as follows.

Projects page – folders

When a folder is selected the job designs panel shows thumbnail images of every job contained in the selected folder, as shown in Figure 14.

Figure 14. Projects page - folders
This figure shows the Projects Page - Folders

These thumbnail images are generated on-demand and do not use any cached version, thus any change in a job design is reflected in one of these thumbnail images when the page is refreshed or the folder is next selected.

Projects page – job designs

When a job design is selected, as well as its properties, details of its latest activity and historical job runs are displayed, as shown in Figure 15.

Figure 15. Projects page - job designs
This figure shows the Projects Page - Job Designs

The latest activity section shows the details for the job run with the most recent start timestamp, if any (JOBRUN.RUNSTARTTIMESTAMP). Note that even if a job run is still running, if it has an older start timestamp than one that has finished, the job run that is finished is displayed. The following SQL query is used to retrieve the information for the most recently started job run, as shown in Listing 13.

Listing 13. SQL example: Job design latest activity
SELECT 
    X1.TOTALCPU, X1.RUNID, X1.CONDUCTORPID, X1.CONFIGFILENAME, X1.CUSTOMBATCHID, 
        X1.CUSTOMCONTACT, X1.CUSTOMFIELD1, X1.CUSTOMFIELD2, X1.CUSTOMJOBTYPE, 
        X1.CUSTOMSEQUENCE, X1.DSUSERNAME, X1.ELAPSEDRUNSECS, X1.ISUSERNAME, 
        X1.LASTUPDATETIMESTAMP, X1.RUNMAJORSTATUS, X1.MASTERPID, X1.RUNMINORSTATUS, 
        X1.NUMMESSAGESFATAL, X1.NUMMESSAGESTOTAL, X1.NUMMESSAGESWARNING, 
        X1.RUNENDTIMESTAMP, X1.RUNSTARTTIMESTAMP, X1.RUNTYPE, X1.TOTALROWSCONSUMED, 
        X1.TOTALROWSPRODUCED, X1.USERSTATUS, X1.INVOCATIONID, X1.CREATIONTIMESTAMP, 
        X1.CONTROLLING_RUNID, 
    Y1.PROJECTNAME, Y1.JOBNAME, Y1.COMPILATIONTIMESTAMP, Y1.JOBTYPE, Y1.FOLDERPATH, 
        Y1.WEBSERVICESENABLED 
FROM 
    DSODB.JOBRUN X1, 
    DSODB.JOBEXEC Y1, 
    DSODB.HOST Z1
WHERE 
    X1.JOBID = Y1.JOBID AND 
    X1.RUNSTARTTIMESTAMP IN 
    (
        SELECT 
            MAX(X.RUNSTARTTIMESTAMP) 
        FROM 
            DSODB.JOBRUN X, DSODB.JOBEXEC Y, DSODB.HOST Z 
        WHERE
            X.JOBID = Y.JOBID AND
            Y.HOSTID = Z.HOSTID AND 
            Z.HOSTNAME = <engineName> AND 
            Y.PROJECTNAME = <projectName> AND 
            Y.JOBNAME = <jobDesignName> AND 
            Y.FOLDERPATH = <containingFolderPath>
    ) AND 
    Y1.HOSTID = Z1.HOSTID AND 
    Z1.HOSTNAME = <engineName> AND 
    Y1.PROJECTNAME = <projectName> AND 
    Y1.JOBNAME = <jobDesignName> AND 
    Y1.FOLDERPATH = <containingFolderPath>

The job runs section shows the associated list of job runs for the selected job design (if any). The SQL query shown in Listing 14 is used to retrieve the list of job runs (to keep the SQL query simple, sorting and pagination is not included here).

Listing 14. SQL example: Job runs for design
SELECT 
    X.TOTALPHYSICALNODES, X.TOTALLOGICALNODES, X.RUNID, X.JOBID, 
        X.CONDUCTORPID, X.CONFIGFILENAME, X.CUSTOMBATCHID, X.CUSTOMCONTACT, 
        X.CUSTOMFIELD1, X.CUSTOMFIELD2, X.CUSTOMJOBTYPE, X.CUSTOMSEQUENCE, 
        X.DSUSERNAME, X.ELAPSEDRUNSECS, X.ISUSERNAME, X.LASTUPDATETIMESTAMP, 
        X.RUNMAJORSTATUS, X.MASTERPID, X.RUNMINORSTATUS, X.NUMMESSAGESFATAL, 
        X.NUMMESSAGESTOTAL, X.NUMMESSAGESWARNING, X.RUNENDTIMESTAMP, X.RUNSTARTTIMESTAMP,
        X.RUNTYPE, X.TOTALROWSCONSUMED, X.TOTALROWSPRODUCED, X.USERSTATUS, 
        X.INVOCATIONID, X.CREATIONTIMESTAMP, X.CONTROLLING_RUNID, X.TOTALCPU, 
    Y.PROJECTNAME, Y.JOBNAME, Y.COMPILATIONTIMESTAMP, Y.JOBTYPE, Y.FOLDERPATH, 
        Y.WEBSERVICESENABLED 
FROM 
    DSODB.JOBRUN X, 
    DSODB.JOBEXEC Y, 
    DSODB.HOST Z
WHERE 
    X.JOBID = Y.JOBID AND 
    Y.HOSTID = Z.HOSTID AND 
    Z.HOSTNAME = <engineName> AND 
    Y.PROJECTNAME IN (<projectName>) AND 
    Y.JOBNAME = <jobDesignName> AND
    Y.FOLDERPATH = <containingFolderPath>

Activity page – job runs

The job runs section shows the list of job runs within the selected time period. In past activity mode, the start and end timestamps are chosen by the user (with respect to the local time zone on the browser). In current activity mode, the end timestamp is set to now as set by the clock on the engine tier system and the start timestamp is set to be the users chosen duration back from the end timestamp.

The SQL query shown in Listing 15 is used to retrieve the list of job runs (to keep the SQL query simple, sorting, pagination and project filtering is not included here).

Listing 15. SQL example: Activity page - job runs
SELECT 
    X.TOTALPHYSICALNODES, X.TOTALLOGICALNODES, X.RUNID, X.JOBID, 
        X.CONDUCTORPID, X.CONFIGFILENAME, X.CUSTOMBATCHID, X.CUSTOMCONTACT, 
        X.CUSTOMFIELD1, X.CUSTOMFIELD2, X.CUSTOMJOBTYPE, X.CUSTOMSEQUENCE, 
        X.DSUSERNAME, X.ELAPSEDRUNSECS, X.ISUSERNAME, X.LASTUPDATETIMESTAMP, 
        X.RUNMAJORSTATUS, X.MASTERPID, X.RUNMINORSTATUS, X.NUMMESSAGESFATAL, 
        X.NUMMESSAGESTOTAL, X.NUMMESSAGESWARNING, X.RUNENDTIMESTAMP, X.RUNSTARTTIMESTAMP,
        X.RUNTYPE, X.TOTALROWSCONSUMED, X.TOTALROWSPRODUCED, X.USERSTATUS, 
        X.INVOCATIONID, X.CREATIONTIMESTAMP, X.CONTROLLING_RUNID, X.TOTALCPU, 
    Y.PROJECTNAME, Y.JOBNAME, Y.COMPILATIONTIMESTAMP, Y.JOBTYPE, Y.FOLDERPATH, 
        Y.WEBSERVICESENABLED 
FROM 
    DSODB.JOBRUN X, 
    DSODB.JOBEXEC Y, 
    DSODB.HOST Z  
WHERE 
    X.JOBID = Y.JOBID AND 
    Y.HOSTID = Z.HOSTID AND 
    Z.HOSTNAME = <engineName> AND
    (
        (X.CREATIONTIMESTAMP >= <startTimeStampUTC> AND 
            X.CREATIONTIMESTAMP <= <endTimeStampUTC>) OR 
        (X.RUNENDTIMESTAMP >=  <startTimeStampUTC> AND 
            X.RUNENDTIMESTAMP <= <endTimeStampUTC>) OR
        (X.RUNENDTIMESTAMP IS NULL AND 
            X.CREATIONTIMESTAMP <= <endTimeStampUTC>)
    )

Conclusion

This article examined where the data for operating system and DataStage Engine metrics is read from. It also examined how and where the properties for different types of object are retrieved.

Where data has been read from the operations database, you have seen example SQL queries demonstrating how to query the database. Similarly, where data has been read from the operating system, you have seen the exact system calls, or equivalent, used to retrieve the data. Finally you have seen how the Operations Console user interface combines data from the operations database, operating system, and the metadata repository to produce a single view of the entire system.


Acknowledgements

We would like to thank the user experience team who created the visual design of the IBM InfoSphere DataStage and QualityStage Operations Console, which challenged us to produce an application matching it as closely as possible. Thanks is also due to the quality assurance team who ensured everything worked as intended on all of the possible combinations of environments. Finally, we would like to thank the wider development team, without whom there would be no Operations Console application.

Resources

Learn

Get products and technologies

  • Build your next development project with IBM trial software, available for download directly from developerWorks.
  • Evaluate IBM products in the way that suits you best: Download a product trial, try a product online, use a product in a cloud environment, or spend a few hours in the SOA Sandbox learning how to implement Service Oriented Architecture efficiently.

Discuss

  • Get involved in the My developerWorks community. Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=834036
ArticleTitle=Interpret IBM InfoSphere DataStage and QualityStage Operations Console metrics
publish-date=09132012