Configuration and tuning guidelines for IBM InfoSphere DataStage Operations Console

IBM® InfoSphere® DataStage® Operations Console is a web-based monitoring tool. It gives InfoSphere DataStage and QualityStage customers a broad view of the operational environment of the IBM InfoSphere Information Server engines, covering current and past job activity, server resource usage, and the health of server engine processes. The Operations Console can be enabled or disabled on your system. This article provides configuration and tuning guidance for minimizing the performance effect of the Operations Console on the system, so that you can take advantage of its many useful capabilities with minimal overhead.


Ron Liu (ronliu@us.ibm.com), Software Engineer, IBM China

Ron Liu is a technical lead of the IBM InfoSphere Information Server Performance Team. His work focuses on performance and scalability studies of IBM InfoSphere Information Server in grid and cluster environments, Big Data performance testing and tuning, capacity planning methodology, and industrial data integration benchmark development. Before his current job, he had seven years of experience in database server development (federation runtime, wrappers, query gateway, process model, and database security). He has a master's degree in computer science and a bachelor's degree in physics.



Sam Moussaoui (smoussao@us.ibm.com), Software Engineer, IBM China

Sam Moussaoui is a software engineer with the IBM InfoSphere Information Server Performance team. His work focuses on improving the performance of Information Server products by leading performance studies that cover performance testing, tuning, and capacity planning. He is currently on a one-year rotation with the Information Server Concierge Team, helping customers upgrade to the latest version of InfoSphere Information Server.



Chun Hua Sun (schunhua@cn.ibm.com), Software Engineer, IBM China

Chun Hua Sun is a software engineer at the IBM China Development Lab in Beijing. His work focuses on performance testing and tuning for IBM InfoSphere Information Server products. Before his current job, Chun Hua had two years of experience in system verification testing. He has published several Chinese-language articles on IBM developerWorks covering open source testing tools, test automation technologies, performance monitoring, and system optimization.



05 July 2012

Also available in Chinese, Russian, and Portuguese.

Operations Console overview

  • Value
  • Components in an InfoSphere Information Server environment
  • Performance characterization
  • Factors affecting performance impact
  • Tuning guidance to minimize performance impact
  • Monitoring the health of the Operations Database
  • Capacity planning
  • Conclusion
  • Acknowledgements

Value

The Operations Console provides a detailed, historical view and a complete system health check of the operational environment of InfoSphere Information Server. The Operations Console provides:

  • A high-level view of job runtime activity over a configurable time period
  • The ability to compare runtime information between jobs
  • A configurable view of operating system resources
  • Filtering of views by project
  • A summary and detailed view of jobs and job runs
  • Visual alerts of job run failures
  • Configurable alert thresholds
  • The ability to analyze job run activity
  • A view of resource consumption across the engine
  • Job run analysis, including performance and log comparison

The Operations Console uses a relational database, the Operations Database, to record all operational information from InfoSphere DataStage. This enables users to monitor and understand the performance of the InfoSphere DataStage environment. A demo of the Operations Console is available (see Resources).


Components in an InfoSphere Information Server environment

Before getting into tuning the Operations Console, let's look at how it works. Figure 1 shows the major components of the Operations Console highlighted in blue as they exist in the Information Server environment.

Figure 1. The Operations Console components in the Information Server environment

As shown, the Operations Console architecturally includes the Operations Database, a web-based client, and components built into the engine tier and services tier of InfoSphere Information Server. The Operations Database stores the operational data and allows updating and querying against that data. The Operations Console components in the engine tier include:

  • Services status checker for monitoring the status and health of the monitoring processes
  • Job event monitor for collecting and aggregating job statistics and runtime event logs and updating the Operations Database
  • System resources monitor for monitoring the system resource (CPU, memory, disk, etc.) utilization and updating the Operations Database
  • Services for querying the Operations Database

The Operations Console components in the services tier include services for querying the Operations Database. The Operations Console component in the client tier includes the web-based client for the Operations Console.

Operations Console activity falls into two categories: load operations and query operations. For load operations, when the Operations Console is enabled, the system collects and aggregates job execution details (parameters, status, statistics, logs, etc.) and system resource utilization (CPU, memory, disk, etc.). It inserts them into the Operations Database at short, regular intervals. Query operations include monitoring job executions or viewing job run history in the web-based client. For these, queries are submitted against the Operations Database and the results are retrieved through services in the services tier. To support these operations, the Operations Console requires additional system resources (CPU, memory, I/O) in the engine tier, the services tier, and on the Operations Database server. However, the additional requirements differ from tier to tier (see the Capacity planning section later in this article for details).


Performance characterization

Figure 2 shows the performance impact of the Operations Console on InfoSphere DataStage with default configuration settings, compared to system CPU utilization. The impact is expressed as the throughput overhead: the relative difference between job throughput with and without the Operations Console.

Figure 2. Performance impact of the Operations Console compared to system CPU utilization

The performance test was conducted on an InfoSphere DataStage cluster of four nodes (computers), each with four CPUs. The results are based on the default Operations Console settings with 10 web sessions open. The scale-out design of the InfoSphere Information Server engine allows jobs to run across multiple computers. One of these computers is designated as the primary node, or "head node." Information Server clients validate engine credentials against the primary node. The engine-tier code of the Operations Console also runs there.

As Figure 2 shows, the Operations Console has a negligible impact on InfoSphere DataStage performance while the CPU utilization of the primary node stays under 90 percent. The Operations Console does use some CPU, but less than 10 percent. It therefore does not affect job performance as long as the primary node has CPU headroom. When the workload pushes the CPU utilization of the primary node above 90 percent, however, the overhead of the Operations Console starts to affect the jobs. Throughput declines steadily and can be up to 10 percent lower when the CPU is fully utilized. The CPU utilization threshold at which the Operations Console begins to have an effect may differ for other server models. It may also differ for other InfoSphere Information Server engine configurations (a single computer, a cluster, or a grid). However, this threshold is usually very high, and the overhead of the Operations Console was negligible in several test scenarios.
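To check this on your own system, run the same workload with the Operations Console enabled and then disabled, and compare throughput. The following minimal Python sketch computes the overhead. It assumes throughput is measured in jobs completed per hour, and the function name and the example numbers are hypothetical, not taken from the tests above.

```python
def throughput_overhead_pct(throughput_with_oc: float, throughput_without_oc: float) -> float:
    """Percentage drop in job throughput when the Operations Console is enabled.

    Both arguments must use the same unit (here, jobs completed per hour).
    A positive result means the jobs ran slower with the Operations Console on.
    """
    if throughput_without_oc <= 0:
        raise ValueError("baseline throughput must be positive")
    return (throughput_without_oc - throughput_with_oc) / throughput_without_oc * 100.0


# Hypothetical measurements for illustration only.
baseline = 1200.0  # jobs per hour, Operations Console disabled
with_oc = 1116.0   # jobs per hour, Operations Console enabled
print(f"Throughput overhead: {throughput_overhead_pct(with_oc, baseline):.1f}%")
```

Repeating the measurement at several load levels and plotting the result against the primary node's CPU utilization gives a curve comparable to Figure 2.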


Factors affecting performance impact

As shown in Figure 2, the default configuration of the Operations Console results in negligible performance overhead, except when the CPU utilization of the primary node exceeds the threshold. Several factors determine how much the Operations Console affects the runtime performance of InfoSphere Information Server:

  • The number of Operations Console web sessions and their refresh intervals
  • The frequency of collecting monitoring data
  • The amount of monitoring data to collect
  • The time interval of updating the Operations Database

The number of web sessions is not set by a configuration parameter, but the other factors are. The configurable parameters are defined in the <Installation_Directory>/Server/DSODB/DSODBConfig.cfg configuration file. Table 1 lists them.

Table 1. Parameters in DSODBConfig.cfg
Parameter | Description | Default
MaxWarnings | The maximum number of warning messages sent to the Operations Database for each job run. | 10
UpdateIntSecs | The interval, in seconds, between successive events that update the overall run statistics. | 10 seconds
TraceMax | The maximum number of lines written to the trace file when tracing is enabled. | Disabled
JobRunCheckInterval | The interval, in minutes, for automatically validating currently running jobs. | 60 minutes
JobRunUsage | Whether job run resource usage data is collected. | Enabled
JobRunAggSnaps | The number of snapshot values included in a single row before a new row is started. | 15
ResourceMonitor | Whether system resource data is collected. | Enabled
ResourcePollPeriod | The interval, in seconds, between resource snapshots. | 10 seconds
ResourceSampleSize | The number of snapshots taken before an aggregated record of their values is stored. | 6
ResourceAllAggregatedUsage | Whether resource usage data is always stored (enabled) or stored only when there is job activity (disabled). | Enabled
ResourceAggRunPollPeriod | The number of aggregated snapshots stored before and after any detected job activity when ResourceAllAggregatedUsage is disabled. | 10
ResourceAggNonRunPollPeriod | The interval, in minutes, at which to check for job activity when ResourceAllAggregatedUsage is disabled. | 1 minute
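Before tuning, it helps to know which Table 1 parameters are actually overridden in your DSODBConfig.cfg file. The following Python sketch lists them. It rests on one assumption: that the file holds one Name=value entry per line, with # marking comment lines. This article does not describe the file's syntax, so check your copy first. The function names are hypothetical, and the defaults are transcribed from Table 1.

```python
# Defaults transcribed from Table 1. None stands for "disabled" (TraceMax);
# True stands for "enabled". Units are noted where Table 1 gives them.
TABLE1_DEFAULTS = {
    "MaxWarnings": 10,
    "UpdateIntSecs": 10,               # seconds
    "TraceMax": None,                  # tracing disabled
    "JobRunCheckInterval": 60,         # minutes
    "JobRunUsage": True,
    "JobRunAggSnaps": 15,
    "ResourceMonitor": True,
    "ResourcePollPeriod": 10,          # seconds
    "ResourceSampleSize": 6,
    "ResourceAllAggregatedUsage": True,
    "ResourceAggRunPollPeriod": 10,
    "ResourceAggNonRunPollPeriod": 1,  # minutes
}


def explicit_settings(text: str) -> dict:
    """Return the Table 1 parameters that are explicitly set (uncommented) in the file.

    ASSUMPTION: one 'Name=value' entry per line; lines starting with '#' are comments.
    Values are kept as raw strings because the on-disk encoding of
    enabled/disabled flags is not described in the article.
    """
    found = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        name = name.strip()
        if name in TABLE1_DEFAULTS:
            found[name] = value.strip()
    return found


def print_effective(text: str) -> None:
    """Print every Table 1 parameter with its explicit value or its documented default."""
    explicit = explicit_settings(text)
    for name, default in TABLE1_DEFAULTS.items():
        source = "set in file" if name in explicit else "Table 1 default"
        value = explicit.get(name, default)
        print(f"{name:<28} {value!s:<8} ({source})")
```

A hypothetical invocation is print_effective(Path("<Installation_Directory>/Server/DSODB/DSODBConfig.cfg").read_text()), with Path imported from pathlib and the placeholder replaced by your installation directory.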

The threshold of performance impact in Figure 2 can shift left or right when the system configuration or some of these parameters change. For example, adding CPUs makes the CPU consumed by the Operations Console services a smaller share of the total. That reduces the performance impact of the Operations Console and shifts the threshold in Figure 2 to the right. You can further reduce the resource consumption of the Operations Console by changing a few configuration parameters. Two of the most important factors affecting Operations Console performance are the update interval and the number of Operations Console web sessions.

Figure 3 shows the CPU utilization of the Operations Console services on the primary node for different values of the update interval (the UpdateIntSecs parameter). This interval controls how often information is collected and loaded into the Operations Database. Three update intervals are shown: 2 seconds, 5 seconds, and 10 seconds.

Figure 3. Operations Console CPU consumption for different update intervals

For this workload (around 40,000 jobs a day, with 10 Operations Console web sessions) on the test computers, the Operations Console services consume around 16 percent of a CPU when the update interval is 2 seconds. Longer update intervals reduce their CPU consumption: around 12 percent of a CPU with a 5-second interval, and only around 9 percent of a CPU with a 10-second interval.
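The values above are expressed as a percentage of one CPU. Dividing by the number of CPUs on the primary node shows what share of the node's capacity they represent. It also shows why adding CPUs shifts the Figure 2 threshold to the right: the same load becomes a smaller share. The following Python sketch does that conversion using the approximate Figure 3 readings. It assumes the four-CPU nodes of the test cluster described earlier; the eight-CPU node is hypothetical.

```python
# Approximate Operations Console CPU use on the primary node, in percent of ONE CPU,
# as reported for Figure 3 (about 40,000 jobs per day, 10 web sessions).
OC_PCT_OF_ONE_CPU = {2: 16.0, 5: 12.0, 10: 9.0}  # UpdateIntSecs -> % of one CPU


def share_of_node(pct_of_one_cpu: float, cpus_on_node: int) -> float:
    """Convert 'percent of one CPU' into percent of the whole node's CPU capacity."""
    return pct_of_one_cpu / cpus_on_node


for interval, pct in sorted(OC_PCT_OF_ONE_CPU.items()):
    # 4 CPUs assumes the test nodes described earlier; 8 CPUs is a hypothetical larger node.
    print(f"UpdateIntSecs={interval:>2}: "
          f"{share_of_node(pct, 4):.2f}% of a 4-CPU node, "
          f"{share_of_node(pct, 8):.2f}% of an 8-CPU node")
```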

Figure 4 shows the performance effect of varying the number of Operations Console web sessions. The number of sessions is varied from 10 down to 1, and an additional data point shows the performance when the Operations Console is turned off.

Figure 4. Operations Console effect on throughput by varying the number of Operations Console web sessions

This workload drives CPU utilization on the primary node to 97 percent of total capacity. Under it, the performance effect of the Operations Console shrinks as the number of web sessions drops. Throughput is about 7 percent lower with 10 web sessions, but only 1 percent lower with a single web session.

The Operations Console has no noticeable performance effect on InfoSphere Information Server as long as the engine has enough spare capacity to run it. Its impact starts to show when the server becomes strained and the threshold is reached, and the size of that impact depends on how the Operations Console is configured. The effect is smaller when the Operations Console collects less monitoring data, inserts into or updates the Operations Database less often, or serves fewer web sessions.
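As one example of collecting less data, consider resource monitoring with ResourceAllAggregatedUsage enabled. Going by the Table 1 descriptions, a snapshot is taken every ResourcePollPeriod seconds. An aggregated record is stored after every ResourceSampleSize snapshots. The following Python sketch gives a rough count of aggregated records per day. It assumes exactly one record per aggregation window and continuous collection. The tuned values (30 seconds, 10 snapshots) are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def aggregated_records_per_day(poll_period_secs: int, sample_size: int) -> float:
    """Rough number of aggregated resource records stored per day.

    Assumes ResourceAllAggregatedUsage is enabled (data is stored continuously)
    and that one aggregated record is written for every `sample_size`
    snapshots taken `poll_period_secs` apart.
    """
    window_secs = poll_period_secs * sample_size
    return SECONDS_PER_DAY / window_secs


default = aggregated_records_per_day(10, 6)  # Table 1 defaults: one record per 60 s
tuned = aggregated_records_per_day(30, 10)   # hypothetical tuned values: one per 300 s
print(f"default: {default:.0f} records/day, tuned: {tuned:.0f} records/day "
      f"({(1 - tuned / default) * 100:.0f}% fewer)")
```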


Tuning guidance to minimize performance impact

As discussed, the Operations Console can affect job performance when the InfoSphere Information Server engine becomes strained. In that situation, you can tune the Operations Console more finely. As a high-performance information integration platform, InfoSphere Information Server includes an efficient, scalable engine designed to aggressively use available system resources (CPU and memory, for example) when needed. It is not uncommon for InfoSphere DataStage to push CPU utilization very high (above 90 percent) when running some workloads. In those scenarios, the Operations Console might affect job performance, and several tuning parameters can make the effect less noticeable.

Tuning can target both query operations and load operations. For query operations, you can increase the refresh interval of the Operations Console web client. You can also close unneeded Operations Console sessions, which periodically query the Operations Database even when no one is actively using the client. For load operations, the default settings in the DSODBConfig.cfg file are considered optimal. Even so, you can adjust one or more parameters to reduce the amount of monitoring data collected or how often the Operations Database is updated. Table 2 lists these tunable parameters.

Table 2. Tunable parameters in the DSODBConfig.cfg file
Parameter | Action and result
MaxWarnings | Decreasing the value results in less data being collected.
UpdateIntSecs | Increasing the value lowers the update frequency.
TraceMax | If tracing is enabled, decreasing the value reduces the cost of writing to the trace file.
JobRunCheckInterval | Increasing the value reduces the percentage of time spent validating running jobs.
JobRunUsage | Disabling this option results in less data being collected.
JobRunAggSnaps | Increasing the value results in less data being collected.
ResourceMonitor | Disabling this option results in less data being collected.
ResourcePollPeriod | Increasing the value results in less data being collected.
ResourceSampleSize | Increasing the value results in less data being collected.
ResourceAllAggregatedUsage | Disabling this option results in less data being collected.
ResourceAggRunPollPeriod | If ResourceAllAggregatedUsage is disabled, decreasing the value results in less data being collected.
ResourceAggNonRunPollPeriod | If ResourceAllAggregatedUsage is disabled, increasing the value results in less data being collected.
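If you apply several of the Table 2 changes, it is safer to script the edit and keep a backup of the original file. The following Python sketch rewrites active settings in place and appends any parameter that has no active line. It leaves comment lines untouched. It assumes a Name=value per line format with # comments; the article does not document the syntax, so verify it against your DSODBConfig.cfg first. The function names and the override values in the example are hypothetical.

```python
import re
import shutil
from pathlib import Path

# Matches an active (uncommented) 'Name=value' line and captures Name.
ACTIVE_SETTING = re.compile(r"^\s*(\w+)\s*=")


def apply_overrides(text: str, overrides: dict) -> str:
    """Return config text with the given parameters set to new values.

    ASSUMPTION: one 'Name=value' entry per line and '#' for comments.
    Active lines for overridden parameters are rewritten in place; comment
    lines are left untouched; parameters with no active line are appended.
    """
    applied = set()
    out = []
    for line in text.splitlines():
        match = ACTIVE_SETTING.match(line)
        if match and match.group(1) in overrides:
            name = match.group(1)
            out.append(f"{name}={overrides[name]}")
            applied.add(name)
        else:
            out.append(line)
    for name, value in overrides.items():
        if name not in applied:
            out.append(f"{name}={value}")
    return "\n".join(out) + "\n"


def update_config_file(path: Path, overrides: dict) -> None:
    """Copy the file to <name>.bak, then write the overridden settings."""
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_text(apply_overrides(path.read_text(), overrides))
```

For example, apply_overrides(text, {"UpdateIntSecs": 20, "MaxWarnings": 5}) moves both parameters in the directions Table 2 suggests. It lengthens the update interval and collects fewer warnings.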

Monitoring the health of the Operations Database

In addition to carefully setting the configuration parameters for the Operations Console, you should also monitor the Operations Database to ensure that there are sufficient system resources to support the database server and that the database is healthy.

Run system monitoring tools such as nmon, iostat, vmstat, and mpstat. Use them to make sure that the server hosting the Operations Database has no I/O or memory bottleneck and that its CPU is not saturated. Pay close attention to dynamic query execution time and to table space usage in the Operations Database. The performance of dynamic query execution in the Operations Database has a direct and dominant impact on the response time of the Operations Console web client. Monitoring dynamic query execution is therefore the first step in diagnosing slow operations.
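For a quick scripted check of CPU saturation on the database server, you can parse vmstat output. The following Python sketch finds the idle (id) column by name in the header, which avoids depending on a fixed column position. It reports 100 minus idle for the last sample. It assumes a vmstat whose header contains us and id columns, as on typical Linux systems. The function names and any alert threshold you compare against are your own choice, not part of the product.

```python
import subprocess


def cpu_busy_pct(vmstat_text: str) -> float:
    """Return 100 - idle% for the last sample in vmstat output.

    The 'id' (idle) column is located by name in the header line,
    so the parser does not rely on a fixed column position.
    """
    header = None
    last_sample = None
    for line in vmstat_text.splitlines():
        tokens = line.split()
        if "id" in tokens and "us" in tokens:
            header = tokens
        elif header and len(tokens) == len(header) and tokens[0].isdigit():
            last_sample = tokens
    if header is None or last_sample is None:
        raise ValueError("no vmstat header or sample line found")
    return 100.0 - float(last_sample[header.index("id")])


def sample_cpu(interval_secs: int = 5, count: int = 3) -> float:
    """Run 'vmstat <interval> <count>' and report CPU busy% from the last sample.

    The first vmstat report shows averages since boot; using the last
    sample reflects current activity instead.
    """
    result = subprocess.run(["vmstat", str(interval_secs), str(count)],
                            capture_output=True, text=True, check=True)
    return cpu_busy_pct(result.stdout)
```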

To get a snapshot of dynamic query execution in the Operations Database, you can use the DB2® snapshot utility. For example, issue the command db2 get snapshot for dynamic sql on dsodbdb, where dsodbdb is the name of the Operations Database. Alternatively, you can use the db2top monitoring tool to display the performance of the dynamic queries interactively. For example, issue the command db2top -d dsodbdb and then select option D to display the dynamic queries. Figure 5 shows the output for dynamic queries in the Operations Database. (Query tuning is beyond the scope of this article; if needed, follow the query tuning guides from your database vendor.)
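With many statements in the snapshot, it helps to rank them by average execution time. The following Python sketch does that. It assumes each statement entry in the db2 get snapshot for dynamic sql output contains a Number of executions line and a Total execution time (sec.microsec) line, and that each entry ends with a Statement text line. Field names and order can vary by DB2 version, so verify them against your own output. The function names are hypothetical. The runner needs the db2 command processor on the PATH of the calling environment.

```python
import subprocess


def _number(value: str):
    """Parse a numeric snapshot value; return None for text such as 'Not Collected'."""
    try:
        return float(value)
    except ValueError:
        return None


def parse_dynamic_sql(text: str) -> list:
    """Split snapshot output into one dict per statement.

    ASSUMPTION: each entry has 'Number of executions' and
    'Total execution time (sec.microsec)' and ends with 'Statement text'.
    """
    statements, current = [], {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Number of executions":
            current["executions"] = _number(value)
        elif key.startswith("Total execution time"):
            current["total_secs"] = _number(value)
        elif key == "Statement text":
            current["text"] = value
            statements.append(current)
            current = {}
    return statements


def slowest_statements(statements: list, top: int = 5) -> list:
    """Rank statements by average execution time per execution, slowest first."""
    ranked = []
    for stmt in statements:
        runs, total = stmt.get("executions"), stmt.get("total_secs")
        if runs and total is not None:
            ranked.append({**stmt, "avg_secs": total / runs})
    return sorted(ranked, key=lambda s: s["avg_secs"], reverse=True)[:top]


def snapshot_dynamic_sql(database: str = "dsodbdb") -> str:
    """Run the DB2 command quoted above and return its text output."""
    cmd = ["db2", "get", "snapshot", "for", "dynamic", "sql", "on", database]
    return subprocess.run(cmd, capture_output=True, text=True, check=False).stdout
```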

Figure 5. Dynamic queries in the Operations Database

Another area of interest is the usage pattern of the Operations Database table spaces. It gives you a good idea of how much disk space your InfoSphere DataStage workloads and Operations Console settings will need. To measure it, take two snapshots of table space usage one day, one week, or some other interval apart. Then calculate the growth rate from the difference and project how much disk storage will be needed. One easy way to get a snapshot of table space usage is to connect to the Operations Database and run the command db2 list tablespaces show detail. Figure 6 shows the output of that command.
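The following Python sketch works through the two-snapshot calculation. It parses used bytes per table space from db2 list tablespaces show detail output, then extrapolates linearly. It assumes each table space block lists Name, Used pages, and Page size (bytes) in that order, and that growth stays linear. The function names and the 30-day projection horizon are hypothetical.

```python
def used_bytes_by_tablespace(text: str) -> dict:
    """Map table space name -> bytes used, from 'db2 list tablespaces show detail' output.

    ASSUMPTION: each block lists 'Name', then 'Used pages', then 'Page size (bytes)';
    blocks without a numeric 'Used pages' value are skipped.
    """
    usage, name, used_pages = {}, None, None
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Name":
            name, used_pages = value, None
        elif key == "Used pages":
            used_pages = int(value) if value.isdigit() else None
        elif key == "Page size (bytes)" and name and used_pages is not None:
            usage[name] = used_pages * int(value)
    return usage


def project_usage(before: dict, after: dict, days_between: float, days_ahead: float) -> dict:
    """Linearly extrapolate each table space's usage `days_ahead` days past the later snapshot."""
    projection = {}
    for name, used_now in after.items():
        growth_per_day = (used_now - before.get(name, 0)) / days_between
        projection[name] = used_now + growth_per_day * days_ahead
    return projection
```

For example, you might take one snapshot, wait a week, take another, and call project_usage(first, second, 7, 30) to estimate usage 30 days after the second snapshot.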

Figure 6. Table space output for the DSODB Operations Database

Capacity planning

As discussed, enabling the Operations Console requires additional system resources. Allocating sufficient hardware resources for the Operations Console is an important measure to minimize the system performance effect that might occur. Table 3 shows the minimum hardware requirements for each tier.

Table 3. Hardware requirements for the Operations Console
Resource | InfoSphere Information Server engine | Services tier | Database server that hosts the Operations Database
CPU (cores of an IBM x3650 M3 7945/82Y X5690 or equivalent) | 0.25 | 0.25 | 1
Memory (GB) | 0.4 | 0.4 | 2
Disk (GB) | 0.2 | N/A | 5 + 0.5 for every 10,000 job executions

Remarks:
  • InfoSphere Information Server engine: The memory figure is for the default Operations Console setting (a maximum Java™ heap size of 384 MB). The disk storage is for very short-lived temporary files that DataStage stores and later removes.
  • Services tier: The memory figure is for increasing the maximum Java heap of WebSphere® Application Server to handle the additional load from the Operations Console services.
  • Database server: The requirements already include the installation of DB2 for Linux®, UNIX®, and Windows®.
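The disk figure for the database server is a simple linear formula, so it is easy to turn into a sizing estimate. The following Python sketch applies it. It assumes the 0.5 GB per 10,000 job executions scales linearly and applies to the job runs whose records are kept in the Operations Database. The 30-day horizon in the example is hypothetical; the 40,000 jobs per day is the Figure 3 workload.

```python
def operations_db_disk_gb(job_executions: int) -> float:
    """Disk estimate from Table 3: 5 GB base plus 0.5 GB for every 10,000 job executions.

    ASSUMPTION: the per-run cost scales linearly and applies to the job runs
    whose records are kept in the Operations Database.
    """
    return 5.0 + 0.5 * job_executions / 10_000


# Hypothetical sizing: about 40,000 job runs per day (the Figure 3 workload),
# with 30 days of job runs kept in the database.
runs_kept = 40_000 * 30
print(f"{operations_db_disk_gb(runs_kept):.1f} GB")  # 65.0 GB
```

Comparing this estimate with the growth measured from table space snapshots is a useful sanity check on both numbers.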

The Operations Database can be placed on a dedicated computer. Alternatively, if the metadata repository tier has sufficient system resources, the Operations Database can be placed in the same database instance as the metadata repository. However, for performance reasons, do not place the Operations Database on the engine tier when the engine tier is installed on its own computers.


Conclusion

This article describes how the IBM InfoSphere DataStage Operations Console works and discusses the factors that determine its resource impact. The performance effect of the Operations Console can be negligible if the InfoSphere Information Server engine has enough CPU headroom to run it, but it might become noticeable when the server is strained. The article provides guidelines for minimizing such effects. It explains how to monitor the Operations Database and estimate the disk storage needed for the monitoring data your workloads generate. It also gives capacity planning information for the Operations Console. With sufficient system resources allocated and proper tuning where needed, you can use the many capabilities of the Operations Console with minimal or negligible overhead.

Acknowledgements

Many thanks to the contributors who have provided their valuable input, edits, and reviews of this document:

  • Sriram Padmanabhan, Distinguished Engineer and Chief Architect, InfoSphere Servers
  • Len Greenwood, Information Server architect of DataStage and related components
  • Tony Curcio, InfoSphere product management
  • Kiran Surapaneni, InfoSphere product management
  • Mi Wan Shum, manager, InfoSphere Information Server Performance

Resources

Learn

Get products and technologies

  • Build your next development project with IBM trial software, available for download directly from developerWorks.
  • Now you can use DB2 for free. Download DB2 Express-C, a no-charge version of DB2 Express Edition for the community that offers the same core data features as DB2 Express Edition and provides a solid base to build and deploy applications.
  • Evaluate IBM products in the way that suits you best: Download a product trial, try a product online, use a product in a cloud environment, or spend a few hours in the SOA Sandbox learning how to implement Service Oriented Architecture efficiently.

Discuss
