- Operations Console overview
- Components in an InfoSphere Information Server environment
- Factors affecting performance
- Tuning guidance to minimize performance impact
- Monitoring database health of the Operations Database
- Capacity planning
- Downloadable resources
- Related topics
Configuration and tuning guidelines for IBM InfoSphere DataStage Operations Console
Operations Console overview
The Operations Console provides a detailed, historical view and a complete system health check of the operational environment of InfoSphere Information Server. The Operations Console provides:
- A high-level view of job runtime activity over a configurable time period
- The ability to compare runtime information between jobs
- A configurable view of operating system resources
- Project view filtering
- A summary and detailed view of jobs and job runs
- Visual alerts of job run failures
- Configurable alert thresholds
- The ability to analyze job run activity
- A view of resource consumption across the engine
- A job run analysis of performance and log comparison
The Operations Console uses a relational database, the Operations Database, to record all operational information from InfoSphere DataStage. This data enables users to monitor and understand the performance of the InfoSphere DataStage environment. A demo of the Operations Console is available (see Related topics).
Components in an InfoSphere Information Server environment
Before getting into tuning the Operations Console, let's look at how it works. Figure 1 shows the major components of the Operations Console, highlighted in blue, within the Information Server environment.
Figure 1. The Operations Console components in the Information Server environment
As shown, the Operations Console architecturally includes the Operations Database, a web-based client, and components built into the engine tier and services tier of InfoSphere Information Server. The Operations Database stores the operational data and allows updating and querying against that data. The Operations Console components in the engine tier include:
- Services status checker for monitoring the status and health of the monitoring processes
- Job event monitor for collecting and aggregating job statistics and runtime event logs and updating the Operations Database
- System resources monitor for monitoring the system resource (CPU, memory, disk, etc.) utilization and updating the Operations Database
- Services for querying the Operations Database
The Operations Console components in the services tier include services for querying the Operations Database. The Operations Console component in the client tier includes the web-based client for the Operations Console.
Major Operations Console operations can be categorized into load operations and query operations. For load operations, when the Operations Console is enabled, the system collects and aggregates job execution details (parameters, status, statistics, logs, etc.) and system resource utilization (CPU, memory, disk, etc.) and inserts this information into the Operations Database at short, regular intervals. For query operations, when you use the web-based client to monitor job executions or view job run history, for example, queries are submitted against the Operations Database, and the information is retrieved by using services in the services tier.
To support these operations, the Operations Console requires additional system resources (CPU, memory, I/O) in the engine tier, the services tier, and the Operations Database server. However, the resource requirements differ from tier to tier (see the Capacity planning section later in this article for details).
Performance characterization
Figure 2 shows the performance impact of the Operations Console on InfoSphere DataStage with default configuration settings. The figure compares job throughput with and without the Operations Console, expressed as an overhead ratio.
Figure 2. Performance impact of the Operations Console compared to system CPU utilization
The performance test was conducted on an InfoSphere DataStage cluster environment that consisted of four nodes (computers), each of which had four CPUs. The test results were based on default Operations Console settings, running 10 web sessions. The scale-out design of the InfoSphere Information Server engine allows jobs to run across multiple computers. One of these computers is designated as the primary node, or "head node." This primary node is the node against which Information Server clients validate engine credentials. It is also where the Operations Console code in the engine tier executes.
As shown in Figure 2, the Operations Console has an insignificant impact on InfoSphere DataStage performance when the CPU utilization of the primary node is under 90 percent. The Operations Console itself uses less than 10 percent of the CPU, so it does not affect job performance as long as CPU headroom is available on the primary node. However, when the workload pushes the CPU utilization of the primary node over 90 percent, the overhead of the Operations Console starts to affect the jobs: throughput decreases steadily and can be about 10 percent lower when the CPU is fully utilized. Note that the CPU utilization threshold at which the Operations Console has a visible effect may vary for different server models or InfoSphere Information Server engine configurations (a single computer, a cluster, or a grid). However, this threshold is usually very high, and the overhead of the Operations Console was negligible in several test scenarios.
Factors affecting performance
As shown in Figure 2, the default configuration of Operations Console results in negligible performance overhead, except when exceeding the CPU utilization threshold on the primary node. There are a number of factors that affect the degree of impact of Operations Console on the InfoSphere Information Server runtime performance:
- The number of Operations Console web sessions and their refresh intervals
- The frequency of collecting monitoring data
- The amount of monitoring data to collect
- The time interval of updating the Operations Database
The number of web sessions cannot be configured, but the other performance factors are controlled by configuration parameters. The complete list of parameters you can configure can be found in the <Installation_Directory>/Server/DSODB/DSODBConfig.cfg configuration file. Table 1 shows the list of configuration parameters.
Table 1. Parameters in DSODBConfig.cfg
|Parameter||Description||Default value|
|MaxWarnings||The maximum number of warning messages to be sent to the Operations Database for each job run.||10|
|UpdateIntSecs||The interval in seconds between successive events that update the overall run statistics.||10 sec.|
|TraceMax||The maximum number of lines to be written to the trace file when trace is enabled.||disabled|
|JobRunCheckInterval||The interval in minutes for automatically validating currently running jobs.||60 min.|
|JobRunUsage||Defines whether job run resource usage data is collected.||enabled|
|JobRunAggSnaps||The number of snapshot values that are included in a single row before a new row is started.||15|
|ResourceMonitor||Defines whether system resource data is collected.||enabled|
|ResourcePollPeriod||The frequency in seconds for how often a resource snapshot is taken.||10 sec.|
|ResourceSampleSize||The number of snapshots that are taken before an aggregated record of those values is stored.||6|
|ResourceAllAggregatedUsage||Defines whether to always store the resource usage data (enabled) or to only store the resource usage data when there is job activity (disabled).||enabled|
|ResourceAggRunPollPeriod||The number of aggregated snapshots that are automatically stored before and after any job activity has been detected when ResourceAllAggregatedUsage is disabled.||10|
|ResourceAggNonRunPollPeriod||The frequency in minutes for how often to check whether there is job activity when ResourceAllAggregatedUsage is disabled.||1 min.|
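To make the interaction between ResourcePollPeriod and ResourceSampleSize concrete, the following Python sketch estimates how often an aggregated resource record is stored under the Table 1 defaults. It assumes that exactly one aggregated record is written after every ResourceSampleSize snapshots and that monitoring runs continuously. The function names are illustrative and are not part of the product, and the real number of rows depends on how many resources and nodes are monitored.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def aggregation_interval_secs(resource_poll_period: int = 10,
                              resource_sample_size: int = 6) -> int:
    """Seconds between stored aggregated records.

    Defaults are the Table 1 values: a snapshot every 10 seconds,
    aggregated after 6 snapshots.
    """
    return resource_poll_period * resource_sample_size


def aggregated_records_per_day(resource_poll_period: int = 10,
                               resource_sample_size: int = 6) -> float:
    """Aggregated records per day, assuming continuous monitoring."""
    interval = aggregation_interval_secs(resource_poll_period,
                                         resource_sample_size)
    return SECONDS_PER_DAY / interval


if __name__ == "__main__":
    print(aggregation_interval_secs())                          # 60
    print(aggregated_records_per_day())                         # 1440.0
    print(aggregated_records_per_day(resource_poll_period=30))  # 480.0
```

Under these assumptions, tripling ResourcePollPeriod from 10 to 30 seconds cuts the number of stored aggregated records to one third.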
The threshold of performance impact in Figure 2 shifts left or right if the system configuration or some of these parameters are changed. For example, if you add more CPUs, the Operations Console services consume a relatively smaller percentage of CPU, the performance impact of the Operations Console decreases, and the threshold in Figure 2 shifts to the right. You can further reduce the resource consumption of the Operations Console by changing a few configuration parameters. Two important factors affecting Operations Console performance are the update interval and the number of Operations Console web sessions.
Figure 3 shows the CPU utilization of the Operations Console services on the primary node for three update intervals (the UpdateIntSecs parameter) at which information is collected and loaded into the Operations Database: 2 seconds, 5 seconds, and 10 seconds.
Figure 3. Operations Console CPU consumption for different update intervals
As shown, for this specific workload (around 40,000 jobs daily and 10 Operations Console web sessions) on the test computers, the Operations Console services consume around 16 percent of a CPU when the update interval is 2 seconds. If the update interval is lengthened, these processes consume less CPU: around 12 percent of a CPU with a 5-second interval and only around 9 percent of a CPU with a 10-second interval.
Figure 4 shows the performance effect of varying the number of Operations Console web sessions. The number of sessions is varied from 10 down to 1, and an additional data point shows the performance when the Operations Console is turned off.
Figure 4. Operations Console effect on throughput by varying the number of Operations Console web sessions
As shown, for this specific workload (CPU utilization reaches 97 percent of total CPU capacity on the primary node), the performance effect of the Operations Console shrinks as the number of web sessions decreases: from around 7 percent of throughput with 10 Operations Console web sessions to only 1 percent with one web session.
The Operations Console has no noticeable performance effect on InfoSphere Information Server if the engine has sufficient capacity left for the Operations Console to run. The performance impact starts to show when the server becomes strained and the threshold is reached. Depending on how the Operations Console is configured, its impact on performance can vary. The effect is smaller if the Operations Console is configured to collect smaller amounts of monitoring data, to insert into or update the Operations Database less frequently, or to support fewer Operations Console web sessions.
Tuning guidance to minimize performance impact
As discussed, the Operations Console can affect the performance of jobs when the InfoSphere Information Server engine becomes strained. When this performance impact occurs, you need to more finely tune the Operations Console. As a high-performance information integration platform, InfoSphere Information Server includes an efficient and scalable engine designed to aggressively utilize available system resources (CPU and memory, for example) when needed. It is not uncommon to see InfoSphere DataStage push CPU utilization to a very high level (more than 90 percent), when running some workloads. In those scenarios, the Operations Console might affect the performance of jobs. You can configure several tuning parameters to make the performance effect less noticeable.
Tuning can be done for query operations and load operations. For query operations, you can make the effect of the Operations Console less noticeable by increasing the refresh interval of the Operations Console web client or by shutting down unnecessary Operations Console sessions, which periodically query the Operations Database even if users are not actively interacting with the client. For load operations, although the default settings of the parameters in the DSODBConfig.cfg file are considered optimal, you can adjust one or more parameters to reduce the amount of monitoring data collected or to update the Operations Database less frequently. Table 2 lists these tunable parameters.
Table 2. Tunable parameters in the DSODBConfig.cfg file
|Parameters||Actions and results|
|MaxWarnings||Decreasing the number will result in less data being collected.|
|UpdateIntSecs||Increasing the number will result in lower updating frequency.|
|TraceMax||If this has been enabled, decreasing the number will reduce the cost of writing to the trace file.|
|JobRunCheckInterval||Increasing the number will reduce the percentage of time spent validating running jobs.|
|JobRunUsage||Disabling this option will result in less data being collected.|
|JobRunAggSnaps||Increasing the number will result in less data being collected.|
|ResourceMonitor||Disabling this option will result in less data being collected.|
|ResourcePollPeriod||Increasing the number will result in less data being collected.|
|ResourceSampleSize||Increasing the number will result in less data being collected.|
|ResourceAllAggregatedUsage||Disabling this option will result in less data being collected.|
|ResourceAggRunPollPeriod||If ResourceAllAggregatedUsage is disabled, decreasing the number will result in less data being collected.|
|ResourceAggNonRunPollPeriod||If ResourceAllAggregatedUsage is disabled, increasing the number will result in less data being collected.|
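As a minimal sketch of how Tables 1 and 2 combine, the following Python snippet checks whether a proposed value for a numeric parameter moves in the load-reducing direction relative to its default. The defaults come from Table 1 and the directions from Table 2. The names DEFAULTS, LOAD_REDUCING_DIRECTION, and reduces_load are hypothetical helpers for illustration only. The sketch does not read or write DSODBConfig.cfg, because the file syntax is not described here.

```python
# Default values from Table 1 (numeric parameters only).
DEFAULTS = {
    "MaxWarnings": 10,
    "UpdateIntSecs": 10,
    "JobRunCheckInterval": 60,
    "JobRunAggSnaps": 15,
    "ResourcePollPeriod": 10,
    "ResourceSampleSize": 6,
    # The next two only matter when ResourceAllAggregatedUsage is disabled.
    "ResourceAggRunPollPeriod": 10,
    "ResourceAggNonRunPollPeriod": 1,
}

# Direction of change that lowers monitoring load, from Table 2.
LOAD_REDUCING_DIRECTION = {
    "MaxWarnings": "decrease",
    "UpdateIntSecs": "increase",
    "JobRunCheckInterval": "increase",
    "JobRunAggSnaps": "increase",
    "ResourcePollPeriod": "increase",
    "ResourceSampleSize": "increase",
    "ResourceAggRunPollPeriod": "decrease",
    "ResourceAggNonRunPollPeriod": "increase",
}


def reduces_load(parameter: str, new_value: int) -> bool:
    """Return True if new_value moves away from the default in the
    direction that Table 2 associates with less monitoring load."""
    default = DEFAULTS[parameter]
    if LOAD_REDUCING_DIRECTION[parameter] == "increase":
        return new_value > default
    return new_value < default


if __name__ == "__main__":
    print(reduces_load("UpdateIntSecs", 30))  # True
    print(reduces_load("MaxWarnings", 50))    # False
```

The boolean parameters (JobRunUsage, ResourceMonitor, ResourceAllAggregatedUsage) and TraceMax are left out because they are switches or are disabled by default.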
Monitoring database health of the Operations Database
In addition to carefully setting the configuration parameters for the Operations Console, you should also monitor the Operations Database to ensure that there are sufficient system resources to support the database server and that the database is healthy.
You should run system monitoring tools, such as nmon, iostat, vmstat, and mpstat, to make sure that there is no I/O or memory bottleneck on the server where the Operations Database runs and that the CPU is not maxed out. Pay close attention to the dynamic query execution time and the use of table spaces in the Operations Database. The performance of dynamic query execution in the Operations Database has a direct and dominating impact on the response time of the web client for the Operations Console. Monitoring dynamic query execution in the Operations Database is the first step in diagnosing slow operations.
To get a snapshot of the dynamic query execution in the Operations Database, you can use the DB2® snapshot utility, for example, by issuing the command db2 get snapshot for dynamic sql on dsodbdb, where dsodbdb is the Operations Database. Alternatively, you can use the db2top monitoring tool to display the performance of the dynamic queries interactively, for example, by issuing the command db2top -d dsodbdb and then selecting option D to display the dynamic queries. Figure 5 shows the output for dynamic queries in the Operations Database. (Query tuning is outside the scope of this article. If needed, follow the query tuning guides from your database product vendor.)
Figure 5. Dynamic queries in the Operations Database
Another area of interest is the usage pattern of the table spaces of the Operations Database, which gives you a good idea of how much disk space is needed for your InfoSphere DataStage workloads and Operations Console settings. Take two snapshots of table space usage that are one day, one week, or some other interval apart, calculate the usage rate from the difference between them, and project how much disk storage will be needed. One easy way to get a snapshot of the table space usage is to connect to the Operations Database and run the command db2 list tablespaces show detail. Figure 6 shows the output of that command.
Figure 6. Table space output for the DSODB Operations Database
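The two-snapshot calculation described above can be sketched in Python as follows. This is an illustration under stated assumptions: growth is treated as linear, used space is derived from the used pages and page size reported for each table space, and all numbers in the example are hypothetical rather than measured values.

```python
BYTES_PER_MB = 1024 * 1024


def used_mb(used_pages: int, page_size_bytes: int) -> float:
    """Convert a table space's used pages and page size to megabytes."""
    return used_pages * page_size_bytes / BYTES_PER_MB


def daily_growth_mb(first_mb: float, second_mb: float,
                    days_between: float) -> float:
    """Average growth per day between two consecutive snapshots."""
    if days_between <= 0:
        raise ValueError("days_between must be positive")
    return (second_mb - first_mb) / days_between


def projected_usage_mb(current_mb: float, growth_per_day_mb: float,
                       days_ahead: float) -> float:
    """Linear projection of table space usage days_ahead from now."""
    return current_mb + growth_per_day_mb * days_ahead


if __name__ == "__main__":
    # Hypothetical snapshots taken one week apart.
    print(used_mb(25600, 16384))                  # 400.0
    rate = daily_growth_mb(1200.0, 1900.0, 7)
    print(rate)                                   # 100.0
    print(projected_usage_mb(1900.0, rate, 90))   # 10900.0
```

A longer interval between snapshots smooths out day-to-day variation in job volume, at the cost of reacting more slowly to a change in workload.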
Capacity planning
As discussed, enabling the Operations Console requires additional system resources. Allocating sufficient hardware resources for the Operations Console is an important measure to minimize the system performance effect that might occur. Table 3 shows the minimum hardware requirements for each tier.
Table 3. Hardware requirements for the Operations Console
|Resource||InfoSphere Information Server engine||Services tier||Database server that hosts Operations Database|
|CPU (based on a CPU core of an IBM x3650 M3 7945/82Y X5690 or equivalent)||0.25||0.25||1|
|Disk (GB)||0.2||N/A||5 + 0.5 for every 10,000 job executions|
|Remarks||Memory is for default setting for the Operations Console (i.e., 384 MB of maximum Java™ heap size). The disk storage requirement is for DataStage to store some very short-lived temporary files that are removed later.||Memory requirement is for the increase of maximum Java heap of WebSphere® Application Server to handle the additional load incurred by services for the Operations Console.||The system requirements already include the installation of DB2 for Linux®, UNIX®, and Windows®.|
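The disk figure for the database server in Table 3 can be turned into a small sizing helper. The Python sketch below assumes that the 0.5 GB per 10,000 job executions scales linearly, including for partial blocks of 10,000, and that the count refers to the job executions retained in the Operations Database. The function name and the 30-day retention period in the example are hypothetical.

```python
def operations_db_disk_gb(job_executions: int) -> float:
    """Minimum disk for the Operations Database server, per Table 3:
    5 GB base plus 0.5 GB for every 10,000 job executions."""
    if job_executions < 0:
        raise ValueError("job_executions must not be negative")
    return 5.0 + 0.5 * (job_executions / 10_000)


if __name__ == "__main__":
    # One day of the 40,000-jobs-per-day workload used for Figure 3.
    print(operations_db_disk_gb(40_000))       # 7.0
    # Hypothetical: the same workload retained for 30 days.
    print(operations_db_disk_gb(40_000 * 30))  # 65.0
```

Comparing such an estimate with the measured table space growth rate is a useful cross-check before committing storage.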
The Operations Database can be placed on a dedicated computer. Alternatively, if there are sufficient system resources in the metadata repository tier, it can be placed in the same database instance as the metadata repository. However, for performance reasons, placing the Operations Database in the engine tier is not recommended when the engine tier is installed on its own computers.
This article describes how the IBM InfoSphere DataStage Operations Console works and discusses the factors that determine its resource impact. The performance effect of the Operations Console can be negligible if the InfoSphere Information Server engine has sufficient CPU headroom to run the Operations Console. But if the server becomes strained, the performance effect might become noticeable. This article provides guidelines to minimize such performance effects. Furthermore, it discusses how to monitor the Operations Database and how to estimate the disk storage needed for the monitoring information that your workloads generate. Finally, the article provides information on capacity planning for the Operations Console. In conclusion, with sufficient system resources allocated and proper tuning when needed, you can run the Operations Console and use its many capabilities with minimal or negligible overhead.
Many thanks to the contributors who have provided their valuable input, edits, and reviews of this document:
- Sriram Padmanabhan, Distinguished Engineer and Chief Architect, InfoSphere Servers
- Len Greenwood, Information Server architect of DataStage and related components
- Tony Curcio, InfoSphere product management
- Kiran Surapaneni, InfoSphere product management
- Mi Wan Shum, manager, InfoSphere Information Server Performance
Related topics
- Watch the "InfoSphere DataStage Operations Console" demo.
- Now you can use DB2 for free. Download DB2 Express-C, a no-charge version of DB2 Express Edition for the community that offers the same core data features as DB2 Express Edition and provides a solid base to build and deploy applications.
- Evaluate IBM products in the way that suits you best: Download a product trial, try a product online, use a product in a cloud environment, or spend a few hours in the SOA Sandbox learning how to implement Service Oriented Architecture efficiently.