This is part of a group of documents and examples commonly referred to as the System Management Methodology (SMM). The SMM provides information and techniques as examples of how an administrator uses the standard features of IBM Cognos Administration along with IBM Cognos Business Intelligence functionality in order to increase their own productivity and pro-actively manage IBM Cognos BI applications, users, and servers. By combining the live view of IBM Cognos BI system activity provided by the IBM Cognos Administration features with system trending information using the reporting and analytical features of IBM Cognos BI, administrators can get a full view of BI system utilization.
Additional information about the administrative features of IBM Cognos BI is available in the IBM Cognos Business Intelligence Administration and Security Guide.
The documented information and technique(s) apply to all IBM Cognos Business Intelligence 10.1.1, 10.2.1 and 10.2.2 installations. Although precautions are taken to ensure that the information and technique(s) span newer releases, some of the content may become obsolete and/or no longer applicable.
Content found in an appendix pertains to previously released System Management Methodology topics/techniques that have been replaced with a newer technique. The newer technique(s) will appear in the regular chapters of the document. When a new topic/technique is added that replaces a technique, it will be highlighted in the document.
Exclusions and Exceptions
The scope of the documents in this series may not include detailed steps on how to use the product as this information is contained within the core product documentation. For example, this document deals with thresholds that can be applied to the system metrics. The document offers guidance on how to interpret those metrics to apply default thresholds programmatically but does not include the actual steps required to manually set or modify a threshold.
System Trending and Metric Thresholds
System trending is the act of collecting and analysing system metric data over time so that patterns and trends can be detected and either reacted to or planned for. In the case of IBM Cognos BI, system trending can be accomplished using the JMX-based system metrics generated by the IBM Cognos dispatcher along with IBM Cognos Business Intelligence reporting and analytical features. Capacity planning is one example of where system trending is useful to an administrator. In its simplest form, system utilization can be tracked over time by accumulating IBM Cognos BI system metrics and using that data to create reports that provide a visual representation of application use patterns. These reports allow an administrator to understand when excess capacity is available on IBM Cognos BI servers and to predict when additional server capacity will be required. System trending information can also be used to resolve processing conflicts or bottlenecks and to help isolate the root cause of recurring problems.
System trending for IBM Cognos BI involves the accumulation of system metrics over time within a “system trending” reporting database, and the definition and management of system metric thresholds to simplify problem identification.
Collecting Daily Metric Values
The metrics that can be viewed in the IBM Cognos Administration console are surfaced dynamically through Java Management Extensions (JMX) MBeans, so IBM Cognos Business Intelligence provides a mechanism to export system metrics to flat files. The system metric data contained within JMX MBeans can also be consumed directly through Java management software or through IBM Tivoli Directory Integrator (TDI). Once the metrics have been exported, they can be used for various purposes, such as:
- System trending
  - Populate a reporting database with system metrics over time using IBM Tivoli Directory Integrator
- Setting default thresholds
  - Set manually through the IBM Cognos Administration console
  - Use the CogSystemMetricAnalyzer tool to process accumulated metric dump files and automatically set thresholds based upon recorded metric values
- Use IBM Tivoli Directory Integrator to archive system metrics
The following diagram outlines the process for managing IBM Cognos BI system metrics and is described below.
Figure 1. Illustration 1 - The process for managing IBM Cognos BI system metrics
- The first step in working with system metrics is to collect a history of metric values that are representative of normal application activity on your system.
  - Metric values are accumulated for a length of time that makes sense for your application. For most applications, it makes sense to accumulate metric values for a 24-hour period, so that the metric values reflect system utilisation for an entire day.
  - Once the time window for accumulating system metric values has elapsed, the system metric values are exported to metricdump.xml files.
  - This process of accumulating and exporting system metrics to dump files typically continues for long enough to capture a typical profile of application utilisation. One week of accumulated system metric value data is normally sufficient to profile application utilisation.
  - Exporting the metrics from the system can be accomplished by modifying a control file (metricdumpconfiguration.xml) and setting an advanced system parameter (DISP.MetricDumpEnabled) for one dispatcher within your installation of IBM Cognos BI.
- The second step is to use the metricdump.xml files and the accumulated metric values within the files to calculate and set system metric thresholds using the CogSystemMetricAnalyzer tool.
  - CogSystemMetricAnalyzer reads all metricdump.xml files, calculates a threshold for each metric being monitored, and sets the threshold values by writing them to the IBM Cognos BI content store.
- Ongoing system metric values can be read from their live JMX MBeans and written to a relational reporting database for system trending activities. More information related to system trending and use of TDI can be found within the developerWorks document System Management Methodology For IBM Cognos 10 - TDI Integration.
Configuring system metric exports using metricdumpconfiguration.xml
Exporting system metrics is based on the values in the metricdumpconfiguration.xml control file located in the <install_dir>/configuration directory. The file is separated into three sections that control the export location, what content is exported and the frequency of the exports.
The <filename> parameter value indicates the relative or absolute path to which the export files will be written. The default value for the <filename> parameter is ../logs/metricdump.xml, a path relative to the location of the metricdumpconfiguration.xml file, so by default the export files are written to the <install_dir>/logs directory. It is recommended that the file name itself not be changed and remain metricdump.xml. If the technique for setting default thresholds will be employed, the recommendation is to create a new directory for the exports; otherwise the default value can be used. In environments with multiple dispatchers, only the metricdumpconfiguration.xml file on the enabled dispatcher has to be modified.
The <mbeans> parameter controls what information gets exported to the metricdump.xml file. The types of information that can be exported are metrics, metric health, service health and service operational status. The metrics option contains actual numeric values that provide meaningful system utilisation data and is the only recommended option for system trending. By removing all of the options except the Metrics type, only the metrics will be exported. The <mbeans> parameter should appear as follows:
<mbeans>
  <mbean>com.cognos:type=Metrics,*</mbean>
</mbeans>
The <interval> parameter controls how frequently the exports are executed. The value is specified in milliseconds and should be adjusted to suit the purpose of the metric exports. For example, if metrics are being exported to analyze a specific issue or report execution, a lower interval may be more appropriate. If the purpose is system trending or archiving, the interval should be set higher. For example, to do a daily export, use a value of 86,400,000 milliseconds (24 hours). The default metric dump interval is 15000 milliseconds (15 seconds). The default setting should not be used except for testing purposes.
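The arithmetic behind the <interval> value can be sketched in a few lines of Python. The helper name interval_ms is purely illustrative and is not part of IBM Cognos BI; it only converts a duration into the millisecond figure that the parameter expects.

```python
from datetime import timedelta


def interval_ms(**duration) -> int:
    """Convert a duration (keyword arguments accepted by timedelta)
    into whole milliseconds for the <interval> parameter."""
    return timedelta(**duration) // timedelta(milliseconds=1)


print(interval_ms(hours=24))    # 86400000, the daily export value
print(interval_ms(seconds=15))  # 15000, the default interval
```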
The <resetAfterDump> parameter is used to perform an automatic reset of the metrics after an export has been performed. The ability to reset the metrics after an export has completed is an important aspect of collecting them: to compare and measure the metrics with any degree of accuracy and certainty, the numbers must relate to similar periods. For instance, comparing numbers from day one with cumulative numbers from day one and day two does not provide any significant value. Ideally, a snapshot of the numbers at the end of day one would be compared with a snapshot of just the numbers for day two. To reset the metrics automatically after an export is performed, set the <resetAfterDump> parameter to a value of true. A value of false means that every export is a cumulative snapshot that includes all of the previous snapshots. Although this may provide some benefit for a particular use case, this document focuses on daily snapshots.
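To illustrate why the reset matters, the following minimal Python sketch shows the extra differencing step that cumulative snapshots would require before days could be compared. The function name and the request counts are hypothetical and are not taken from a real system; the sketch also assumes a simple counter-style metric.

```python
def daily_from_cumulative(cumulative):
    """Recover per-period counts from cumulative snapshots
    (what would be needed if <resetAfterDump> were false)."""
    daily = []
    previous = 0
    for total in cumulative:
        daily.append(total - previous)
        previous = total
    return daily


# Hypothetical cumulative request counts at the end of days 1, 2 and 3.
cumulative_snapshots = [1200, 2500, 3400]
print(daily_from_cumulative(cumulative_snapshots))  # [1200, 1300, 900]
```

Differencing of this kind only works for counters. Averages, rates and high-water marks cannot generally be recovered from cumulative values, which is one reason resetting after each export is the simpler approach for daily snapshots.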
The <limit> grouping of parameters deals with specific settings regarding the exported metrics file. To specify that the exports are to continue indefinitely, a value of -1 would be used for the <count> parameter. This is NOT a recommended approach, as continually exporting the metrics without a specific purpose, especially with a short interval, places extra load on the dispatcher.
The <filesize> parameter specifies the maximum size (in bytes) of the export file before a new export file is created. It is recommended that this parameter be set to a small value to ensure that only one export is contained in a single file. This helps to keep the information organized and makes it easier for techniques such as system trending to load the data. If the value is lower than required for a complete export operation, then the file size will be increased to contain the results from a single export. You can determine the size of a single metric dump by observing the actual file size of metricdump.xml before and after a metric dump has occurred. For IBM Cognos BI version 10.1.1, a <filesize> value that is less than 160KB (163840 bytes) will cause the IBM Cognos BI system to export a single metric dump per file.
The number of backups to be kept is specified by the <rollover> parameter, which works in a similar way to the cogserver.log file. Every time the maximum file size is reached, a version number is appended to the end of the file name (metricdump.xml.1, for example). If the export files are not going to be loaded into a database or archived on a regular basis, then this number should not be set lower than the <count> parameter, to ensure that all of the desired data is still available when it is needed. The structure of the <limit> grouping of parameters is as follows:
<limit>
  <!-- maximum number of times to perform dumps. -1 is unlimited-->
  <count>24</count>
  <!-- maximum size of dump file before rollover -->
  <filesize>100000</filesize>
  <!-- number of backups kept -->
  <rollover>24</rollover>
</limit>
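The recommendations for the <limit> grouping can be expressed as a small sanity check. The Python sketch below is an illustration only: check_limit and its warning texts are hypothetical, and the 163840-byte figure is the single-dump size quoted above for version 10.1.1, which may differ on other systems.

```python
import xml.etree.ElementTree as ET

LIMIT_XML = """
<limit>
  <count>24</count>
  <filesize>100000</filesize>
  <rollover>24</rollover>
</limit>
"""

# Size of one dump quoted in the text for IBM Cognos BI 10.1.1 (160KB).
SINGLE_DUMP_BYTES = 163840


def check_limit(xml_text, single_dump_bytes=SINGLE_DUMP_BYTES):
    """Return a list of warnings for a <limit> fragment."""
    limit = ET.fromstring(xml_text)
    count = int(limit.findtext("count"))
    filesize = int(limit.findtext("filesize"))
    rollover = int(limit.findtext("rollover"))

    warnings = []
    if count == -1:
        warnings.append("count is -1: exports never stop")
    if filesize >= single_dump_bytes:
        warnings.append("filesize may hold more than one dump per file")
    if count != -1 and rollover < count:
        warnings.append("rollover is lower than count: older dumps may be lost")
    return warnings


print(check_limit(LIMIT_XML))  # [] for the sample values above
```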
Enable Metric Dumps
The second step in exporting the metrics to a flat file consists of enabling the export from the administration console. This involves setting the DISP.MetricDumpEnabled server parameter. To set this parameter,
- From within IBM Cognos Connection, click Launch > IBM Cognos Administration to open the IBM Cognos Administration console.
- Click on the Configuration tab and select the Dispatchers and services task.
- Click the Set Properties – Configuration icon.
- Navigate to the Settings page.
- Click on the Edit... link that corresponds to the Environment Advanced settings object.
The Edit... link will display the Set advanced settings – Configuration page. To enable the exports for a dispatcher, an advanced parameter must be set with the value being the dispatcher URI (for example, http://servername:9300/p2pd). Type the parameter name DISP.MetricDumpEnabled within the Parameter column of a blank row and the dispatcher URI within the Value column of the same row.
Figure 2. Illustration 2 - Setting the advanced parameter to enable metric dumps
To disable the metric exports for a particular dispatcher before the number of iterations specified in the metricdumpconfiguration.xml file has completed, the corresponding DISP.MetricDumpEnabled advanced parameter must be removed.
- Navigate to the Set advanced settings - Configuration page as described above.
- Select the DISP.MetricDumpEnabled row by checking the corresponding checkbox and then click Delete at the bottom of the dialog.
- Click OK to save the changes and continue.
Because exporting metrics is a multi-step process, the control file on the file system can remain intact while the trigger that starts or stops the metric exports is controlled through the user interface.
Putting the pieces together
One of the goals of creating a series of metric snapshots is to enable techniques such as the setting of default thresholds. The previous sections dealt with enabling the snapshots and resetting the metrics; this section lists, in the form of a checklist, the operations that have to be performed before that technique can be implemented.
- Decide on a frequency for the snapshots. Daily metric exports are recommended but some environments may require a more frequent schedule.
- Decide on the number of snapshot iterations to be tracked. If the environment is primarily used during the business week, the snapshots for the weekend days should be manually deleted so that they do not adversely affect the metric values being reported on.
- Configure the parameters in the metricdumpconfiguration.xml file to match the decisions made in the previous two steps.
- Enable the metric exports to begin.
- Note the time that the first export file was created.
- Once the specified number of snapshots has been taken, copy the metricdump.xml and metricdump.xml.x (where x indicates a version number) files to a new working directory for use with the CogSystemMetricAnalyzer tool.
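The final step of the checklist, copying the dump files to a working directory, can be scripted. The following Python sketch is one possible approach, not part of the SMM tooling; the function name collect_dumps is hypothetical, and the directories are supplied by the caller.

```python
import glob
import os
import shutil


def collect_dumps(logs_dir, work_dir):
    """Copy metricdump.xml and its metricdump.xml.N rollovers from
    logs_dir into work_dir; return the copied file names, sorted."""
    os.makedirs(work_dir, exist_ok=True)
    copied = []
    for path in glob.glob(os.path.join(logs_dir, "metricdump.xml*")):
        name = os.path.basename(path)
        suffix = name[len("metricdump.xml"):]
        # Accept the live file or a ".<number>" rollover only.
        if suffix == "" or (suffix.startswith(".") and suffix[1:].isdigit()):
            shutil.copy2(path, os.path.join(work_dir, name))
            copied.append(name)
    return sorted(copied)


# Example call (both paths are placeholders for your environment):
# collect_dumps("<install_dir>/logs", "<working_dir>")
```

Using shutil.copy2 preserves the file modification times, which can help when identifying and removing weekend snapshots afterwards.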
While the real-time metrics displayed in the IBM Cognos Administration console provide valuable information when monitoring the environment, that value all but disappears when no one is actively watching the console. With this in mind, a feature was added that allows thresholds to be set on individual metrics, which can provide the basis for alerts through Event Studio. These thresholds allow administrators to set ranges that will provide them with a quick overall view into the system health. The ranges are displayed as green, yellow and red traffic light indicators. When a series of thresholds are assigned to key indicators that pertain to the specific environment, an overall scorecard is possible.
Figure 3. Illustration 3 - Metric thresholds are used to determine the status of system metrics
The screen capture above demonstrates that a scorecard is possible based on the threshold ranges set on individual metrics. A quick glance at the scorecard shows a few services with a red square indicator, which means that some of the metrics for those services have values that are above or below the norm and could warrant further investigation.
Drilling down on one of the services with a red indicator, the LogService for example, reveals a more detailed view of the individual metrics and their score in the upper right hand Metrics frame. In the example below, the number of processed requests and the number of successful requests metrics have the red square indicator. This would indicate that this service was being used either more or less than anticipated.
Figure 4. Illustration 4 - A detailed view showing the status of individual system metrics for a service
The anticipated levels would be visible as part of the threshold definition. The threshold definition could be viewed and edited by clicking on the pencil icon to open the Set thresholds for metric window.
Figure 5. Illustration 5 – The Set thresholds for metric window that is used to manually set individual metric thresholds
The thresholds are made up of two distinct sections. The first section is the performance pattern, which specifies whether high, middle, or low values are good and will therefore display the green circle indicator. The second section contains the actual ranges that drive the type of indicator. The example above shows that low values are good: values of 34 and higher are out of the acceptable range for this metric and will display the red square indicator, a value of 33 results in an amber diamond indicator, and values less than 33 display a green circle. The arrow indicators beside each threshold range define the metric status relative to the threshold value. If a metric value of less than 33 represents a good status for the metric, then the default down arrow indicator would be used (for a metric where low values are good). If a good status should instead cover metric values less than or equal to 33, then the up arrow indicator should be used. Clicking the down arrow button changes the display to show an up arrow for the green circle values:
Figure 6. Illustration 6 - Metric values less than 33 will display a green indicator
Figure 7. Illustration 7 - Metric values less than or equal to 33 will display a green indicator
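The boundary behaviour described for the two arrow settings can be modelled in a few lines. This Python sketch covers only the "low values are good" pattern with the values from the example; the function and parameter names are hypothetical and do not reflect how IBM Cognos BI evaluates thresholds internally.

```python
def status_low_is_good(value, good_below, poor_from, good_inclusive=False):
    """Classify a metric value where low values are good.

    good_below:     boundary of the green range (33 in the example)
    poor_from:      first value of the red range (34 in the example)
    good_inclusive: False models the down arrow (value < good_below is green),
                    True models the up arrow (value <= good_below is green)
    """
    if value >= poor_from:
        return "red"
    if value < good_below or (good_inclusive and value == good_below):
        return "green"
    return "yellow"


for v in (32, 33, 34):
    print(v, status_low_is_good(v, 33, 34), status_low_is_good(v, 33, 34, True))
```

With the down arrow a value of 33 is reported as yellow, whereas with the up arrow the same value is reported as green.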
When a threshold changes state and goes from green to yellow or yellow to red, this is known as a threshold exception. The exceptions are written to the COGIPF_THRESHOLD_VIOLATIONS table of the audit database (if one is configured for the environment).
It is important to note that the full range of auditing doesn’t have to be enabled for the exceptions to be written to the audit database. In other words, as long as there is an audit database that is part of the configuration, regardless of whether auditing of the individual components is enabled, the threshold exceptions will be recorded. Details such as the time of the threshold violation, the indicated status of the metric value (good, poor, average), the service associated with the threshold, and the threshold value ranges at the time of the exception are all recorded to the COGIPF_THRESHOLD_VIOLATIONS table.
For reporting purposes, the sample audit package available with IBM Cognos Business Intelligence incorporates the threshold table so that reports can easily be created to track threshold exception history. More information related to IBM Cognos BI auditing features can be found in the IBM Cognos BI Administration and Security Guide.
Setting Default Thresholds
Building on the system metrics, IBM Cognos Business Intelligence provides the ability to define threshold ranges on various metric types. A question that is frequently asked about the metric thresholds is “Why are there no default thresholds supplied out of the box?” The short answer is that it is impossible to provide generic thresholds that will satisfy all environments.
Metric values are influenced by factors such as usage patterns, duration of report executions, deployment architectures, number of users executing reports versus viewing saved output, etc. With all of these external factors contributing to the metric values, predicting valid threshold ranges becomes a near impossible task without intensive monitoring. Fortunately, by leveraging the CogSystemMetricAnalyzer tool, it becomes possible to assign default threshold ranges that are relevant for the target environment.
As part of the System Management Methodology a tool called CogSystemMetricAnalyzer is provided. The tool gathers a set of metric dump files, consolidates all of the values, and based on threshold ranges found in a control file, sets thresholds on desired metrics. The control file can be manually edited to customize the threshold ranges either globally or on a metric by metric basis. Keep in mind that the thresholds that get set through the tool only apply to the metrics at the service level. Consolidated metric thresholds at the server and system level need to be set manually. The reason for this is that a different, or more customized, algorithm may be desired for the consolidated metrics.
For multi-dispatcher deployments, the CogSystemMetricAnalyzer tool has to be installed into the install location for each instance of IBM Cognos BI and then run from each location. This will assemble a baseline for the entire system, recording each entry in the System Trending database with the dispatcher URI that the MBean is associated with.
To install and run the CogSystemMetricAnalyzer application,
- If using IBM Cognos BI version 10.1.1, download the CogSystemMetricAnalyzer(10.1.1).zip file that is attached to this article to the same location as the existing CogSystemMetricAnalyzer.zip file from the Installation process and unzip this ZIP file instead.
- Extract the files from the CogSystemMetricAnalyzer.zip file to the IBM Cognos BI root installation directory. This will create a directory called CogSystemMetricAnalyzer.
- Using a text editor, open the CogSystemMetricAnalyzer.properties file for editing.
- Modify the parameters to reflect the desired credentials and appropriate file paths. To set the thresholds successfully, the credentials must belong to a member of the System Administrator role, the Server Administrator role, or both.
NOTE: The credentials in the CogSystemMetricAnalyzer.properties file are not encrypted. If credentials are stored in this file, the file should be secured using file system security. The values for the credentials and the path to the CogSystemMetricAnalyzer.properties file can also be specified using command line switches, which will override the values specified in the properties file.
- Save the file.
- Open a command prompt and change to the directory containing the CogSystemMetricAnalyzer tool.
- Execute the run.bat or run.sh file that maps to the version of IBM Cognos Business Intelligence that the CogSystemMetricAnalyzer tool will be accessing using the following command line switches,
- -l : Load and consolidate metric dump files, apply processing template, create the file specified by the property metricDumptoApply.xml
- -a : Apply thresholds using information contained in the file specified by the property metricDumptoApply.xml
- -d : Delete thresholds using information contained in the file specified by the property metricDumptoApply.xml
- -props<propfile> : Full path to the properties file for the application to use
- -e<url> : The URL to a Cognos dispatcher that the application will use to logon and write the threshold values to the IBM Cognos Business Intelligence Content Store. This switch will override the property cog.connect.endPoint.
- -n<nsID> : The namespace ID of the security namespace that contains the username the application will use to logon and then write the threshold values to the IBM Cognos Business Intelligence Content Store. This switch will override the property cog.connect.nameSpace.
- -u<name> : The username credential that the application will use to logon and then write the threshold values to the IBM Cognos Business Intelligence Content Store. This switch will override the property cog.connect.userName.
- -p<password> : The password for the username that the application will use to logon and then write the threshold values to the IBM Cognos Business Intelligence Content Store. This switch will override the property cog.connect.userPswd.
- Restart the IBM Cognos Business Intelligence application to display the new thresholds in the IBM Cognos Administration console.
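Because the NOTE in the steps above points out that credentials in the CogSystemMetricAnalyzer.properties file are stored unencrypted, it can be worth verifying the file's permissions before running the tool. The following Python sketch assumes a POSIX file system (Windows access control lists need a different approach), and both function names are hypothetical.

```python
import os
import stat


def restrict_to_owner(path):
    """Make the file readable and writable by its owner only (POSIX)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


def is_owner_only(path):
    """True when neither group nor others have any permission bits."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0
```

A wrapper script could call is_owner_only on the properties file and refuse to continue when it returns False.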
Executing the run.bat file with the -l and -a switches consolidates all of the metricdump.xml files and then, using the default threshold ranges specified in the metricDumpTemplate.xml file, creates a file named metricValuesToApply.xml that contains all of the values used to populate the threshold ranges.
The metricDumpTemplate.xml file is the control file that determines what threshold ranges will be applied. The default ranges for all metric thresholds are 5% and 10%. The defaults can be overridden for each service by modifying the attributes that pertain to the service’s MBean. Shown below is the MBean for the Content Manager service.
<mbean name="com.cognos:type=Metrics,service=contentManagerService">
  <!-- If true, include this mbean in the processing -->
  <inclusion>true</inclusion>
  <!-- For <format> "percent", the absolute numbers to add to the calculated base value to derive the threshold values -->
  <toAveragePercent>5</toAveragePercent>
  <toPoorPercent>10</toPoorPercent>
  <!-- For <format> "number", the multipliers to apply to the calculated base value to derive the threshold values -->
  <toAverageNumber>1.05</toAverageNumber>
  <toPoorNumber>1.10</toPoorNumber>
  <attributes> ... </attributes>
These attributes will be applied to all metrics that belong to the Content Manager service. If a particular service is to be excluded from the setting of default thresholds, its <inclusion> parameter can be changed to false.
Within the <attributes> element of each service’s <mbean> member, a series of entries exists which define each metric within the MBean. Certain metrics have been designated to be included by default, but the list of individual metrics can be customized by changing the value of the <inclusion> parameter. Shown below is the entry for the Successful Requests per Minute metric, which has been configured to be excluded from having a default threshold applied.
<attribute name="SuccessfulRequestsPerMinute">
  <format></format>
  <inclusion>false</inclusion>
  <pattern></pattern>
  <toAverage/>
  <toPoor/>
</attribute>
If a threshold for the Successful Requests per Minute metric was desired, the attribute parameters could be changed. The changes below indicate that the format for the metric is a number, the metric is set to be included, and the performance pattern for the threshold is that the high values are good.
<attribute name="SuccessfulRequestsPerMinute">
  <format>number</format>
  <inclusion>true</inclusion>
  <pattern>highIsGood</pattern>
  <toAverage/>
  <toPoor/>
</attribute>
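When many metrics need to be switched on, the same change can be made programmatically rather than by hand. The Python sketch below works on a stand-alone fragment that mirrors the entry shown above; the function include_metric is hypothetical, and the layout of the complete metricDumpTemplate.xml file beyond this fragment is an assumption.

```python
import xml.etree.ElementTree as ET

TEMPLATE_FRAGMENT = """
<attributes>
  <attribute name="SuccessfulRequestsPerMinute">
    <format></format>
    <inclusion>false</inclusion>
    <pattern></pattern>
    <toAverage/>
    <toPoor/>
  </attribute>
</attributes>
"""


def include_metric(attributes, name, fmt, pattern):
    """Switch on default-threshold processing for one metric entry.
    Returns False when no entry with that name exists."""
    for attribute in attributes.findall("attribute"):
        if attribute.get("name") == name:
            attribute.find("format").text = fmt
            attribute.find("inclusion").text = "true"
            attribute.find("pattern").text = pattern
            return True
    return False


root = ET.fromstring(TEMPLATE_FRAGMENT)
include_metric(root, "SuccessfulRequestsPerMinute", "number", "highIsGood")
print(ET.tostring(root, encoding="unicode"))
```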
Generating the metricValuesToApply.xml file with the run.bat -l command would show that the range applied for the Successful Requests per Minute metric uses the default multipliers of 1.05 and 1.10: 5% less than the base value for the yellow indication and 10% less for the red. The values are reduced because the performance pattern was set to high values are good. If the pattern had been set to low values are good, then the average and poor indications would have been increases of 5% and 10% respectively.
<attribute name="SuccessfulRequestsPerMinute">
  <format>number</format>
  <pattern>highIsGood</pattern>
  <rollup>average</rollup>
  <base>542</base>
  <average>515</average>
  <poor>488</poor>
</attribute>
The range values can be overridden by adjusting the <toAverage> and <toPoor> parameters. If, for instance, the desired average and poor indication values were 10% and 20%, the metricDumpTemplate.xml file would be modified to:
<attribute name="SuccessfulRequestsPerMinute">
  <format>number</format>
  <inclusion>true</inclusion>
  <pattern>highIsGood</pattern>
  <toAverage>1.10</toAverage>
  <toPoor>1.20</toPoor>
</attribute>
Examining the metricValuesToApply.xml file after it was recreated would reveal that the threshold ranges for the Successful Requests per Minute metric have changed.
<attribute name="SuccessfulRequestsPerMinute">
  <format>number</format>
  <pattern>highIsGood</pattern>
  <rollup>average</rollup>
  <base>542</base>
  <average>488</average>
  <poor>434</poor>
</attribute>
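The figures in the generated fragments above are consistent with a simple calculation on the base value. The Python sketch below reproduces them for the number format; it is inferred from the sample values only, so the exact formula and rounding used by CogSystemMetricAnalyzer are assumptions, and the function name is hypothetical.

```python
def thresholds(base, to_average, to_poor, pattern):
    """Derive (average, poor) threshold values for <format> number.

    For highIsGood the multipliers are mirrored below the base
    (1.05 becomes 5% less); for lowIsGood they are applied directly.
    """
    if pattern == "highIsGood":
        average = base * (2 - to_average)
        poor = base * (2 - to_poor)
    elif pattern == "lowIsGood":
        average = base * to_average
        poor = base * to_poor
    else:
        raise ValueError(f"unsupported pattern: {pattern}")
    return round(average), round(poor)


print(thresholds(542, 1.05, 1.10, "highIsGood"))  # (515, 488)
print(thresholds(542, 1.10, 1.20, "highIsGood"))  # (488, 434)
```

The first call matches the default 5% and 10% ranges and the second matches the overridden 10% and 20% ranges for a base value of 542.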
Every service and every individual metric can be included or excluded, and the performance pattern, format, and average and poor indication values can be modified at the same level of granularity.
The default values and included metrics are meant to provide a good baseline as to which metrics should be monitored. Based on varying requirements and environmental inputs, the list should be modified to reflect the particular needs of the IBM Cognos Business Intelligence application.
It is important to keep the metricValuesToApply.xml file, because it is required to delete the thresholds programmatically at a later time. When the run.bat file is executed with the -d switch to delete the thresholds, the threshold ranges that are found in the metricValuesToApply.xml file are compared to the current thresholds. If they match, the threshold is deleted. If the original threshold was modified so that the values are different, the threshold is not deleted. Likewise, a threshold that was added manually is not deleted.
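The matching rule used for deletion can be illustrated with a short Python sketch. It models thresholds as plain dictionaries, which is an assumption made for illustration and not how the tool stores them; the function name, the QueueLength and FailedRequests metric names, and all values are hypothetical.

```python
def thresholds_to_delete(recorded, current):
    """Return the metric names whose current thresholds still match
    the values recorded when they were applied."""
    return sorted(
        name
        for name, values in recorded.items()
        if current.get(name) == values
    )


# Hypothetical (average, poor) values per metric name.
recorded = {"SuccessfulRequestsPerMinute": (515, 488), "QueueLength": (5, 10)}
current = {
    "SuccessfulRequestsPerMinute": (515, 488),  # unchanged: deleted
    "QueueLength": (6, 12),                     # edited later: kept
    "FailedRequests": (1, 3),                   # added manually: kept
}
print(thresholds_to_delete(recorded, current))  # ['SuccessfulRequestsPerMinute']
```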
The previous version of the System Management Methodology included a technique that leveraged Microsoft SQL Server Integration Services (SSIS), to load system metric data from a flat file to a relational database. While the technique still applies, the process can be time consuming due to the management of multiple flat files.
A new technique has been documented that leverages another IBM product offering, Tivoli Directory Integrator, to automate the process by reading the metrics directly from the Java environment and writing them directly to the relational data source, eliminating the need to create and manage the flat files. This technique can be found within the document System Management Methodology For IBM Cognos 10 – Loading Metrics With IBM Tivoli Directory Integrator.