An overview of monitoring and alerting features in IBM WebSphere Cast Iron Cloud Integration
This article discusses the monitoring and alerting features provided by IBM WebSphere Cast Iron Cloud Integration (hereafter called Cast Iron) and recommends best practices for taking advantage of these features.
Overview of monitoring features
The Web Management Console (WMC) provides an overview of the published and deployed projects, along with a drill-down into the orchestrations within those projects. It also provides information about the health of a given appliance, displaying memory use and disk and CPU utilization. The Cast Iron Live environment does not show all the metrics shown on physical and virtual environments because it is a multi-tenant environment.
Overview of alerting features
The Cast Iron operating system provides a configurable alerting subsystem that an administrator sets up for a specific customer environment. The alerting module sends alerts from different Cast Iron subsystems to a configurable group of people within an organization by email or SNMP, or both.
Additionally, orchestrations can generate alerts through the logging activities, or through the generation of emails or Web service calls to alerting services within a customer environment. These alerting services may be other Cast Iron orchestrations or third-party APIs available within the network.
Cast Iron is a “lights-out” integration solution that requires minimal monitoring by a customer. A customer may periodically log into the appliance (physical, virtual, or cloud) to ensure that resources are nominal or that jobs are executing as expected, but this is not mandatory. Instead, customers should use alerts to guide them to a problem that needs attention.
During development, it is typical to use a high level of logging to examine the flow and transformation of the data in an orchestration. When a project moves to production status, it is important to lower the logging levels to maintain the high performance of the appliance. Logging increases I/O within the appliance, which degrades the performance of each job.
Lowering the logging levels means you cannot see into the individual steps of an orchestration. Sufficient quality assurance (QA) must therefore take place before an orchestration moves to the production environment, and the orchestration must contain error handling logic in accordance with Cast Iron best practices.
Table 1 describes the logging levels for an orchestration.
Table 1. Orchestration logging levels
|None||No job activity is logged to WMC for orchestrations configured as "None". It will appear that no jobs are executing for these orchestrations.|
|Initial Values||Only the initial values of orchestration variables are logged. This is useful only for debugging if you want to study the triggering values for a job, such as the HTTP headers or Web Service body, and you expect the orchestration to fail due to insufficient error handling. Depending on the size of the initial value data, and any other initialized variables, this can still have an impact on the orchestration performance.|
|Initial and Error Values||Both the initial values of orchestration variables and orchestration errors are logged. This is useful only for debugging if you want to study the triggering values for a job, such as the HTTP headers or Web Service body, and you expect the orchestration to fail due to insufficient error handling. Depending on the size of the initial value data, and any other initialized variables, this can still have an impact on the orchestration performance.|
|Error Values||Only orchestration jobs that fail will log their variables. This is the recommended logging level for production environments. However, it is important to note that orchestrations using try-catch blocks will never reach an error state.|
|Inline||Orchestration job details are logged for sub-orchestrations in line with the job details of the calling orchestration. This is useful for debugging parent-child orchestration designs if there are problems with the data being passed between orchestrations.|
|All||This is the highest level of logging and logs all variables and steps of an orchestration job. Use this logging level only for development and testing, or for urgent debugging of a production orchestration. This logging level can affect performance of orchestration jobs by up to 300% depending on the number of activities, the number of variables, and the volume of data per job.|
There are six job statuses. Table 2 describes each status.
Table 2. Job status
|Running||The job is currently executing.|
|Completed||The job ended without an error (this is true whether or not catch blocks are executed while the job runs).|
|Errored||The job encountered a fatal error that is not handled by a catch block. If the logging level is set to anything other than "None", the job detail shows the job state at the time of the failure.|
|Terminated||The job ended because of a Terminate activity within the orchestration.|
|Suspended||An administrator has stopped the project while jobs are executing. The jobs resume if the administrator restarts the project, or enter a Canceled state if the project is undeployed.|
|Canceled||An administrator explicitly canceled the running job.|
A job key is a means of exposing details about an orchestration job through WMC. Typically, the job key provides information that uniquely identifies data passing from endpoint to endpoint, such as a batch number or primary key. However, since this is configurable within the orchestration, the key can simply be a status message such as “Processed 100 rows”.
Figure 1 shows the job results panel visible in WMC using the Home > Dashboard menu item. It shows completed jobs and the job key value that the orchestration assigned.
Figure 1. Job keys displayed in WMC
The developer sets up job keys per orchestration within Studio, and uses the Create Job Keys activity to assign values. One of the keys can be marked as “Primary”. The primary key becomes searchable within WMC. The developer controls the naming of the job keys, and they become the input parameters to the Create Job Keys activity. Figure 2 shows the job key setup screen, which you can display by clicking the green arrow icon at the beginning of an orchestration in Studio.
Figure 2. Setting up job keys
By default, WMC only logs the job ID, the unique ID assigned to each job. When an orchestration logs a primary job key, WMC displays that value in place of the job ID in the job results panel. The job key does not replace the job ID value itself: the job detail view within WMC still displays the job ID.
Figure 3 shows the map inputs to the Create Job Keys activity, where the values are assigned to the three defined keys.
Figure 3. Mapping values to the job keys
Each orchestration supports multiple keys, but only one is the searchable primary key. Values logged to non-primary keys are displayed in the job details screen in WMC as shown in Figure 4. You can display the screen by clicking Home > Dashboard, clicking on the job, and then clicking the Job Keys link.
Figure 4. Job keys displayed in the job details panel in WMC
Note: Orchestrations configured with the logging level set to “None” display no information and no job keys within WMC.
Job keys and error handling
One conundrum with Cast Iron involves error handling and job statuses, which job keys can address. Cast Iron best practices dictate that orchestrations use try-catch blocks to control the orchestration flow during error conditions. Since this traps any errors that occur, jobs will end with a “Completed” job status. This makes it difficult to identify jobs in WMC that encountered issues.
By using a status value within the primary job key, in conjunction with alerting, it is possible to use the search functionality to find those jobs within WMC. For example, you can use the concatenate function in a map activity to create a string containing a status and the number of records processed, such as “Success – 100 of 100 records processed” or “Error – 50 of 100 records processed”. With the status value in the key, you can search WMC for “Error” to show all the jobs that did not process a complete batch successfully.
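The layout of such a status-first key can be illustrated with a minimal Python sketch. This is only a model of the resulting string: in Cast Iron the value is built with the concatenate function in a map activity, and the helper name `build_job_key`, the rule for deriving the status from the record counts, and the plain hyphen separator are assumptions made for this example.

```python
def build_job_key(processed: int, total: int) -> str:
    """Return a status-first job key string (hypothetical helper)."""
    # The status comes first so that searching WMC for "Error"
    # finds the jobs that did not process a complete batch.
    status = "Success" if processed == total else "Error"
    return f"{status} - {processed} of {total} records processed"


print(build_job_key(100, 100))  # Success - 100 of 100 records processed
print(build_job_key(50, 100))   # Error - 50 of 100 records processed
```

Keeping the status as the leading token is the design choice that matters here; the rest of the string is free-form detail for the person reading the job list.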
Job keys for monitoring
By the same principle that applies to error handling, job keys should be the primary component of job monitoring. For example, assume a scenario where you integrate an application such as Salesforce.com with your ERP system for synchronizing sales orders. If you use job keys to log the record IDs, you can use the search capability to answer questions such as “Did sales order 1234 transfer to the ERP system yet?” You can find the record by searching the key “1234”.
Searching job keys
Job keys are searchable left-to-right, and not by a wildcard. For example, you can find job key “1234” by searching “1”, “12”, “123”, or “1234”, but not “234” or “2345”. This is an important factor when defining the layout of job keys for a given scenario.
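The left-to-right behavior described above amounts to a prefix match. The following Python sketch models it under that assumption; it is not how WMC is implemented, the function name `search_job_keys` is hypothetical, and whether the real search is case-sensitive is not stated in this article (the sketch is case-sensitive).

```python
def search_job_keys(job_keys, query):
    """Return the job keys that begin with the query text (prefix match)."""
    return [key for key in job_keys if key.startswith(query)]


keys = [
    "1234",
    "Success - 100 of 100 records processed",
    "Processed 100 of 100 records - Success",
]

print(search_job_keys(keys, "123"))      # ['1234']
print(search_job_keys(keys, "234"))      # []
print(search_job_keys(keys, "Success"))  # ['Success - 100 of 100 records processed']
```

The last call shows why the value you expect to search for belongs at the start of the key: the key that ends with “Success” is not found.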
Figure 5 shows how much harder it is to search when the key value is at the end of the job key. The user must type the whole string to find the matching jobs. The search box is visible in WMC in the results panel when you click the Home > Dashboard menu item.
Figure 5. Searching job keys
For example, an orchestration uses job key “Processed 100 of 100 records – Success”. You cannot search for the word “Success” and get a result. The only way to search for “Success” is if the job key format is “Success – Processed 100 of 100 records”.
Figure 6 shows how much easier it is to find jobs when the key value is well-defined.
Figure 6. Searching a well-defined job key
You cannot use wildcards such as “%”, “?”, or “*” in searches. A search automatically implies a wildcard after the characters entered into the search box. For example, searching for “12” matches the job key “1234”.
There are two levels of alerting: system level and orchestration level. The appliance generates system level alerts based on the criteria set up in WMC. Orchestration level alerts are derived from activities in an orchestration. It is also possible to trigger system level alerts using orchestration level activities.
System level messages
The appliance maintains a system log that displays errors or warnings from the various subsystems of the operating system. Table 3 describes the list of subsystems.
Table 3. Subsystems that generate notifications
|Hardware||Contains errors related to the physical appliance such as fans, CPUs, disk, and so on.||Not applicable to Cast Iron Live|
|Resources||Reports issues with available disk space and memory.||Not applicable to Cast Iron Live|
|Network||Shows errors related to accessing endpoints or other network related issues.||Not applicable to Cast Iron Live|
|Security||Reports invalid logins for intrusion detection.|
|Orchestration||Displays errors from orchestration jobs, including messages from log message activities, errors where activities have failed due to mapping issues, SOAP faults, database errors, and so on.|
|Deployment||Shows errors caused when an orchestration cannot deploy properly because of invalid passwords in endpoints, and duplicate HTTP or Web Service URLs.|
System level notifications
The alerting subsystem links into the system log to allow an administrator to define policies (rules) to trigger notifications on system level events. On physical and virtual appliances, the administrator can choose to deliver notifications using email or SNMP or both. Cast Iron Live only allows notification delivery through email.
Figure 7 shows the notification panel in WMC accessible from Logs > Notifications, and shows a list of existing policies.
Figure 7. Notification screen in WMC showing existing policies
For email delivery, the administrator must provide the credentials for an SMTP relay server that delivers the notifications.
For SNMP delivery, the administrator must provide the SNMP host and the trap community so that the appliance (virtual or physical) can deliver the traps to the monitoring tool.
The policies are simple rules based upon a severity and the subsystem. The severity levels are shown in Table 4.
Table 4. Types of severity for notifications
|ERROR||Serious errors that may need user attention|
|CRITICAL||Critical errors requiring user or administrator attention|
Examples of the notification rules are:
- If the error severity is greater than “WARNING” on the Orchestration subsystem, send the notification by email to one group of users.
- If the error severity is greater than “ERROR” on the Network subsystem, send the notification by email to email@example.com, and by SNMP to raise a trap.
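The way such rules select notifications can be sketched in Python. This is a simplified model, not the appliance's implementation: the `Policy` class and `matching_actions` function are hypothetical, the severity list contains only the levels named in this article, “greater than” is assumed to be a strict comparison, and the address `orchestration-team@example.com` is invented because the source does not give one for the first rule.

```python
from dataclasses import dataclass

# Severity order, lowest to highest. Only the levels named in this article
# are listed; the product may define others.
SEVERITIES = ["WARNING", "ERROR", "CRITICAL"]


@dataclass(frozen=True)
class Policy:
    subsystem: str
    above: str               # notify when the event severity is greater than this
    email_to: str
    snmp_trap: bool = False


def matching_actions(policies, subsystem, severity):
    """Return (channel, target) pairs triggered by one system log event."""
    actions = []
    for policy in policies:
        if policy.subsystem != subsystem:
            continue
        # Assumption: "greater than" is strict, so ERROR does not exceed ERROR.
        if SEVERITIES.index(severity) > SEVERITIES.index(policy.above):
            actions.append(("email", policy.email_to))
            if policy.snmp_trap:
                actions.append(("snmp", "trap"))
    return actions


policies = [
    # Hypothetical address for the orchestration group.
    Policy("Orchestration", "WARNING", "orchestration-team@example.com"),
    Policy("Network", "ERROR", "email@example.com", snmp_trap=True),
]

print(matching_actions(policies, "Orchestration", "ERROR"))
print(matching_actions(policies, "Network", "ERROR"))
print(matching_actions(policies, "Network", "CRITICAL"))
```

Under these assumptions, an ERROR event on the Orchestration subsystem produces one email, an ERROR event on the Network subsystem produces nothing, and a CRITICAL event on the Network subsystem produces both an email and a trap.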
Figure 8 shows the policy definition screen, where the user sets up a new notification policy. Access this screen by selecting Logs > Notifications > New Policy in WMC.
Figure 8. Setting up a policy
To reiterate, Cast Iron Live does not support SNMP notifications.
Email policy recommendations
Rather than send emails to a list of individuals, create a distribution list on the email server. Managing lists of individuals through WMC is inefficient and prone to problems if individuals leave a group or organization. It is better to send the emails to a group defined on the email server.
In the examples above, notifications from the orchestrations go to one group of users, and network errors go to another. This demonstrates a clean delineation in distribution of notifications.
Orchestration level notifications
By default, an orchestration job that fails will generate an ERROR level message in the system log. You can use notification policies to trap these errors to generate emails or SNMP traps. However, you do not have control over the content of the generated message.
Cast Iron best practices dictate that orchestrations use try-catch blocks and if-then conditions to trap errors. It is then the choice of the developer on how to inform the end users of the problem.
The most common notification mechanism is the email activity because it is a simple interface. It also supports a variety of options in the delivery format. Cast Iron supports complex MIME messages, so it is possible to create attachments and HTML formatted emails with links to the job in WMC.
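To make the structure of such a message concrete, the following Python sketch builds a comparable MIME email with the standard library. It does not use any Cast Iron API; it only shows the shape of a message that carries a plain-text body, an HTML body with a link, and an attachment. The function name, the sender address, the job ID, the WMC URL, and the attachment contents are all hypothetical.

```python
from email.message import EmailMessage


def build_alert_email(job_id, wmc_url, failed_rows_csv):
    """Build a MIME alert with plain text, HTML, and a CSV attachment."""
    msg = EmailMessage()
    msg["Subject"] = f"Orchestration job {job_id} reported errors"
    msg["From"] = "castiron@example.com"   # hypothetical sender
    msg["To"] = "email@example.com"
    # Plain-text body for mail clients that do not render HTML.
    msg.set_content(f"Job {job_id} reported errors. Details: {wmc_url}")
    # HTML alternative containing a link to the job.
    msg.add_alternative(
        f'<p>Job {job_id} reported errors. '
        f'<a href="{wmc_url}">Open the job</a>.</p>',
        subtype="html",
    )
    # Adding an attachment converts the message to multipart/mixed.
    msg.add_attachment(failed_rows_csv, subtype="csv", filename="failed_rows.csv")
    return msg


# Hypothetical job ID, URL, and record data.
msg = build_alert_email(
    "42", "https://wmc.example.com/jobs/42", "id,error\n1234,timeout\n"
)
print(msg.get_content_type())  # multipart/mixed
```

Including a plain-text part alongside the HTML part keeps the alert readable in monitoring mailboxes that strip or block HTML.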
Web services activity
Some organizations have complex notification systems in house, perhaps through other middleware or other monitoring tools. If these systems have public APIs, such as a Web services interface, Cast Iron can consume the WSDL and use the existing logging infrastructure through the Web services activity.
Log message activity
The log message activity allows an orchestration to write to the system log’s orchestration subsystem at a chosen severity level. In conjunction with a set of notification policies, the log message activity is a useful mechanism for generating simple notifications. The advantage is that the developer can control the message contained in the notification, unlike the standard system level notifications.
Figure 9 shows mapping to the log message activity, which is setting the severity level and the output message.
Figure 9. Mapping to the Log Message activity
The logged message displays in WMC in the System Log panel of the logs menu as an orchestration event. Figure 10 shows an example of the output of a log message activity in WMC, where the user defined the severity level as “Warning”.
Figure 10. Log message output in the WMC
You can use the Log Message activity to provide custom notifications or logging through WMC, and tie it to the notification subsystem to automatically send emails or raise SNMP traps.
This article showed how WebSphere Cast Iron provides a comprehensive solution for tracking, identifying, and searching jobs. It also showed how those features complement the monitoring and alerting capabilities of Cast Iron.