An overview of monitoring and alerting features in IBM WebSphere Cast Iron Cloud Integration

This article describes the monitoring and alerting capabilities of IBM® WebSphere® Cast Iron® Cloud Integration through the Studio and the Web Management Console.


Alan Moore (alan.moore@us.ibm.com), Senior Technical Staff Member, IBM

Alan Moore is a Senior Technical Staff Member at the North San Jose Lab in California. He is a nine-year veteran of Cast Iron, and has contributed to the business in areas ranging from sales support to solutions engineering. He led the implementations for early adopters of Cast Iron technology, established best practice guidelines for all Cast Iron users, and led the implementation of Cast Iron's user community Web site. Within the Cast Iron engineering team, Alan developed the first connectors for salesforce.com®, NetSuite®, and RightNow®, and is Cast Iron's leading authority on SaaS endpoints. Leading the solutions engineering team at Cast Iron, Alan drives the architecture and development of complex integration solutions for strategic customers and partners. He has developed over 100 pre-packaged integration templates called TIPs that users can download from the Cast Iron cloud environment.



04 April 2012


Introduction

This article discusses the monitoring and alerting features provided by IBM WebSphere Cast Iron Cloud Integration (hereafter called Cast Iron) and recommends best practices for taking advantage of these features.

Overview of monitoring features

The Web Management Console (WMC) provides an overview of the published and deployed projects, along with a drill-down into the orchestrations within those projects. It also provides information about the health of a given appliance, displaying memory use and disk and CPU utilization. The Cast Iron Live environment does not show all the metrics shown on physical and virtual environments because it is a multi-tenant environment.

Overview of alerting features

The Cast Iron operating system provides a configurable alerting subsystem that an administrator sets up for a specific customer environment. The alerting module sends alerts from different Cast Iron subsystems to a configurable group of people within an organization by email or SNMP, or both.

Additionally, orchestrations can generate alerts through the logging activities, or through the generation of emails or Web service calls to alerting services within a customer environment. These alerting services may be other Cast Iron orchestrations or third-party APIs available within the network.


Monitoring features

Cast Iron is a “lights-out” integration solution that requires minimal monitoring by a customer. A customer may periodically log into the appliance (physical, virtual, or cloud) to ensure that resource usage is nominal and that jobs are executing as expected, but this is not mandatory. Instead, customers should use alerts to guide them to problems that need attention.

Logging levels

During development, it is typical to use a high level of logging to examine the flow and transformation of the data in an orchestration. When a project moves to production status, it is important to lower the logging levels to maintain the high performance of the appliance. Logging increases I/O within the appliance, which degrades the performance of each job.

Lowering the logging levels means you cannot see into the individual steps of an orchestration. This means that sufficient quality assurance (QA) must take place prior to moving an orchestration to the production environment, and that an orchestration contains error handling logic in accordance with Cast Iron best practices.

Table 1 describes the logging levels for an orchestration.

Table 1. Orchestration logging levels
Logging level | Description
None | No job activity is logged to WMC for orchestrations configured as "None". It will appear that no jobs are executing for these orchestrations.
Initial Values | Only the initial values of orchestration variables are logged. This is useful only for debugging if you want to study the triggering values for a job, such as the HTTP headers or Web Service body, and you expect the orchestration to fail due to insufficient error handling. Depending on the size of the initial value data, and any other initialized variables, this can still have an impact on the orchestration performance.
Initial and Error Values | Both the initial values of orchestration variables and orchestration errors are logged. This is useful only for debugging if you want to study the triggering values for a job, such as the HTTP headers or Web Service body, and you expect the orchestration to fail due to insufficient error handling. Depending on the size of the initial value data, and any other initialized variables, this can still have an impact on the orchestration performance.
Error Values | Only orchestration jobs that fail will log their variables. This is the recommended logging level for production environments. However, it is important to note that orchestrations using try-catch blocks will never reach an error state.
Inline | Orchestration job details are logged for sub-orchestrations in line with the job details of the calling orchestration. This is useful for debugging parent-child orchestration designs if there are problems with the data being passed between orchestrations.
All | This is the highest level of logging and logs all variables and steps of an orchestration job. Use this logging level only for development and testing, or for urgent debugging of a production orchestration. This logging level can affect performance of orchestration jobs by up to 300% depending on the number of activities, the number of variables, and the volume of data per job.

Job statuses

There are six job statuses. Table 2 describes each status.

Table 2. Job status
Status | Description
Running | The job is currently executing.
Completed | The job ended without an error. This is true whether or not catch blocks executed while the job ran.
Errored | The job encountered a fatal error that was not handled by a catch block. If the logging level is set to anything other than "None", the job detail shows the job state at the time of the failure.
Terminated | The job ended because of a Terminate activity within the orchestration.
Suspended | An administrator has stopped the project while jobs are executing. The jobs resume if the administrator restarts the project, or enter a Canceled state if the project is undeployed.
Canceled | An administrator explicitly canceled the running job.

Job keys

A job key is a means of exposing details about an orchestration job through WMC. Typically, the job key provides information that uniquely identifies data passing from endpoint to endpoint, such as a batch number or primary key. However, since this is configurable within the orchestration, the key can simply be a status message such as “Processed 100 rows”.

Figure 1 shows the job results panel visible in WMC using the Home > Dashboard menu item. It shows completed jobs and the job key value that the orchestration assigned.

Figure 1. Job keys displayed in WMC

The developer sets up job keys per orchestration within Studio, and uses the Create Job Keys activity to assign values. One of the keys can be marked as “Primary”. The primary key becomes searchable within WMC. The developer controls the naming of the job keys, and they become the input parameters to the Create Job Keys activity. Figure 2 shows the job key setup screen, which you can display by clicking the green arrow icon at the beginning of an orchestration in Studio.

Figure 2. Setting up job keys

By default, WMC logs only the job ID, the unique ID assigned to each job. When an orchestration logs a primary job key, WMC displays that value in place of the job ID in the job results list. The job key does not replace the job ID itself, which remains visible in the job detail view within WMC.

Figure 3 shows the map inputs to the Create Job Keys activity, where the values are assigned to the three defined keys.

Figure 3. Mapping values to the job keys

Each orchestration supports multiple keys, but only one is the searchable primary key. Values logged to non-primary keys are displayed in the job details screen in WMC as shown in Figure 4. You can display the screen by clicking Home > Dashboard, clicking on the job, and then clicking the Job Keys link.

Figure 4. Job keys displayed in the job details panel in WMC

Note: Orchestrations whose logging level is set to “None” display no job information and no job keys within WMC.

Job keys and error handling

One conundrum with Cast Iron involves error handling and job statuses, which job keys can address. Cast Iron best practices dictate that orchestrations use try-catch blocks to control the orchestration flow during error conditions. Since this traps any errors that occur, jobs will end with a “Completed” job status. This makes it difficult to identify jobs in WMC that encountered issues.

By using a status value within the primary job key, in conjunction with alerting, it is possible to use the search functionality to find those jobs within WMC. For example, you can use the concatenate function in a map activity to create a string containing a status and the number of records processed, such as “Success – 100 of 100 records processed” or “Error – 50 of 100 records processed”. Because the status value leads the key, you can search WMC for “Error” to list all the jobs that did not process a complete batch successfully.
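As a rough illustration of the key layout described above, the following Python sketch builds a status-first job key. It is not Cast Iron code: in Studio the same string would be assembled with the concatenate function in a map activity. The function name `format_job_key` and its parameters are hypothetical.

```python
def format_job_key(processed: int, total: int) -> str:
    """Build a job key that starts with the status, so a search on
    "Success" or "Error" can match it from the left."""
    # Assumption for this sketch: a batch counts as an error whenever
    # fewer records were processed than were received.
    status = "Success" if processed == total else "Error"
    return f"{status} – {processed} of {total} records processed"


if __name__ == "__main__":
    print(format_job_key(100, 100))
    print(format_job_key(50, 100))
```

Placing the status first is the design choice that matters here, because it keeps the most frequently searched value at the start of the key.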

Job keys for monitoring

Using the same principle as using job keys in conjunction with an error handling strategy, job keys should be the primary component of job monitoring. For example, assume a scenario where you integrate an application such as Salesforce.com with your ERP system for synchronizing sales orders. If you use job keys to log the record IDs, you can use the search capability to answer questions such as “Did sales order 1234 transfer to the ERP system yet?” You can find the record by searching the key “1234”.

Searching job keys

Job key searches match from the start of the key (left to right); wildcards are not supported. For example, you can find job key “1234” by searching “1”, “12”, “123”, or “1234”, but not “234” or “2345”. This is an important factor when defining the layout of job keys for a given scenario.

Figure 5 shows how much harder it is to search when the key value is at the end of the job key. The user must type the whole string to find the matching jobs. The search box is visible in WMC in the results panel when you click the Home > Dashboard menu item.

Figure 5. Searching job keys

For example, suppose an orchestration uses the job key “Processed 100 of 100 records – Success”. Searching for the word “Success” returns no result. A search for “Success” only finds the job if the job key format is “Success – Processed 100 of 100 records”.

Figure 6 shows how much easier it is to find jobs when the key value is well-defined.

Figure 6. Searching a well-defined job key

You cannot use wildcards such as “%”, “?”, or “*” in the searches. Each search automatically implies a single wildcard after the characters entered into the search box, as if you had typed: search value*.
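The search behavior described in this section amounts to a prefix match. The Python sketch below models it under stated assumptions: the function name `search_job_keys` is hypothetical, matching is treated as case-sensitive, and wildcard characters are compared literally. The article does not specify how WMC handles case or literal wildcard characters.

```python
def search_job_keys(job_keys, search_value):
    """Return the job keys that begin with search_value, which models
    the implied trailing wildcard (search value*)."""
    # Assumption: comparison is case-sensitive and no character in
    # search_value is interpreted as a wildcard.
    return [key for key in job_keys if key.startswith(search_value)]


if __name__ == "__main__":
    keys = [
        "1234",
        "Success – Processed 100 of 100 records",
        "Processed 100 of 100 records – Success",
    ]
    print(search_job_keys(keys, "12"))       # matches the key "1234"
    print(search_job_keys(keys, "234"))      # no match: not a prefix
    print(search_job_keys(keys, "Success"))  # only the status-first key
```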


Alerting features

There are two levels of alerting: system level and orchestration level. The appliance generates system level alerts based on the criteria set up in WMC. Orchestration level alerts are derived from activities in an orchestration. It is also possible to trigger system level alerts using orchestration level activities.

System level messages

The appliance maintains a system log that displays errors or warnings from the various subsystems of the operating system. Table 3 describes the list of subsystems.

Table 3. Subsystems that generate notifications
Subsystem | Description | Notes
Hardware | Contains errors related to the physical appliance such as fans, CPUs, disk, and so on. | Not applicable to Cast Iron Live
Resources | Reports issues with available disk space and memory. | Not applicable to Cast Iron Live
Network | Shows errors related to accessing endpoints or other network related issues. | Not applicable to Cast Iron Live
Security | Reports invalid logins for intrusion detection. |
Orchestration | Displays errors from orchestration jobs, including messages from log message activities, errors where activities have failed due to mapping issues, SOAP faults, database errors, and so on. |
Deployment | Shows errors caused when an orchestration cannot deploy properly because of invalid passwords in endpoints, and duplicate HTTP or Web Service URLs. |

System level notifications

The alerting subsystem links into the system log to allow an administrator to define policies (rules) to trigger notifications on system level events. On physical and virtual appliances, the administrator can choose to deliver notifications using email or SNMP or both. Cast Iron Live only allows notification delivery through email.

Figure 7 shows the notification panel in WMC accessible from Logs > Notifications, and shows a list of existing policies.

Figure 7. Notification screen in WMC showing existing policies

Email notifications

The administrator must provide the credentials for an SMTP relay server that delivers the notifications.

SNMP notifications

The administrator must provide the SNMP host and the trap community so that the appliance (virtual or physical) can deliver the traps to the monitoring tool.

Policy setup

The policies are simple rules based upon a severity and the subsystem. The severity levels are shown in Table 4.

Table 4. Types of severity for notifications
Severity | Description
INFO | Informational messages
WARNING | Warning messages
ERROR | Serious errors that may need user attention
CRITICAL | Critical errors requiring user or administrator attention

Examples of the notification rules are:

  • If the severity is greater than “WARNING” on the Orchestration subsystem, send the notification by email to businessusers@mycompany.com.
  • If the severity is greater than “ERROR” on the Network subsystem, send the notification by email to sysadmins@mycompany.com, and by SNMP to raise a trap.
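To make the rule semantics concrete, here is a minimal Python sketch of how the two example policies could be evaluated. It is an illustration only, not how the appliance is implemented. It assumes the severities are ordered as listed in Table 4 and that "greater than" is a strict comparison, which the article does not confirm. The names `Severity`, `Policy`, and `deliveries_for` are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Assumption: ordered from least to most severe, as listed in Table 4.
    INFO = 1
    WARNING = 2
    ERROR = 3
    CRITICAL = 4


@dataclass(frozen=True)
class Policy:
    subsystem: str
    threshold: Severity
    email_to: str
    raise_snmp_trap: bool = False


# The two example rules from the list above.
POLICIES = [
    Policy("Orchestration", Severity.WARNING, "businessusers@mycompany.com"),
    Policy("Network", Severity.ERROR, "sysadmins@mycompany.com", raise_snmp_trap=True),
]


def deliveries_for(subsystem, severity, policies=POLICIES):
    """Return the (channel, target) pairs triggered by one system log event."""
    actions = []
    for policy in policies:
        # Assumption: strict comparison, following the wording "greater than".
        if policy.subsystem == subsystem and severity > policy.threshold:
            actions.append(("email", policy.email_to))
            if policy.raise_snmp_trap:
                actions.append(("snmp", "trap"))
    return actions


if __name__ == "__main__":
    print(deliveries_for("Orchestration", Severity.ERROR))
    print(deliveries_for("Network", Severity.CRITICAL))
```

If the appliance treats the threshold as inclusive, the comparison in this sketch would change from `>` to `>=`.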

Figure 8 shows the policy definition screen, where the user sets up a new notification policy. Access this screen by selecting Logs > Notifications > New Policy in WMC.

Figure 8. Setting up a policy

To reiterate, Cast Iron Live does not support SNMP notifications.

Email policy recommendations

Rather than send emails to a list of individuals, create a distribution list on the email server. Managing lists of individuals through WMC is inefficient and prone to problems if individuals leave a group or organization. It is better to send the emails to a group defined on the email server.

In the examples above, notifications from the orchestrations go to one group of users, and network errors go to another. This demonstrates a clean delineation in distribution of notifications.

Orchestration level notifications

By default, an orchestration job that fails will generate an ERROR level message in the system log. You can use notification policies to trap these errors to generate emails or SNMP traps. However, you do not have control over the content of the generated message.

Cast Iron best practices dictate that orchestrations use try-catch blocks and if-then conditions to trap errors. It is then the choice of the developer on how to inform the end users of the problem.

Email activity

The most common notification mechanism is the email activity, because it has a simple interface. Despite that simplicity, it supports a variety of delivery formats. Cast Iron supports complex MIME messages, so it is possible to create attachments and HTML formatted emails with links to the job in WMC.
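For readers unfamiliar with MIME structure, the following Python sketch shows the kind of message described above: a plain-text body, an HTML alternative containing a link, and an attachment. It uses Python's standard `email` library purely as an illustration and is not Cast Iron code. The function name, the sender address, the attachment file name, and the WMC job URL are hypothetical placeholders, since the article does not give the WMC link format.

```python
from email.message import EmailMessage


def build_alert_email(job_id, job_key, wmc_job_url, failed_rows_csv):
    """Build a multipart alert email with text and HTML bodies plus a CSV attachment."""
    msg = EmailMessage()
    msg["Subject"] = f"Integration alert: {job_key}"
    msg["From"] = "castiron@mycompany.com"      # hypothetical sender address
    msg["To"] = "businessusers@mycompany.com"   # a distribution list, not individuals

    # Plain-text body for mail clients that do not render HTML.
    msg.set_content(f"Job {job_id} reported: {job_key}\nDetails: {wmc_job_url}")

    # HTML alternative with a link back to the job (URL format is assumed).
    msg.add_alternative(
        f"<p>Job {job_id} reported: <b>{job_key}</b></p>"
        f'<p><a href="{wmc_job_url}">Open the job in WMC</a></p>',
        subtype="html",
    )

    # Attach the rows that failed so recipients can act on them directly.
    msg.add_attachment(failed_rows_csv, subtype="csv", filename="failed_rows.csv")
    return msg


if __name__ == "__main__":
    message = build_alert_email(
        "ABC123",
        "Error – 50 of 100 records processed",
        "https://wmc.example.invalid/jobs/ABC123",  # placeholder URL
        "id,reason\n7,missing field\n",
    )
    print(message.get_content_type())
```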

Web services activity

Some organizations have complex notification systems in house, perhaps through other middleware or other monitoring tools. If these systems have a public API, such as a Web services interface, Cast Iron can consume the WSDL and use the existing logging infrastructure through the Web services activity.

Log message activity

The log message activity allows an orchestration to write to the system log’s orchestration subsystem at a chosen severity level. In conjunction with a set of notification policies, the log message activity is a useful mechanism for generating simple notifications. The advantage is that the developer can control the message contained in the notification, unlike the standard system level notifications.

Figure 9 shows mapping to the log message activity, which is setting the severity level and the output message.

Figure 9. Mapping to the Log Message activity

The logged message is displayed in WMC in the System Log panel of the Logs menu as an orchestration event. Figure 10 shows an example of the output of a log message activity in WMC, where the user defined the severity level as “Warning”.

Figure 10. Log message output in the WMC

You can use the Log Message activity to provide custom notifications or logging through WMC, and tie it to the notification subsystem to automatically send emails or raise SNMP traps.


Conclusion

This article showed how WebSphere Cast Iron provides a comprehensive solution for tracking, identifying, and searching jobs. It also showed how those features complement the monitoring and alerting capabilities of Cast Iron.
