Monitoring WebSphere DataPower SOA Appliances

This article describes the fundamentals of monitoring the health and capacity of WebSphere DataPower SOA Appliances, including why to monitor and best practices for monitoring.


Steve Linn (swlinn@us.ibm.com), Consulting I/T Specialist, IBM Software Services for WebSphere, IBM

Steve Linn is a Consulting I/T Specialist with IBM Software Services for WebSphere. He has had an extensive career in software development and middleware. He currently specializes in WebSphere DataPower SOA Appliance administration and configuration. You can contact Steve at swlinn@us.ibm.com.



John Rasmussen (rasmussj@us.ibm.com), Senior I/T Specialist, IBM Software Services for WebSphere, IBM

John Rasmussen is a Senior IT Specialist, WebSphere DataPower and SOA, with IBM in Cambridge, MA. You can contact John at rasmussj@us.ibm.com.



24 March 2010


Introduction

The IBM® WebSphere® DataPower® SOA Appliance (hereafter called DataPower) is a purpose-built hardware platform designed to simplify, secure, and accelerate XML, Web services, and Enterprise Service Bus deployments.

As with other network appliances, monitoring the health and capacity of DataPower appliances helps ensure that they are ready to perform the functions for which they are configured. Monitoring does more than notify administrators of exceptions. It also provides trend data for managing the appliances and their capacity utilization over time. This enables the organization to maximize its return on investment and to receive early warning of growing network volumes and potential capacity issues.

This article describes various DataPower status inquiry methods and presents strategies and best practices for interpreting them. This article is based on DataPower Firmware Revision 3.8.0. Monitoring status providers may change with enhancements to the firmware, so you should check current firmware documentation for any additions to monitoring components.

Why monitor?

The DataPower Appliance family consists of 1U rack-mountable network devices. The latest generation devices (9235/9004 class) include the following components within a tamper-proof case:

- four gigabit RJ-45 Ethernet interfaces
- a DB-9 serial port
- hot swappable power supplies and fan trays
- batteries
- eight gigabytes of RAM
- a compact flash-based file system
- other components

Optional features include internal hard drives, hardened cryptographic modules, and additional compact flash bays.

Each of these components contributes to the device's ability to handle the network traffic it receives. Knowing that they are functioning properly ensures that the device is available and ready to process that traffic. For example, an alert about variations in the performance of the device's fans may let you avoid taking the device offline for unanticipated service. Similarly, understanding the level of network traffic and tracking incremental changes can help you avoid bottlenecks as traffic increases over time.

Monitoring fundamentals on DataPower

DataPower provides a variety of information about general system health and about the consumption of resources and services. This information includes two kinds of values:

- physical readings, such as CPU temperatures and voltages
- resource measurements, such as memory, file system, and interface utilization

In addition, there are more formulaic indicators, such as System Usage, which is a calculated measure of system capacity.

DataPower exposes these status values in several ways:

- The Web GUI and the Command Line Interface (CLI) show commands let you browse a list of status values.
- The XML Management Interface (XMI) accepts SOAP messages containing dp:get-status requests. The device returns the status information in SOAP responses.
- DataPower also supports the Simple Network Management Protocol (SNMP) and acts as an SNMP agent. It provides status information in response to SNMP operations and generates alerts through the SNMP notification mechanism.

Figure 1 shows the CPU usage as displayed in the Web GUI, which you reach by selecting Status => System => CPU Usage. The table shows CPU usage over intervals ranging from the most recent 10 seconds to the most recent 24 hours.

Figure 1. Web GUI CPU usage display

The CLI show commands are used to display status information, and Listing 1 shows the show cpu command, which provides the same table of data shown in the Web GUI:

Listing 1. CLI Show CPU command
xi50# show cpu
                    10 sec    1 min   10 min   1 hour    1 day
cpu usage (%):        1        1        7        7        7

The Web GUI and CLI are convenient tools for fetching status information interactively, while the XMI can be integrated programmatically into more complex solutions. For example, a Java™ class could execute a dp:get-status request and then modify the configuration based on the response. The SOAP request in Listing 2 shows a dp:get-status request that fetches CPU usage status:

Listing 2. Sample dp:get-status XMI request
<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
    <env:Body>
        <dp:request xmlns:dp="http://www.datapower.com/schemas/management">
            <dp:get-status class="CPUUsage"/>
        </dp:request>
    </env:Body>
</env:Envelope>
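The same kind of integration can be sketched in a few lines of Python. The sketch below builds the Listing 2 envelope for any status class and posts it to the XMI. It rests on three assumptions that you should verify against your appliance and firmware documentation:

- the XMI listens on port 5550;
- it answers at the path /service/mgmt/current;
- it accepts HTTP basic authentication with a DataPower user.

The function names are hypothetical.

```python
import base64
import ssl
import urllib.request
import xml.etree.ElementTree as ET
from typing import Optional

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
DP_MGMT = "http://www.datapower.com/schemas/management"

# Serialize with the env: and dp: prefixes used in Listing 2.
ET.register_namespace("env", SOAP_ENV)
ET.register_namespace("dp", DP_MGMT)


def build_get_status(status_class: str) -> bytes:
    """Return a dp:get-status SOAP envelope for the given status class."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    request = ET.SubElement(body, f"{{{DP_MGMT}}}request")
    ET.SubElement(request, f"{{{DP_MGMT}}}get-status", {"class": status_class})
    return ET.tostring(envelope, encoding="utf-8")


def post_to_xmi(host: str, user: str, password: str, payload: bytes,
                port: int = 5550,
                context: Optional[ssl.SSLContext] = None) -> bytes:
    """POST a SOAP payload to the XMI and return the raw response body.

    ASSUMPTION: port 5550 and the /service/mgmt/current path are typical
    XMI settings; confirm both for your appliance.
    """
    url = f"https://{host}:{port}/service/mgmt/current"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={"Content-Type": "text/xml", "Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(req, timeout=30, context=context) as resp:
        return resp.read()
```

A call such as `post_to_xmi("dp.example.com", "admin", "secret", build_get_status("CPUUsage"))` would fetch the data in Listing 3. The host name and credentials are placeholders. The optional `context` lets you supply an `ssl.SSLContext` that trusts the appliance's certificate.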

The response is returned in a SOAP payload, as shown in Listing 3 below. Again, the CPU status is returned within a subtree containing the same table of data returned by the Web GUI and CLI:

Listing 3. XMI dp:get-status response
<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
    <env:Body>
        <dp:response xmlns:dp="http://www.datapower.com/schemas/management">
            <dp:timestamp>2009-09-24T11:56:22-04:00</dp:timestamp>
            <dp:status>
                <CPUUsage xmlns:env="http://www.w3.org/2003/05/soap-envelope">
                    <tenSeconds>1</tenSeconds>
                    <oneMinute>1</oneMinute>
                    <tenMinutes>1</tenMinutes>
                    <oneHour>1</oneHour>
                    <oneDay>1</oneDay>
                </CPUUsage>
            </dp:status>
        </dp:response>
    </env:Body>
</env:Envelope>
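On the receiving side, a response shaped like Listing 3 can be flattened into plain Python dictionaries, as the sketch below shows. It assumes, as Listing 3 shows, that the status class element (CPUUsage here) and its children carry no namespace. The function name is hypothetical. Leaf values that parse as integers are converted; everything else stays a string.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Union

DP_MGMT = "http://www.datapower.com/schemas/management"

Value = Union[int, str]


def parse_status(response_xml: bytes, status_class: str) -> List[Dict[str, Value]]:
    """Return one dict per <status_class> entry found under dp:status."""
    root = ET.fromstring(response_xml)
    status = root.find(f".//{{{DP_MGMT}}}status")
    if status is None:
        raise ValueError("response contains no dp:status element")
    entries = []
    # The status class element and its children are not namespace-qualified.
    for entry in status.findall(status_class):
        row: Dict[str, Value] = {}
        for child in entry:
            text = (child.text or "").strip()
            try:
                row[child.tag] = int(text)
            except ValueError:
                row[child.tag] = text
        entries.append(row)
    return entries
```

Returning a list accommodates status classes that report several entries, while single-entry classes such as CPUUsage yield a one-element list.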

You can get a vast amount of status data using the dp:get-status request. For more information, including details of the schemas and WSDL used to customize dp:get-status and other XMI operations, see the IBM Redpaper WebSphere DataPower SOA Appliances: The XML Management Interface.

Most organizations query the health and capacity of a network device using the SNMP protocol, together with tools such as those in the IBM Tivoli Monitoring (ITM) and Tivoli Composite Application Manager (ITCAM) product families. These tools use SNMP over UDP to poll an SNMP agent for device and application metrics. The management software may also receive notification alerts from the agent when particular events occur on the device. The DataPower appliance can be configured to act as an SNMP agent, responding to inbound polling requests and sending alerts in response to preconfigured events.

SNMP status variables are organized in hierarchies, which are described by Management Information Base (MIB) documents. Each metric that can be polled is addressed by an Object Identifier (OID). Some metrics are scalar objects describing a single data point, such as the current firmware version on the appliance. Other metrics are tabular, such as the CPU status provided in the previous examples. When a specific OID is known, the SNMP manager can issue a GET for that OID to retrieve the metric. If all metrics in a hierarchy are desired, a Get Subtree request (often called a walk) retrieves every value within that hierarchy. The DataPower appliance provides three Enterprise MIB documents, for configuration, status, and notification. The status MIB is the one of interest here.
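The difference between a GET and a subtree request can be illustrated with a toy, in-memory OID tree. This is not an SNMP implementation. The OIDs are hypothetical placeholders, not values from the DataPower MIBs. The sketch only shows that a GET addresses exactly one OID, while a subtree request returns every OID under a prefix, in OID order.

```python
from typing import Dict, List, Tuple

Oid = Tuple[int, ...]


def parse_oid(text: str) -> Oid:
    """'1.3.6.1' -> (1, 3, 6, 1); integer tuples sort in true OID order."""
    return tuple(int(arc) for arc in text.strip(".").split("."))


class ToyMib:
    """In-memory stand-in for an agent's status tree (hypothetical OIDs)."""

    def __init__(self, values: Dict[str, object]):
        self._values = {parse_oid(oid): value for oid, value in values.items()}

    def get(self, oid: str) -> object:
        """GET: one OID in, one value out (KeyError if the OID is unknown)."""
        return self._values[parse_oid(oid)]

    def get_subtree(self, prefix: str) -> List[Tuple[str, object]]:
        """Subtree request: every OID below the prefix, in OID order."""
        root = parse_oid(prefix)
        return [
            (".".join(str(arc) for arc in oid), value)
            for oid, value in sorted(self._values.items(), key=lambda item: item[0])
            if oid[: len(root)] == root
        ]
```

Because prefixes are compared arc by arc rather than as strings, a subtree request for `...99999.2` does not pick up `...99999.20`.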

Status inquiry is straightforward, but alerting involves several DataPower objects. The device has four built-in notification alerts: authenticationFailure, linkDown, coldStart, and linkUp. Others are preconfigured, as described below. A properly configured SNMP monitor receives these traps in the following cases:

- the device restarts;
- its interfaces become enabled or disabled;
- a failed attempt to access the device occurs.

In addition to the built-in alerts, custom alerts can be generated by subscribing to a list of error conditions or by using the logging system.

Reliance on alerts alone is not a sufficient monitoring strategy. For example, if the event causing the alert affects the device’s ability to send the message over the network, the notification may not be received at the SNMP monitor. Therefore it is prudent to combine subscription to alert messages with polling of status information, to provide a robust mechanism for communicating with monitoring tools.
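One way to combine the two is sketched below. It treats a run of failed polls as an alarm in its own right, so that an outage that also swallowed its trap is still reported. The class name and the three-poll threshold are hypothetical policy choices.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PollWatchdog:
    """Raise one alarm when a device stops answering status polls."""

    max_missed: int = 3              # hypothetical: consecutive failures tolerated
    consecutive_failures: int = 0
    last_success: Optional[float] = None

    def record_poll(self, succeeded: bool, now: float) -> Optional[str]:
        """Record one poll result; return an alarm message once per outage."""
        if succeeded:
            self.consecutive_failures = 0
            self.last_success = now
            return None
        self.consecutive_failures += 1
        if self.consecutive_failures == self.max_missed:
            return (f"no poll response for {self.max_missed} consecutive polls "
                    f"(last success at {self.last_success}); check for missed traps")
        return None
```

Firing only when the failure count first reaches the threshold keeps a long outage from producing one alarm per poll.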

How to monitor

The DataPower firmware includes many built-in status providers (or monitoring agents) that supply status data. Many of these providers are device-specific. Providers for environmental components, such as fans, temperatures, and battery health, are available within the default domain and are always enabled. Other status data, such as transaction rates for DataPower services, is segmented by application domain and may be further segmented by XML-Manager or DataPower service.

Device-level data is enabled automatically. Transaction data, such as transaction rates or transaction times, is usually available only when statistics are enabled on the device. There are exceptions to this rule. For example, CPU status requires statistics to be enabled, while System Load does not. Statistics must be enabled separately in each domain that is to provide domain-specific status.

This section shows you how to enable monitoring of DataPower from SNMP tools and how to produce SNMP alerts from within DataPower. You'll see how a Logging Target can be configured to produce alerts based on system events, and how to subscribe to events such as Out of Memory or Power Supply failure to generate alerts. A power supply failure is used as an example to demonstrate these principles.

SNMP settings must first be configured on the DataPower appliance. This configuration is made in the default domain. In the left navigation menu of the DataPower Web GUI, select the Administration menu, then select SNMP Settings under the Access heading.

This configuration consists of multiple tabs. The main tab must have the Admin State set to Enabled. Typically, the Local IP Address is set to a Host Alias defined in the default domain that maps to the Management Interface IP, which restricts SNMP polling requests to this IP and not any of the client traffic interfaces (eth0, eth1, or eth2). Figure 2 shows SNMP settings enabled on the default Local Port of 161. Outbound polling responses and traps will be sent out using any appliance interface that has the correct routing. To restrict this outbound traffic to the same IP, add a static route to the appliance's mgt0 configuration.

Figure 2. Enabling SNMP settings

The DataPower MIBs can be downloaded from the appliance to be used by any SNMP management tool. The MIBs enable these tools to translate named objects such as dpStatusMemoryStatusUsage to an OID used to request the metric. All appliance status OIDs are in the drStatusMIB.txt MIB file. Figure 3 shows the Enterprise MIBs tab of the SNMP Settings screen, and the method for downloading the MIBs:

Figure 3. SNMP MIB download tab

The Trap Event Subscription tab contains a list of event codes that can be sent to the management software as alerts. Examples are the codes for "Internal cooling fan has stopped" and "Power supply failure." Figure 4 below shows some of the default preloaded subscriptions. To add events, click Select Code. If a specific code is not shown in the list, you can add it manually. For example, adding code 0x806000e2 adds certificate monitor events that indicate when a certificate is nearing expiration. You can get these event codes from their associated log records in the default log, or from the Message Reference document for your firmware release.

Figure 4. SNMP Trap Event Subscription

The SNMPV1/V2c Communities tab defines access policies for management software using SNMP V1 and V2c. The community name is used as a credential to access the SNMP data on the appliance. A common community name for read-only access is public. A DataPower domain, either the default or an application domain, can be associated with the configured community.

If application data is to be polled, specify the application domain; otherwise use the default domain. Specifying an application domain does not prevent management software from polling device-level metrics such as device load, CPU utilization, memory metrics, and environmental statistics. Additionally, it allows polling of application metrics such as transaction rates and times, MQ queue manager status, message counters, or SLM metrics.

The mode of the community should be configured as read-only for access to appliance status metrics. Finally, a remote host access of 0.0.0.0/0 lets any SNMP manager access this community. It can be restricted to a range of IPs if desired. To configure additional communities, click Add. Figure 5 shows the specification of an SNMP V1/V2c community name of public for the read-only access of application domain status within the swlinn-poc domain.
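The remote host setting is an address range in CIDR notation. The sketch below models that kind of check with Python's ipaddress module. It shows why 0.0.0.0/0 admits every IPv4 manager while a narrower range admits only hosts inside it. It illustrates the concept and is not the appliance's own implementation; the addresses are examples.

```python
import ipaddress
from typing import Iterable


def host_allowed(manager_ip: str, allowed_ranges: Iterable[str]) -> bool:
    """True if manager_ip falls inside any of the allowed CIDR ranges."""
    address = ipaddress.ip_address(manager_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in allowed_ranges)
```

For example, `host_allowed("203.0.113.7", ["0.0.0.0/0"])` is True. `host_allowed("203.0.113.7", ["10.20.0.0/16"])` is False.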

Figure 5. SNMP Community Settings specifying an application domain

The Trap and Notification Targets tab lets you specify the IP address and port of the SNMP manager that will receive SNMP alerts and notifications. The default is UDP port 162. You must specify the community name and the SNMP version (1, 2c, or 3). If Version 3 is used, a DataPower user name is provided in the Security Name field, and this user must be configured with SNMP V3 credentials.

The specific events that trigger alerts are configured either on the SNMP Trap Event Subscription tab or in the subscription configuration of an SNMP logging target. The events preconfigured by default on the SNMP Trap Event Subscription tab are critical device-specific events, such as memory exhaustion or hardware issues with the power supplies, battery, or fans. To configure additional notification targets, click Add. Figure 6 shows the configuration of the recipient of SNMP alerts using SNMP Version 2c with the community name of public:

Figure 6. SNMP Trap and Notification Targets

Finally, the SNMPV3 Contexts tab gives SNMPV3 managers access to non-default application domains. To allow SNMP polling only, you just need to enable the SNMP settings and provide an SNMPV1/V2c community. Trap and notification targets and event subscriptions are needed only for sending event alerts to an SNMP manager.

As previously mentioned, some status data, such as fan speeds and CPU utilization, is specific to the device. Other status data, such as transaction rates, is segmented by application domain. It is accumulated only if the statistics setting is enabled, as shown in Figure 7 below. Enabling statistics has a very small impact on system utilization. Increasing the Load Interval (the interval, in milliseconds, at which the appliance measures system load) will further limit this impact.

Figure 7. Statistics enabled per domain

Here is an example of a poll of an appliance metric. An SNMP manager issues an SNMP GET command for the dpStatusMemoryStatusUsage metric, which returns a scalar value giving the percentage of memory in use. When configured with the DataPower MIBs, many SNMP managers display a tree of the status MIB. From that tree you can select a metric, poll it, and display its value.

Application metrics can also be polled if the application domain is specified in the DataPower SNMP configuration. Depending on the application configuration, specific metrics can be polled to provide data on the health or throughput of the application. These application-related table entries differ from system-level metrics in that they are dynamic: rows are identified by the key fields of these tables. For an example of a poll of an application metric, consider the dpStatusHTTPTransactions2Table table. It contains the transaction rates for all services in a domain over various time intervals. Rows in this table are keyed by the service class, such as XMLFirewallService, and the service name, such as Loopback_FW.
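Because these rows are identified by their key fields rather than by fixed positions, a monitoring script has to index them by those keys. It also has to notice when rows appear or disappear as services are added or deleted. The sketch below assumes each row has already been decoded from an SNMP walk into a dictionary. The field names serviceClass, serviceName, and oneMinute, and the service Orders_MPG, are hypothetical placeholders, not column names or services from the DataPower MIB.

```python
from typing import Dict, Iterable, Set, Tuple

ServiceKey = Tuple[str, str]   # (service class, service name)


def index_by_service(rows: Iterable[dict]) -> Dict[ServiceKey, dict]:
    """Key decoded table rows by their index fields, class and name."""
    table = {}
    for row in rows:
        key = (row["serviceClass"], row["serviceName"])
        table[key] = {k: v for k, v in row.items()
                      if k not in ("serviceClass", "serviceName")}
    return table


def service_changes(previous: Dict[ServiceKey, dict],
                    current: Dict[ServiceKey, dict]) -> Tuple[Set[ServiceKey], Set[ServiceKey]]:
    """Return (added, removed) service keys between two polls."""
    added = set(current) - set(previous)
    removed = set(previous) - set(current)
    return added, removed
```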

In addition to the event subscriptions that you can specify in the SNMP settings, you can configure a DataPower logging target to produce SNMP logging events. This lets DataPower send SNMP alerts for specific events of interest. In the left navigation of the DataPower Web GUI, select the Administration menu, then select Manage Logging Targets under the Miscellaneous heading. Click Add to create a new logging target, and set Target Type to SNMP. Figure 8 shows a log target with an SNMP Target Type:

Figure 8. SNMP logging target

The SNMP logging target can subscribe to and filter events just like any other DataPower logging target. The SNMP configuration's list of trap and notification event codes covers most critical events. An SNMP logging target in the default domain that subscribes to all events with a severity of critical or above produces similar alerts. Logging target subscriptions in an application domain, however, can be more application specific. For example, you can subscribe to the MQ or SSL log categories at the error level or above. You can also subscribe to log messages generated by custom stylesheets by using custom log categories. Figure 9 shows the subscription of all critical events for this SNMP type log target:
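A subscription of this kind pairs a log category with a minimum severity. The sketch below models that matching rule. The severity names and their syslog-style ordering, and the lowercase category names mq and ssl, are assumptions made for illustration. They are not necessarily the exact tokens the appliance uses.

```python
from typing import Iterable, NamedTuple

# ASSUMPTION: syslog-style ordering from lowest to highest severity;
# check the exact severity names used by your firmware.
SEVERITY_ORDER = ["debug", "information", "notice", "warning",
                  "error", "critical", "alert", "emergency"]
RANK = {name: i for i, name in enumerate(SEVERITY_ORDER)}


class Subscription(NamedTuple):
    category: str        # log category, or "all" for every category
    min_severity: str    # deliver events at this severity or above


def is_delivered(category: str, severity: str, subs: Iterable[Subscription]) -> bool:
    """True if any subscription matches the event's category and severity."""
    return any(
        (s.category == "all" or s.category == category)
        and RANK[severity] >= RANK[s.min_severity]
        for s in subs
    )
```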

Figure 9. Logging Target Subscriptions

Now that the steps to configure and enable SNMP alerts have been described, here is a demonstration of a power supply alert. With the above configuration, the plug from one of two power supplies is pulled. Figure 10 shows log entries associated with a power supply failure:

Figure 10. System Log Entries

The community configuration placed no restrictions on which SNMP managers could access this appliance's public community. The SNMP manager configured on the Trap and Notification Targets tab listens on port 162 and receives a trap for the power failure event.

This section has shown you how to configure DataPower to enable monitoring of both appliance and application metrics from SNMP tools, and how to produce SNMP alerts from within an appliance:

- A logging target was configured to produce alerts based on logging events.
- The SNMP settings were configured to produce alerts by subscribing to system events (such as "Out of memory" or "Power supply has failed") as well as an application event (an SSL certificate expiration warning).
- Enabling statistics for application-level metrics was also shown.
- A poll of the memory metrics demonstrated monitoring of device metrics, and a poll of the transaction rate table demonstrated monitoring of application-specific metrics.
- Finally, a power supply failure was used to demonstrate SNMP alerting.

What to monitor

Monitoring accomplishes multiple goals:

- Environmental status information, such as temperatures, fan speeds, and the status of batteries and power supplies, shows the general health of the device and its physical components.
- System load can be gauged by a special status value known as System Usage, in addition to more familiar measurements such as CPU, memory, and file system utilization.
- Network interface consumption shows the amount of data being processed by the device.

The following sections discuss several informative status values. Each section lists the Web GUI path to the data, the element in the XMI response, the CLI command that displays the status, and the SNMP Enterprise MIB object that contains the value.

General device health and activity monitors

General health and activity monitors ensure that the DataPower device is operating within predefined system parameters. You can analyze system capacity via system load and CPU utilization. You can evaluate uptime to ensure that the device has not experienced an unexpected restart. Fans and temperatures are checked to avoid overheating, which can take a device out of service. The following monitors are involved in these tasks:

System usage
Web GUI: System => System Usage
XMI: SystemUsage/Load
CLI: show load
Status MIB: dpStatusSystemUsageLoad

System Usage is a measurement of the device’s ability to accept additional work. It is a formulaic calculation based on various components of system load. System Usage is typically considered the best single indicator of overall system capacity. While it may sometimes spike to 100%, typical values are less than 75%. The secondary work list value is a calculation of queued tasks, and is of lesser interest in typical monitoring situations.

Figure 11. System Usage Status
CPU Usage
Web GUI: System => CPU Usage
XMI: CPUUsage
CLI: show cpu
Status MIB: dpStatusCPUUsage

CPU Usage statistics are provided over five time intervals. Many customers are accustomed to monitoring CPU utilization, but this metric in DataPower is not as reliable as System Usage in determining device capacity. DataPower is self-optimizing, and spikes in CPU unassociated with traffic levels may occur as the device performs background activities. CPU usage may sometimes spike all the way up to 100%, but this level is not necessarily a concern unless it is sustained over numerous consecutive polls.
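A check for sustained saturation, rather than a momentary spike, can be sketched as a fixed-size window over recent polls. The three-poll window in the test is an assumed example of "numerous consecutive polls"; choose a value that suits your polling interval.

```python
from collections import deque


class SustainedThreshold:
    """Report True only when the last `polls` readings are all >= threshold."""

    def __init__(self, threshold: float, polls: int):
        self.threshold = threshold
        self.recent = deque(maxlen=polls)   # oldest readings fall off automatically

    def observe(self, value: float) -> bool:
        """Add one polled value and report whether the condition is sustained."""
        self.recent.append(value)
        return (len(self.recent) == self.recent.maxlen
                and all(v >= self.threshold for v in self.recent))
```

With a threshold of 100 and a three-poll window, the readings 100, 100, 40, 100, 100, 100 raise the flag only on the last poll. The 40 breaks the first run.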

Figure 12. CPU Usage Status
Memory usage
Web GUI: System => System => Memory Usage
XMI: MemoryStatus
CLI: show memory
Status MIB: dpStatusMemoryStatus

Memory Usage statistics are provided for various classifications of the appliance's memory. Statistics include:

- the percentage of total memory used;
- bytes of total, used, and free memory;
- request, XG4, and held memory, which are of lesser interest in typical monitoring.

The percentage of used memory depends on the application, the size of request and response messages, and the volume and latency of requests. Typical utilization runs below 80%, and values beyond this threshold are a concern. In that situation, you can use the device's Throttle Settings to temporarily slow down request processing or to perform a warm restart, which recaptures memory.
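Because memory use can hover around a threshold, an alert based on it tends to flap. A common remedy, sketched below, is hysteresis: raise the alert when usage exceeds 80% and clear it only after usage falls below a lower mark. The 70% clearing level is an assumed value, not a DataPower setting. The input would be a percentage such as the one reported by dpStatusMemoryStatusUsage.

```python
from typing import Optional


class HysteresisAlarm:
    """Raise above one level, clear only below a lower one, to avoid flapping."""

    def __init__(self, raise_above: float = 80.0, clear_below: float = 70.0):
        if clear_below >= raise_above:
            raise ValueError("clear_below must be lower than raise_above")
        self.raise_above = raise_above   # 80%: the threshold discussed above
        self.clear_below = clear_below   # 70%: assumed clearing level
        self.active = False

    def update(self, percent_used: float) -> Optional[str]:
        """Feed one poll; return 'raised' or 'cleared' on a state change, else None."""
        if not self.active and percent_used > self.raise_above:
            self.active = True
            return "raised"
        if self.active and percent_used < self.clear_below:
            self.active = False
            return "cleared"
        return None
```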

The following system error codes are associated with these sensors and can be used to trigger alerts from the SNMP Trap Event Subscription configuration:

0x01a40001 Throttling connections due to low memory
0x01a30002 Restart due to low memory
0x01a50004 Memory usage recovered above threshold
Figure 13. Memory Usage Status
File system information
Web GUI: System => System => File system Information
XMI: FilesystemStatus
CLI: show filesystem
Status MIB: dpStatusFilesystemStatus

File system statistics are provided for the free and total space of the encrypted, temporary, and internal file systems. Monitor all free space metrics; free space below 20% of the total is a concern. When free space runs low, you can use the device's Throttle Settings to temporarily slow down request processing or to perform a warm restart, which recaptures file system space.
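The 20% guideline applies to each file system independently. The sketch below computes free space as a percentage of total for each file system and flags those under the limit. The input format is an assumption: a dictionary mapping each file system name to its free and total values, taken from the status data in any consistent unit.

```python
from typing import Dict, List, Tuple


def low_space(filesystems: Dict[str, Tuple[float, float]],
              min_free_pct: float = 20.0) -> List[Tuple[str, float]]:
    """Return (name, free %) for each file system below min_free_pct."""
    flagged = []
    for name, (free, total) in sorted(filesystems.items()):
        if total <= 0:
            continue  # no meaningful percentage for an empty or missing file system
        free_pct = 100.0 * free / total
        if free_pct < min_free_pct:
            flagged.append((name, round(free_pct, 1)))
    return flagged
```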

The following system error codes are associated with these sensors and can be used to trigger alerts from the SNMP Trap Event Subscription configuration:

0x01a40005 Throttling connections due to low temporary file space 
0x01a30006 Restart due to low temporary file space 
0x01a50007 Temporary file space recovered above threshold
Figure 14. File system Usage Status
System up time
Web GUI: Main => Date and Time
XMI: DateTimeStatus/uptime
CLI: DateTimeStatus/uptime
Status MIB: dpStatusDateTimeStatusuptime

System up time indicates the elapsed time since the device was last restarted, whether by a controlled firmware reload or an unexpected restart. The DataPower device can restart itself automatically when its throttle settings detect memory or file system constraints. You can use SNMP notification for alerting, but monitoring uptime by polling ensures that a failed notification delivery does not hide these events.
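A restart can be detected from two successive uptime polls. Uptime should have grown by roughly the time between polls, so a noticeably smaller value means the device restarted in between. The sketch assumes uptime has already been converted to seconds and uses a hypothetical 5-second tolerance for polling jitter. Comparing against the expected uptime, not only the previous reading, also catches a restart after which the device has already run longer than its previous uptime.

```python
def restarted(prev_uptime_s: float, curr_uptime_s: float,
              seconds_between_polls: float, tolerance_s: float = 5.0) -> bool:
    """True if the device must have restarted between two uptime polls."""
    expected = prev_uptime_s + seconds_between_polls   # uptime if no restart occurred
    return curr_uptime_s < expected - tolerance_s
```

For example, `restarted(10, 40, 60)` is True. The device had been up 10 seconds, so a minute later it should report about 70 seconds, not 40.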

Figure 15. Date and time status
Temperature sensors
Web GUI: System => Temperature Sensors
XMI: TemperatureSensors/{various name values}
CLI: show sensors-temperature
Status MIB: dpStatusTemperatureSensorsTable

Temperature readings are available for the CPUs, memory, and system. Each reading has associated warning and danger temperatures and a status value of OK or FAIL. Monitoring the status ensures that the device is operating within the specified range. Investigate temperatures outside these ranges by checking fan speeds and airflow around the device and, if necessary, by contacting DataPower Support.
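The sketch below maps a single reading to ok, warning, or alert, given the warning and danger temperatures and the status reported for that sensor. The sensor names and values in the test are hypothetical. Treating a reading equal to a threshold as having reached it is an assumed convention.

```python
from typing import NamedTuple


class TempReading(NamedTuple):
    name: str
    value: float      # current reading
    warning: float    # warning temperature for this sensor
    danger: float     # danger temperature for this sensor
    status: str       # "OK" or "FAIL" as reported by the appliance


def classify(r: TempReading) -> str:
    """Map one reading to 'ok', 'warning', or 'alert'."""
    if r.status != "OK" or r.value >= r.danger:
        return "alert"
    if r.value >= r.warning:
        return "warning"
    return "ok"
```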

Figure 16. Temperature sensors status
Fan sensors
Web GUI: System => Fan Sensors
XMI: EnvironmentalFanSensors/{various fan-id values}
CLI: show sensors-fan
Status MIB: dpStatusEnvironmentalFanSensorsTable

The device's fans are vital to its proper operation. There are two hot swappable fan trays, and devices that contain the optional hard disk drives have two additional fans. Each fan reading is associated with a minimum value and a status indicator, and monitoring the status value ensures that the fans are working properly. The following system error codes are associated with these sensors and can be used to trigger alerts from the SNMP Trap Event Subscription configuration:

0x02240002 Internal cooling fan has slowed
0x02220003 Internal cooling fan has stopped
Figure 17. Fan sensors status
Other sensors
Web GUI: System => Other Sensors
XMI: OtherSensors/{various name values}
CLI: show sensors-other
Status MIB: dpStatusOtherSensorsTable

Several additional sensors are grouped into the Other classification, including battery, hard disk, and power supply indicators. The intrusion detection sensor is also in this list; it is triggered when tampering with the physical device is detected. All of these variables include a status value, and monitoring it ensures that these components are functioning properly.

The following system error codes are associated with these sensors and can be used to trigger alerts from the SNMP Trap Event Subscription configuration:

0x02220001 Power supply failure
0x02220004 System battery missing
0x02220005 System battery failed

Replace the battery every two years. Critical-level log records begin to appear as the battery approaches the end of its life.

Figure 18. Other sensors status

Interface utilization statistics

Interface utilization monitors show how much data the DataPower device receives and transmits. Each device contains four gigabit interfaces. Monitoring this utilization helps you understand your transmission rates and how they change over time. For example, knowing that traffic to a service is growing 10% per month lets you anticipate the need for additional resources, such as more DataPower appliances or backend devices.
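As a worked example of that kind of projection, the sketch below counts how many months of compound growth it takes for a measured rate to reach a planning ceiling. The figures used here are illustrative assumptions: 300 Mbps today, a 1000 Mbps ceiling matching one gigabit interface, and 10% monthly growth.

```python
def months_until(current: float, ceiling: float, monthly_growth: float) -> int:
    """Months of compound growth before `current` reaches `ceiling`."""
    if current >= ceiling:
        return 0
    if current <= 0 or monthly_growth <= 0:
        raise ValueError("need a positive current rate and positive growth")
    months = 0
    rate = current
    while rate < ceiling:
        rate *= 1 + monthly_growth   # compound the growth once per month
        months += 1
    return months
```

`months_until(300, 1000, 0.10)` returns 13. After 12 months the rate is about 942 Mbps, and after 13 months it is about 1036 Mbps.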

Ethernet interfaces
Web GUI: System => Ethernet Interfaces
XMI: EthernetInterfaceStatus/{various name values}
CLI: show ethernet
Status MIB: dpStatusEthernetInterfaceStatusTable
Figure 19. Ethernet interface status
Receive and transmit throughput
Receive throughput
Web GUI: IP-Network => RX Throughput
XMI: ReceiveKbpsThroughput/{various time values}
CLI: show receive-kbps
Status MIB: dpStatusReceiveKbpsThroughputTable

Transmit throughput
Web GUI: IP-Network => TX Throughput
XMI: TransmitKbpsThroughput/{various time values}
CLI: show transmit-kbps
Status MIB: dpStatusTransmitKbpsThroughputTable

Receive and transmit throughput information can help you understand the amount of data being processed by the device. These statistics are provided for five time values ranging from 10 seconds up to the most recent 24 hour period. This data point is an important one to capture in order to understand the network load that is being applied to the device. It includes management traffic. If you have not segregated management traffic such as Web GUI, CLI, and XMI to a separate interface, then this data will be included with any application traffic.

Each DataPower configuration (or application, if you prefer) varies significantly in the processing it performs on individual messages. In some instances, small messages may trigger significant processing. They might request additional data from off-box endpoints, perform processor-intensive cryptographic operations, or generate significant system load in some other way. In other instances, large messages may simply be routed and require little processing. There is no hard and fast rule, but over time, increases in data volume will correspond to increases in the utilization of DataPower resources. Knowing this before bottlenecks occur, and addressing it by adding DataPower devices, can help you avoid system interruptions.

Figure 20. Rx throughput status
HTTP Connections
Web GUI: Connection => HTTP Connection Statistics
XMI: HTTPConnections/{various name values}
CLI: show http connection
Status MIB: HTTPConnections

HTTP connection statistics are collected at the domain level, and statistics must be enabled for each domain that is to produce HTTP connection data. One peculiarity is that HTTP connection data is not accumulated for services in loopback mode. The status data is segmented by XML-Manager and contains information about HTTP connections, such as requests and reuses. This data can help you understand connection levels and judge utilization growth over time.

Figure 21. HTTP connections status

Transaction rates and elapsed times for individual services are accumulated per domain and per service within each domain. They are not provided unless statistics are enabled in that domain. This data can help you understand the number of transactions processed for a particular service, and the average response time of those transactions, over several time intervals.

Transaction rate and time
Transaction rate
Web GUI: Connection => Transaction Rate
XMI: HTTPTransactions/{various time values}
CLI: show http
Status MIB: dpStatusHTTPTransactionsTable

Transaction time
Web GUI: Connection => Transaction Time
XMI: HTTPMeanTransactionTime/{various time values}
CLI: show http
Status MIB: dpStatusHTTPMeanTransactionTimeTable
Figure 22. Transaction rate status

Other network status providers

DataPower supports many protocols beyond the HTTP examples discussed so far, including support for FTP, IMS, MQ, NFS, NTP, SQL, Tibco, and WebSphere JMS. Each of these protocols is represented by status providers, and as in the case of the previous examples, each is supported by the Web GUI, CLI, XMI, and SNMP. Individual configurations may not use any of these additional protocols, and few will use all of them. However, in a configuration that is using one or more of these protocols, monitoring the related status provider is prudent.

Best practices

Successful monitoring of the DataPower appliance combines reactive and proactive inquiry of status information. SNMP tools must be configured both to listen for traps sent by the device and to poll the device periodically for MIB status data. These actions require two kinds of configuration:

- DataPower SNMP Trap Event Subscription configuration;
- configuration of the SNMP monitoring tool to poll the device and, potentially, to raise alerts based on the returned status values.

In addition to device monitoring, application monitoring is a useful practice. Here, robotic clients send sample messages through the DataPower service to verify that all network links (including load balancers) are operational. In some instances, this effort is extended to send messages through to the backend service provider applications, confirming that both front-side and back-side links are in service. Both DataPower and the backend resources must be configured to respond appropriately to these test messages.
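A robotic client can be as simple as a scheduled script that posts a known test message and records whether a successful response came back and how long it took. This is a minimal sketch using only the Python standard library; the endpoint URL, the `<ping/>` payload, and the `text/xml` content type are hypothetical and must match a service configured to answer test messages.

```python
import http.client
import time
import urllib.request


def probe(url: str, payload: bytes, timeout: float = 10.0) -> tuple[bool, float]:
    """Send one test message and time the round trip.

    Returns (ok, elapsed_seconds). ok is True only when the service answers
    with an HTTP 2xx status; errors and timeouts count as failures.
    """
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "text/xml"},  # hypothetical; match your service
        method="POST",
    )
    start = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            response.read()  # drain the body so the timing covers the full reply
            ok = 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (non-2xx), refused connections and timeouts
        # are OSError subclasses; HTTPException covers malformed replies.
        ok = False
    return ok, time.monotonic() - start
```

For example, `probe("https://dp.example.com:4443/health", b"<ping/>")` (a hypothetical endpoint) could be run every minute from a scheduler for each service, raising an alert when `ok` is false or when the elapsed time drifts well above its usual baseline.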

The DataPower SNMP trap subscription capability is a useful way to receive SNMP notification of events within DataPower. The following table lists suggested error codes to subscribe to. If one of these errors is produced, the SNMP agent on DataPower sends an alert (trap) to the SNMP monitor.

Suggested error code subscription

Event code | Category | Priority | Description
0x02220001 | environmental | critical | Power supply failure.
0x02240002 | environmental | warning | Internal cooling fan has slowed
0x02220003 | environmental | critical | Internal cooling fan has stopped.
0x02220004 | environmental | critical | System battery missing.
0x02220005 | environmental | critical | System battery failed.
0x00330002 | mgmt | error | Memory full
0x01a40001 | system | warning | Throttling connections due to low memory
0x01a30002 | system | error | Restart due to low memory
0x01a30003 | system | error | Restart due to resource shortage timeout
0x01a50004 | system | notice | Memory usage recovered above threshold
0x01a50005 | system | warning | Throttling connections due to low temporary file space
0x01a30006 | system | error | Restart due to low temporary file space
0x01a50007 | system | notice | Temporary file space recovered above threshold
0x01a40008 | system | warning | Throttling connections due to low number of free ports
0x01a30009 | system | error | Restart due to port shortage
0x01a3000b | system | error | Restart due to prefix qcode shortage
0x01a3000c | system | error | Restart due to namespace qcode shortage
0x01a3000d | system | error | Restart due to local qcode shortage
0x01a2000e | system | critical | Installed battery is nearing end of life
0x01a30011 | system | error | Invalid virtual file system
0x01a30012 | system | error | File not found
0x01a30013 | system | error | Buffer too small
0x01a30014 | system | error | I/O error
0x01a30015 | system | error | Out of memory
0x01a10016 | system | alert | Number of free qcodes is very low
0x01a30017 | system | error | Restart due to low file descriptor
0x01a40018 | system | warning | Throttling due to low number of available file descriptors
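When traps arrive, the monitoring tool still has to decide what to do with each one. The sketch below assumes the trap receiver has already extracted the DataPower event code as a hex string (how the code is carried in the trap's variable bindings is not covered here) and routes it using a subset of the suggested subscriptions above. The `page`, `ticket`, and `log` routing policy is a hypothetical example, not a DataPower feature.

```python
# Subset of the suggested subscriptions: event code -> (category, priority, description).
SUBSCRIBED_EVENTS = {
    "0x02220001": ("environmental", "critical", "Power supply failure."),
    "0x02220003": ("environmental", "critical", "Internal cooling fan has stopped."),
    "0x00330002": ("mgmt", "error", "Memory full"),
    "0x01a40001": ("system", "warning", "Throttling connections due to low memory"),
    "0x01a30002": ("system", "error", "Restart due to low memory"),
    "0x01a50004": ("system", "notice", "Memory usage recovered above threshold"),
    "0x01a10016": ("system", "alert", "Number of free qcodes is very low"),
}

# Hypothetical routing policy: these priorities page an operator.
PAGE_PRIORITIES = {"alert", "critical", "error"}


def normalize_code(raw: str) -> str:
    """Return the event code in lowercase 0x-prefixed form, e.g. '0x01a40001'."""
    code = raw.strip().lower()
    return code if code.startswith("0x") else "0x" + code


def route_trap(raw_code: str) -> str:
    """Decide how to handle a received trap: 'page', 'ticket', or 'log'."""
    entry = SUBSCRIBED_EVENTS.get(normalize_code(raw_code))
    if entry is None:
        return "log"  # not in the subscription list; keep for later analysis
    _category, priority, _description = entry
    return "page" if priority in PAGE_PRIORITIES else "ticket"


if __name__ == "__main__":
    for code in ("0x02220001", "0x01A40001", "01a10016", "0xdeadbeef"):
        print(code, route_trap(code))
```

Keying the routing on priority rather than on individual codes means new subscriptions only need a table entry, not new logic.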

MIB status values to monitor

It is recommended that SNMP monitors be configured to fetch and report on the following conditions:

MIB status value | Condition to report
dpStatusSystemUsageLoad | > 80% for an interval of 10 minutes or more
dpStatusCPUUsagetenMinutes | > 90% (10-minute interval)
dpStatusFilesystemStatusFreeTemporary | < 20% (may be unnecessary given the error code subscription)
dpStatusFilesystemStatusFreeUnencrypted | < 20% (may be unnecessary given the error code subscription)
dpStatusFilesystemStatusFreeEncrypted | < 20% (may be unnecessary given the error code subscription)
dpStatusMemoryStatusFreeMemory | < 20% (may be unnecessary given the error code subscription)
dpStatusTemperatureSensorsReadingStatus | Various temperature sensor readings (table)
dpStatusEthernetInterfaceStatusStatus | For configured interfaces
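The first two conditions handle time differently: `dpStatusCPUUsagetenMinutes` is already a ten-minute figure and can be compared directly, while the system usage load must stay above 80% across ten minutes of polls. The following sketch evaluates values that have already been polled. It assumes the poller records load samples in time order with a minute timestamp, and that it also collects a matching total for each free value so that free space can be expressed as a percentage. The data structures and alert strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LoadSample:
    minute: int   # minutes since polling started
    load: float   # dpStatusSystemUsageLoad, in percent


def sustained_above(samples: list[LoadSample], threshold: float, window: int) -> bool:
    """True if the time-ordered samples span at least `window` minutes ending
    at the latest poll and every sample in that span is above `threshold`."""
    if not samples:
        return False
    start = samples[-1].minute - window
    if samples[0].minute > start:
        return False  # not enough history yet to cover the whole window
    recent = [s for s in samples if s.minute >= start]
    return all(s.load > threshold for s in recent)


def free_percent(free: float, total: float) -> float:
    """Free space as a percentage of the total (0 if the total is unknown)."""
    return 100.0 * free / total if total else 0.0


def evaluate(
    loads: list[LoadSample],
    cpu_ten_minutes: float,
    free_and_total: dict[str, tuple[float, float]],
) -> list[str]:
    """Return alert messages for the thresholds listed above."""
    alerts = []
    if sustained_above(loads, 80.0, 10):
        alerts.append("dpStatusSystemUsageLoad above 80% for 10+ minutes")
    if cpu_ten_minutes > 90.0:
        alerts.append("dpStatusCPUUsagetenMinutes above 90%")
    for name, (free, total) in free_and_total.items():
        if free_percent(free, total) < 20.0:
            alerts.append(f"{name} below 20% free")
    return alerts


if __name__ == "__main__":
    loads = [LoadSample(0, 85.0), LoadSample(5, 88.0), LoadSample(10, 91.0)]
    memory = {"dpStatusMemoryStatusFreeMemory": (100_000, 1_000_000)}
    for alert in evaluate(loads, cpu_ten_minutes=95.0, free_and_total=memory):
        print(alert)
```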

MIB status values to monitor for interface utilization

In addition to polling for status data, it is important to establish the normal traffic patterns of applications over time. The best way to do this is to capture and monitor the amount of network traffic that the device is processing. The transmit and receive values below will help you predict when devices will become saturated with traffic, and knowing this ahead of time can help you avoid service disruptions.

MIB status value | Action
dpStatusNetworkTransmitDataThroughputTenMinutesBits | Capture values over an extended period
dpStatusNetworkReceiveDataThroughputTenMinutesBits | Capture values over an extended period
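One simple way to turn the captured throughput history into a saturation forecast is a least-squares linear trend. The sketch assumes daily samples already converted to bits per second, a known link capacity, and an 80% planning headroom; the headroom figure and the function names are illustrative choices, and a straight line is only a rough model of real traffic growth.

```python
from typing import Optional


def linear_fit(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_headroom(
    days: list[float],
    bits_per_second: list[float],
    capacity_bps: float,
    headroom: float = 0.8,  # assumed planning limit, not a DataPower value
) -> Optional[float]:
    """Days after the last sample until the trend reaches headroom * capacity.

    Returns None if traffic is flat or falling, 0.0 if already past the limit.
    """
    slope, intercept = linear_fit(days, bits_per_second)
    if slope <= 0:
        return None
    crossing_day = (capacity_bps * headroom - intercept) / slope
    return max(0.0, crossing_day - days[-1])


if __name__ == "__main__":
    days = [0, 10, 20, 30]
    tx = [100e6, 150e6, 200e6, 250e6]  # hypothetical transmit throughput, bits/s
    print(days_until_headroom(days, tx, capacity_bps=1e9))  # 110.0
```

Running the same calculation separately for the transmit and receive values, per interface, shows which direction of traffic will reach the limit first.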

Conclusion

Best practice monitoring of DataPower is a three-pronged activity:

  • Continuously verify the status of the DataPower environment through polling status data and subscribing to SNMP traps.
  • Monitor device utilization and capacity through analysis of system usage data and interpretation of Ethernet activity.
  • Perform complete application path verification by sending test messages through the DataPower service configuration and, where appropriate, on through to backend resources.

Performing these three actions helps ensure that services are available and that the DataPower appliance is performing within standard ranges of operation.

Acknowledgements

The authors wish to thank all those who participated in the development of this developerWorks article. Of special note are the contributions of Shiu-Fun Poon, Matthias Seibler, and Gaurang Shah of WebSphere DataPower Engineering, and Bill Hines of WebSphere DataPower Technical Sales.
