Level: Intermediate
Jim H. Frank (jhfrank@us.ibm.com), Senior Software Architect, IBM
27 Feb 2007
IBM® WebSphere® Business Monitor is a tool to monitor business performance. It collects business performance-relevant data by extracting and aggregating information in business events. The operation of WebSphere Business Monitor is controlled by monitor models. In this article you're introduced to the core elements of a monitor model by walking through an example. You construct a monitor model for a business activity monitoring (BAM) scenario "outside-to-in," starting with business requirements and adding technical detail until you have an executable model.
Introduction
The operation of IBM WebSphere Business Monitor 6.0.2 (Monitor) is controlled with monitor models. A monitor model defines both what should be monitored and how it should be monitored; in other words, it defines the business metrics to be observed and their dependencies on business events. This article introduces the basic features of a monitor model by walking through a simple banking scenario. (More advanced features will be discussed in subsequent articles in this series.) It also demonstrates how to construct a monitor model from "outside-to-in," by starting from business requirements and adding technical detail until you have a deployable model.
Bank scenario
A branch manager of a major bank wants to monitor the ATMs within her region. She needs information about each ATM's current cash level, a forecast when it must be replenished, and failure rates. She also wants to be alerted to out-of-cash situations, repeated invalid PIN entries, and user sessions taking an excessive amount of time.
The ATMs send events reporting each of the following situations: startup and shutdown, customer login and logout, cash withdrawals, and invalid PIN entries. The information carried by these events (ATM number, customer identification, and amount of cash withdrawn) must be used by Monitor to keep the business metrics up to date. During monitor model construction, you will discover whether enough information is carried by these events to satisfy all of the bank's monitoring requirements. If not, additional events will have to be created or additional data will have to be added to existing events, until all monitoring requirements can be met. This usually requires collaboration between the teams in charge of event generation and monitoring.
To construct the monitor model from the outside-in, we need to start with the business metrics and alerts that users want to observe, add subscriptions for inbound events, and finally, add the maps and triggers that complete the logic of our model.
Defining business metrics and alerts
To start the construction, list the business metrics that need to be monitored for each ATM, as shown in Table 1.
Table 1. Business metric definitions
| Name | Description |
|---|---|
| Current cash level | The current amount of cash in the ATM |
| Time until empty | A projected time period for which the ATM will still have cash |
| Next refill date | A suggested date by which the ATM should be replenished |
| Transactions per day | Average number of transactions per day |
| Mean time between failures | Mean time between ATM shutdowns due to device failure |
| Percent down time | Amount of time during which the ATM is not operational |
Similarly, list the alerts that should be generated, as shown in Table 2.
Table 2. Business alert definitions
| Name | Description |
|---|---|
| Out of cash alert | Emitted when the cash level drops below $1000 |
| Invalid PINs alert | Emitted when an invalid PIN is entered three times |
| Runaway session alert | Emitted when a customer session takes more than 30 minutes |
The formal definition of these metrics and alerts, using the Monitor model editor, results in the initial model shown in Figure 1. See Resources for more on the model editor.
Figure 1. First draft (stub) of ATM Monitor Model
The ATM monitor model in Figure 1 contains a single monitoring context (MC) definition, ATM MC. This monitoring context definition contains the definition of an ATM key metric, which is autogenerated by the editor, and definitions for six other metrics, which were added manually based on the list of business metrics in Table 1. Three outbound event definitions are added as well -- one for each of the alerts in Table 2. The Problems view shows a number of issues, all because our model is not yet complete.
A monitoring context definition (ATM MC in our example) is the main construct of a monitor model. It defines the data structure and behavior of an "observer object" dedicated to monitoring some real or abstract entity, such as an ATM, an insurance claim process, a sales performance, stock level in a warehouse, and so on. Our example contains just one monitoring context definition, which defines an observer object for an ATM.
At run time, a monitoring context (MC) definition is instantiated once per entity observed. Thus, there will be one instance of our monitoring context definition per ATM in the network. An instance is called a monitoring context because the information it collects from inbound events provides a context for monitoring the entity. In our example, the ATM's cash level, transactions per day, mean time between failures -- as well as lower level data like the ATM's serial number, startup time, error counts -- provide us with the context for detecting out-of-cash situations, preferred customer locations, need for maintenance, and so on.
Selecting the ATM.mm tab of the editor brings you to the XML version of our draft model, which enables you to see every detail at a glance, as shown in Listing 1. The <dataMartModel> part of the model is suppressed here.
Listing 1. XML view of first draft (stub) of ATM monitor model
<mm:monitor xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:mm="http://www.ibm.com/xmlns/prod/websphere/monitoring/6.0.2/mm"
xmlns:xsd="http://www.w3.org/2001/XMLSchema-datatypes"
xsi:schemaLocation=
"http://www.ibm.com/xmlns/prod/websphere/monitoring/6.0.2/mm monitor.xsd"
displayName="ATM" id="ATM" timestamp="2007-01-17T17:05:50">
<description>monitor model to survey ATMs</description>
<monitorDetailsModel displayName="ATM" id="MDM">
<monitoringContext displayName="ATM MC" id="ATM_MC">
<description>monitoring context for an ATM</description>
<outboundEvent displayName="out of cash alert" id="out_of_cash_alert" type="">
<description>emitted when the cash level drops below $1000</description>
</outboundEvent>
<outboundEvent displayName="invalid PINs alert" id="invalid_PINs_alert" type="">
<description>emitted when an invalid PIN is entered three times</description>
</outboundEvent>
<outboundEvent displayName="runaway session alert" id="runaway_session_alert"
type="">
<description>
emitted when a customer session takes more than 30 minutes
</description>
</outboundEvent>
<metric displayName="ATM Key" id="ATM_Key" type="xsd:string" isPartOfKey="true"/>
<metric displayName="current cash level" id="current_cash_level" type="xsd:integer">
<description>the current amount of cash in the ATM</description>
</metric>
<metric displayName="time till empty" id="time_till_empty" type="xsd:duration">
<description>a projected time for which the ATM will still have cash</description>
</metric>
<metric displayName="next refill date" id="next_refill_date" type="xsd:date">
<description>a suggested date by which the ATM should be replenished</description>
</metric>
<metric displayName="mean time between failures" id="mean_time_between_failures"
type="xsd:duration">
<description>mean time between ATM shutdowns due to device failure</description>
</metric>
<metric displayName="percent down time" id="percent_down_time" type="xsd:decimal">
<description>
the fraction of time during which the ATM is not operational
</description>
</metric>
<metric displayName="transactions per day" id="transactions_per_day"
type="xsd:decimal">
<description>the average number of transactions per day</description>
</metric>
</monitoringContext>
</monitorDetailsModel>
[...]
</mm:monitor>
At the root of the model you see a <monitor> element that, besides some generated namespace declarations and a schema location, defines a display name, ID, and timestamp for this monitor model. A short description follows. This is the same information that is visible in Figure 1. The <monitor> element is the top-level container of several parts of a monitor model: the monitor details model, the key performance indicator (KPI) model, the data mart model, the visual model, and the event model. Only the monitor details model is shown above and will be discussed in this article. (Other parts will be covered in subsequent articles in this series.)
The <monitorDetailsModel> contains the event processing logic for Monitor, encoded in one or more <monitoringContext> definitions. In our example, there is just one MC definition (which is really just a stub so far) with three outbound event definitions, a key metric definition, and six other metric definitions. These are the nested elements within <monitoringContext> in the XML in Listing 1 and are also visible in the navigation tree in Figure 1.
A metric definition defines a typed slot or typed memory location in a monitoring context, quite like a field in a Java™ class. From an object-oriented point of view, you can consider a monitoring context definition a special-purpose class, with special behavior, whose instances are the observer objects we mentioned above.
Each element in our monitoring context definition has a type. The metric types are visible in the XML in Listing 1. The types of the outbound event definitions must still be assigned.
The primitive types we can use for metrics are the XML Schema datatypes boolean, string, integer, decimal, duration, date, time, and dateTime. The type of the key metric (string) was preset by the editor; all other types were entered manually.
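The class analogy can be made concrete with a small sketch. The Python dataclass below only illustrates the analogy; it is not something Monitor generates. The class name `AtmMonitoringContext`, the Python type chosen for each XML Schema datatype, and the choice to leave non-key metrics unset until a map assigns them are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from decimal import Decimal
from typing import Optional


@dataclass
class AtmMonitoringContext:
    """Illustrative stand-in for one instance of the ATM MC definition."""

    # Key metric (xsd:string); identifies the ATM being observed.
    atm_key: str
    # Assumption: the other metrics stay unset (None) until a map sets them.
    current_cash_level: Optional[int] = None                 # xsd:integer
    time_till_empty: Optional[timedelta] = None              # xsd:duration
    next_refill_date: Optional[date] = None                  # xsd:date
    mean_time_between_failures: Optional[timedelta] = None   # xsd:duration
    percent_down_time: Optional[Decimal] = None              # xsd:decimal
    transactions_per_day: Optional[Decimal] = None           # xsd:decimal


# One instance (one "observer object") per ATM; the keys are made-up values.
contexts = {key: AtmMonitoringContext(atm_key=key) for key in ("ATM-001", "ATM-002")}
```

Each field corresponds to one typed slot in Listing 1, and each dictionary entry corresponds to one monitoring context at run time.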
Inbound and outbound event definitions use the type system of Common Event Infrastructure (CEI), which is based on the type system of Java. As events are received and emitted by Monitor, the values carried in event fields are converted to and from XML Schema datatypes. Defining event types is discussed in the next section.
See Resources for more on datatypes, Common Event Infrastructure (CEI), and the Common Base Event (CBE) specification.
Defining events
Assume that an ATM issues the events shown in Table 3.
Table 3. ATM events
| Name | Content | Description |
|---|---|---|
| ATM ready | ATM #, initial cash level | Signals ATM startup / reports initial cash level |
| ATM shutdown | ATM #, error code | Non-zero error code if shutdown due to failure |
| Customer login | ATM #, customer account, customer action | Customer_action field is "login" |
| Customer logout | ATM #, customer account, customer action | Customer_action field is "logout" |
| Invalid PIN | ATM #, customer account, customer action | Customer_action field is "invalid PIN" |
| Cash withdrawal | ATM #, customer account, amount withdrawn | Amount withdrawn is reported as integer number of cash units |
Each event carries the ATM device number, which is used to correlate it with its pertinent monitoring context. Each event also carries a timestamp, which is part of the predefined structure of a Common Base Event (CBE) and not shown explicitly in Table 3.
Defining an inbound event for a monitoring context requires two parts: defining the event type and defining the event subscription. The event type declares the structure of the event. The subscription defines to which monitoring contexts it should be delivered.
Let's start by adding event type definitions for our scenario to the Event Definitions folder of our monitor modeling project. We assume that customer login, customer logout, and invalid PIN entries are all reported by events of the same type, Customer_Action, whose customer_action field indicates what the customer actually did (login / logout / invalid PIN). We thus have only four different type definitions, or structure definitions, for the six kinds of ATM events we are interested in. Together with the three type definitions for outbound events, the complete list of event types should look like Figure 2.
Figure 2. Adding event type definitions
Notice an additional event type definition, WBI.MonitoringEvent, which was added by the editor and made a parent of all new event types. This is useful for scenarios where we monitor business activity based on predefined events from WebSphere Integration Developer (Integration Developer) components. Since we don't use such events in this scenario, click the Parent field and replace WBI.MonitoringEvent with event. As a result, all event types now inherit from the CBE of CEI.
Complete the event type definitions by adding attributes for each type, as shown in Table 4. We use extended data elements for all business payload fields except creationTime, which is part of the predefined attributes of a CBE and thus inherited from the parent type (event).
In a real-world scenario, you'd likely import event type definitions provided by the ATM manufacturer, while here we enter them by hand. (See "Managing event definitions" in Resources for more about importing event type definitions.)
Table 4. Event type definitions
| Event Type | Extended Data Element Name | Extended Data Element Type |
|---|---|---|
| ATM_Ready | ATM_number | string |
| | initial_cash_level | int |
| ATM_Shutdown | ATM_number | string |
| | error_code | int |
| Cash_Withdrawal | ATM_number | string |
| | customer_account | string |
| | amount_withdrawn | int |
| Customer_Action | ATM_number | string |
| | customer_account | string |
| | customer_action | string |
| Invalid_PINs_Alert | ATM_number | string |
| | customer_account | string |
| | number_of_attempts | int |
| Out_Of_Cash_Alert | ATM_number | string |
| | remaining_cash_level | int |
| Runaway_Session_Alert | ATM_number | string |
| | customer_account | string |
| | session_active_since | dateTime |
The first four event type definitions are for inbound events from ATMs; the last three are for outbound events, which Monitor sends as alerts. Most of the attribute definitions are self-explanatory. The cash levels and the amount of cash withdrawn are reported as integers, since ATMs don't store or issue fractional currency. Editing the event type definition for Runaway_Session_Alert is shown in Figure 3.
Since the WBI.MonitoringEvent is no longer needed, it is safe to delete it from the list of event type definitions. The editor will add it back whenever it is used as the parent of a new event type.
Figure 3. Completing the event type definitions
Now it's time to add event subscriptions for the six kinds of ATM events we want to capture.
Event subscriptions serve two purposes: For the event infrastructure, they define which events should be delivered to Monitor so that it can do its job. For a monitoring context, an event subscription also defines an event entry point, that is, a typed slot, just like a metric, to which an inbound event can be delivered. After an event is delivered to an MC, its content can be accessed by referencing the event entry point as if it were a metric with a complex data structure that holds the event content. Similarly, an outbound event definition logically defines an event exit point (or event launch point) for an MC: a typed slot to which the payload of an outbound event is written before it is emitted.
Upon adding the six event subscriptions, the hierarchical view of our MC definition should look like Figure 4. Three of the seven error messages in the Problems view have disappeared as we assigned types to the outbound event definitions. (All CWMMV0000E: Event type not found messages are gone.)
Figure 4. Adding inbound event definitions (event subscriptions)
The XML representation of our event subscriptions is shown in Listing 2. Each has a display name, ID, type, and the default settings for zero / one / multiple correlation matches. However, there are no correlation predicates or filter conditions yet. These expressions, which represent a pivotal part of the event subscription logic, are discussed next.
Listing 2. XML view of Inbound Event definitions (defining event subscriptions, or entry points)
<mm:monitor xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:mm="http://www.ibm.com/xmlns/prod/websphere/monitoring/6.0.2/mm"
xmlns:xsd="http://www.w3.org/2001/XMLSchema-datatypes"
xsi:schemaLocation=
"http://www.ibm.com/xmlns/prod/websphere/monitoring/6.0.2/mm monitor.xsd"
displayName="ATM" id="ATM" timestamp="2007-01-17T17:05:50">
<description>monitor model to survey ATMs</description>
<monitorDetailsModel displayName="ATM" id="MDM">
<monitoringContext displayName="ATM MC" id="ATM_MC">
<inboundEvent displayName="ATM ready" id="ATM_ready"
multipleCorrelationMatches="ignore" noCorrelationMatches="ignore"
oneCorrelationMatch="ignore" type="ATM_Ready">
<description>signals ATM startup / reports initial cash level</description>
</inboundEvent>
<inboundEvent displayName="ATM shutdown" id="ATM_shutdown"
multipleCorrelationMatches="ignore" noCorrelationMatches="ignore"
oneCorrelationMatch="ignore" type="ATM_Shutdown">
<description>non-zero error code if shutdown due to failure</description>
</inboundEvent>
<inboundEvent displayName="customer login" id="customer_login"
multipleCorrelationMatches="ignore" noCorrelationMatches="ignore"
oneCorrelationMatch="ignore" type="Customer_Action">
<description>customer_action field is "login"</description>
</inboundEvent>
<inboundEvent displayName="customer logout" id="customer_logout"
multipleCorrelationMatches="ignore" noCorrelationMatches="ignore"
oneCorrelationMatch="ignore" type="Customer_Action">
<description>customer_action field is "logout"</description>
</inboundEvent>
<inboundEvent displayName="invalid PIN" id="invalid_PIN"
multipleCorrelationMatches="ignore" noCorrelationMatches="ignore"
oneCorrelationMatch="ignore" type="Customer_Action">
<description>customer_action field is "invalid PIN"</description>
</inboundEvent>
<inboundEvent displayName="cash withdrawal" id="cash_withdrawal"
multipleCorrelationMatches="ignore" noCorrelationMatches="ignore"
oneCorrelationMatch="ignore" type="Cash_Withdrawal">
<description>
amount withdrawn is reported as an integer number of cash units
</description>
</inboundEvent>
[...]
</monitoringContext>
</monitorDetailsModel>
</mm:monitor>
Before adding filter conditions and correlation expressions, let's digress a bit to explain how Monitor handles inbound events at run time.
The type of a CEI event (ATM_Ready, ATM_Shutdown, and so on) describes the variable part of its structure (the payload carried in property data and extended data elements) and is indicated by the event's extensionName attribute. Processing an inbound event in Monitor involves the following steps:
- The type of the incoming event (the value of its extensionName attribute) is compared with the types of all event subscriptions in the monitor model. Only subscriptions with a matching type are considered further.
- For each subscription with a matching type, its filter condition is evaluated based on the incoming event's content. If the filter evaluates to false, then the event subscription is dropped from further consideration. A filter can only depend on event content, not on MC state. It effectively narrows down the set of events of a given type (structure) that qualify for this event subscription. We look at examples of this shortly.
- If the type check and filter evaluation are both successful, then the subscription's correlation expression is evaluated for each existing MC (for each instance of the MC definition). The goal is to correlate an incoming event with the monitoring contexts to which it should be delivered. The correlation predicate may evaluate to true for zero, one, or multiple instances.
- Depending on the number of matching instances that were found, the noCorrelationMatches, oneCorrelationMatch, or multipleCorrelationMatches attribute determines what to do with the event: ignore, deliver to the matching monitoring contexts, create a new monitoring context and deliver the event to that, or treat it as an error.
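The steps above can be sketched in a few lines of Python. This is a simplified model of the described behavior under stated assumptions, not Monitor's actual implementation: the `Event` and `Subscription` classes, the `dispatch` function, the action labels `"ignore"`, `"deliver"`, `"create"`, and `"raise"`, and the use of plain dictionaries as monitoring contexts are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Event:
    extension_name: str                 # the event type, e.g. "Customer_Action"
    extended_data: Dict[str, object]    # the business payload


@dataclass
class Subscription:
    id: str
    type: str
    correlates: Callable[[Event, dict], bool]
    filter_condition: Optional[Callable[[Event], bool]] = None
    no_matches: str = "ignore"          # hypothetical action labels
    one_match: str = "deliver"
    multiple_matches: str = "raise"


def dispatch(event: Event, subscriptions: List[Subscription],
             contexts: List[dict]) -> List[Tuple[str, dict]]:
    """Return (subscription id, monitoring context) pairs for each delivery."""
    deliveries: List[Tuple[str, dict]] = []
    for sub in subscriptions:
        # Step 1: the event type must match the subscription type.
        if sub.type != event.extension_name:
            continue
        # Step 2: the filter sees only event content, never MC state.
        if sub.filter_condition is not None and not sub.filter_condition(event):
            continue
        # Step 3: evaluate the correlation predicate against every existing MC.
        matches = [mc for mc in contexts if sub.correlates(event, mc)]
        # Step 4: pick the action according to the number of matches.
        if not matches:
            action = sub.no_matches
        elif len(matches) == 1:
            action = sub.one_match
        else:
            action = sub.multiple_matches
        if action == "create":
            new_mc: dict = {}
            contexts.append(new_mc)
            deliveries.append((sub.id, new_mc))
        elif action == "deliver":
            deliveries.extend((sub.id, mc) for mc in matches)
        elif action == "raise":
            raise RuntimeError(f"unexpected correlation result for {sub.id}")
        # "ignore": drop the event for this subscription.
    return deliveries


def by_atm(event: Event, mc: dict) -> bool:
    # Compare the event's ATM number with the context's key metric.
    return event.extended_data["ATM_number"] == mc.get("ATM_Key")


subs = [
    Subscription("ATM_ready", "ATM_Ready", by_atm, no_matches="create"),
    Subscription("customer_login", "Customer_Action", by_atm,
                 lambda e: e.extended_data["customer_action"] == "login"),
    Subscription("customer_logout", "Customer_Action", by_atm,
                 lambda e: e.extended_data["customer_action"] == "logout"),
]
contexts: List[dict] = []

ready = Event("ATM_Ready", {"ATM_number": "ATM-001", "initial_cash_level": 50000})
created = dispatch(ready, subs, contexts)
created[0][1]["ATM_Key"] = "ATM-001"    # stands in for the key metric's map

login = Event("Customer_Action", {"ATM_number": "ATM-001",
                                  "customer_account": "12345",
                                  "customer_action": "login"})
delivered = dispatch(login, subs, contexts)

stray = Event("Customer_Action", {"ATM_number": "ATM-999",
                                  "customer_account": "12345",
                                  "customer_action": "login"})
ignored = dispatch(stray, subs, contexts)
```

In this run the ATM ready event creates a context, the login event reaches only the `customer_login` entry point because the `customer_logout` filter rejects it, and the login from an unknown ATM is ignored.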
The filter and correlation settings for our six event subscriptions are summarized in Table 5.
Table 5. Inbound event filtering and correlation
| Inbound Event Definition: Event Type | Filter Condition | Correlation Expression | No correlation matches | One correlation match | Multiple correlation matches |
|---|---|---|---|---|---|
| ATM ready: ATM_Ready | - | ATM_ready/extendedData/ATM_number eq ATM_Key | create new context | deliver event | raise exception |
| ATM shutdown: ATM_Shutdown | - | ATM_shutdown/extendedData/ATM_number eq ATM_Key | ignore | deliver event | raise exception |
| customer login: Customer_Action | customer_login/extendedData/customer_action eq "login" | customer_login/extendedData/ATM_number eq ATM_Key | ignore | deliver event | raise exception |
| customer logout: Customer_Action | customer_logout/extendedData/customer_action eq "logout" | customer_logout/extendedData/ATM_number eq ATM_Key | ignore | deliver event | raise exception |
| invalid PIN: Customer_Action | invalid_PIN/extendedData/customer_action eq "invalid PIN" | invalid_PIN/extendedData/ATM_number eq ATM_Key | ignore | deliver event | raise exception |
| cash withdrawal: Cash_Withdrawal | - | cash_withdrawal/extendedData/ATM_number eq ATM_Key | ignore | deliver event | raise exception |
The filter conditions for customer login, customer logout, and invalid PIN provide the differentiation required in addition to the event type (which for all three is Customer_Action), to distinguish the three kinds of events. The three other subscriptions have no filter: any event with a matching type is considered for delivery to this event entry point.
The correlation expression is the same for all six event subscriptions and compares the ATM number in the event with the monitoring context key (the ATM key metric). The no-correlation matches setting for ATM ready events says that a new MC will be created if none with a matching ATM number is found. (You must then initialize the ATM key metric from the event, so that subsequent ATM ready events from the same ATM will correlate with this monitoring context. This is discussed in Completing the logic.) For all other event subscriptions, the no-correlation matches setting says that the event should be ignored if an MC for the ATM does not exist. The settings for one and multiple correlation matches are the same for all six subscriptions: the event is delivered when there is exactly one matching MC, and an error is raised when there are multiple matches. Multiple matches would mean that two or more MCs have the same ATM key value, which clearly is an error.
Adding the filter condition, correlation expression, and correlation-matches settings for the customer login event subscription is shown in Figure 5.
Figure 5. Adding filter condition and correlation criteria for customer login events
In the XML view, the filter condition and correlation criteria will look like Listing 3. Only the customer login event is shown as an example.
Listing 3. XML view of completed subscription for customer login events
<mm:monitor xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:mm="http://www.ibm.com/xmlns/prod/websphere/monitoring/6.0.2/mm"
xmlns:xsd="http://www.w3.org/2001/XMLSchema-datatypes"
xsi:schemaLocation=
"http://www.ibm.com/xmlns/prod/websphere/monitoring/6.0.2/mm monitor.xsd"
displayName="ATM" id="ATM" timestamp="2007-01-17T17:05:50">
<description>monitor model to survey ATMs</description>
<monitorDetailsModel displayName="ATM" id="MDM">
<monitoringContext displayName="ATM MC" id="ATM_MC">
<description>monitoring context for an ATM</description>
<inboundEvent displayName="customer login" id="customer_login"
multipleCorrelationMatches="raiseException" noCorrelationMatches="ignore"
oneCorrelationMatch="deliverEvent" type="Customer_Action">
<description>customer_action field is "login"</description>
<correlationPredicate
expression="customer_login/extendedData/ATM_number eq ATM_Key"/>
<filter expression='customer_login/extendedData/customer_action eq "login"'/>
</inboundEvent>
[...]
</monitoringContext>
</monitorDetailsModel>
</mm:monitor>
Our monitoring context definition for ATM survey now has all business metrics, event subscriptions, and event emissions defined -- all its input and output -- but the interconnecting logic is still missing. We have not defined, for example, how the current cash level metric should be calculated based on incoming events or how the outbound out-of-cash event is populated and when it is emitted.
Figure 6 shows the current state of our model. We have six event entry points (the input to the MC) shown on the left, and seven metrics and three event exit points (the output) shown on the right. To complete the MC logic, in the next section you will add maps, triggers, auxiliary metrics, counters, and timers (stopwatches), which define how data flows from left to right in this picture.
Figure 6. Current monitoring context definition stub
Completing the logic
Metrics in a monitoring context are set using maps, which work much like formulas in a spreadsheet. When an input to a map changes, the map is reevaluated and its target metric is updated. For example, we will define two maps for the current cash level metric: one to initialize it from the initial_cash_level attribute of an ATM ready event, and one to update it based on the amount_withdrawn attribute of a cash withdrawal event. Each map is run when the corresponding event is delivered to a monitoring context surveying an ATM and updates its state with the ATM's new cash level.
Figure 7 shows the two map definitions in the monitor model editor.
Figure 7. Adding maps for the current cash level metric
The first map expression (ATM_ready/extendedData/initial_cash_level) is evaluated when an ATM ready event arrives and initializes the metric. The second (current_cash_level - cash_withdrawal/extendedData/amount_withdrawn) is evaluated when a cash withdrawal event arrives and reduces the cash level by the amount withdrawn. These dependencies are shown graphically in the Monitoring Flow view at the bottom of Figure 7.
In general, a map depending on an event entry point will run within a monitoring context when the corresponding event is delivered to this MC. A map may depend on one event entry point at most.
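The effect of the two maps on the current cash level metric can be traced with a short Python sketch. It models the monitoring context as a dictionary and each map as a function that runs when its event is delivered; the function names and the sample amounts are assumptions made for illustration, not part of the model.

```python
def on_atm_ready(mc: dict, extended_data: dict) -> None:
    # Map 1: ATM_ready/extendedData/initial_cash_level
    mc["current_cash_level"] = extended_data["initial_cash_level"]


def on_cash_withdrawal(mc: dict, extended_data: dict) -> None:
    # Map 2: current_cash_level - cash_withdrawal/extendedData/amount_withdrawn
    mc["current_cash_level"] = (
        mc["current_cash_level"] - extended_data["amount_withdrawn"]
    )


mc: dict = {}
on_atm_ready(mc, {"ATM_number": "ATM-001", "initial_cash_level": 50000})
on_cash_withdrawal(mc, {"ATM_number": "ATM-001", "customer_account": "12345",
                        "amount_withdrawn": 200})
on_cash_withdrawal(mc, {"ATM_number": "ATM-001", "customer_account": "67890",
                        "amount_withdrawn": 300})
print(mc["current_cash_level"])  # prints 49500
```

Each function reads exactly one event entry point, which mirrors the rule that a map may depend on at most one event entry point.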
The map expression ATM_ready/extendedData/ATM_number, which defines the value of the ATM key metric, is added to our model in the same way. It sets the monitoring context key based on the ATM number field in an ATM ready event. Repeated ATM ready events overwrite the key with the same value, which is fine.
When entering these expressions, it is highly recommended that you use the content assist function of the editor (press Ctrl+Space in the Expression dialog pop-up). It will, for example, restrict the inbound events you can use in maps for key metrics to those that instantiate the MC, show available functions and operators, facilitate writing path expressions such as ATM_ready/extendedData/ATM_number, and offer assistance in many other ways.
Our maps for ATM key and current cash level are summarized in Table 6.
Table 6. Map expressions
| Purpose | Target | Expression | Evaluated when |
|---|---|---|---|
| Map | ATM Key metric | ATM_ready/extendedData/ATM_number | ATM ready event arrives |
| Map | current cash level metric | ATM_ready/extendedData/initial_cash_level | ATM ready event arrives |
| Map | current cash level metric | current_cash_level - cash_withdrawal/extendedData/amount_withdrawn | cash withdrawal event arrives |
After saving the model, the Problems view shows that the following error message for the ATM key metric has disappeared:
CWMMV0108E: This key metric does not receive a value when /ATM/MDM/ATM_MC/ATM_ready creates this context
Calculating the mean time between failures metric requires some additional model elements, shown pictorially in Figure 8 and in an editor view in Figure 9.
Figure 8. Update logic for mean time between failures metric
Figure 8 shows three new kinds of model elements:
- A timer, or stopwatch, definition -- time operational
- A trigger definition -- shutdown due to failure
- A counter definition -- failure count
Figure 8 also shows the control flow between these elements at run time with solid arrows and a data dependency of mean time between failures on the stopwatch with dashed arrows. The control flow is the same as shown in the Monitoring Flow view of the editor.
The stopwatch is started by ATM ready events and stopped (but not reset) by ATM shutdown events. As a result, it shows the time during which the ATM has been up and running.
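The difference between stopping and resetting is easy to see in a minimal Python sketch of an accumulating stopwatch. The `Stopwatch` class and its method names are hypothetical, and the timestamps are made up; the sketch only mirrors the start/stop behavior described above.

```python
from datetime import datetime, timedelta
from typing import Optional


class Stopwatch:
    """Accumulates elapsed time across start/stop cycles until reset."""

    def __init__(self) -> None:
        self.accumulated = timedelta(0)
        self._started_at: Optional[datetime] = None

    def start(self, now: datetime) -> None:
        if self._started_at is None:        # ignore a start while running
            self._started_at = now

    def stop(self, now: datetime) -> None:
        if self._started_at is not None:    # keep the total, do not reset
            self.accumulated += now - self._started_at
            self._started_at = None

    def read(self, now: datetime) -> timedelta:
        running = now - self._started_at if self._started_at else timedelta(0)
        return self.accumulated + running


time_operational = Stopwatch()
time_operational.start(datetime(2007, 1, 17, 8, 0))    # ATM ready
time_operational.stop(datetime(2007, 1, 17, 12, 0))    # ATM shutdown
time_operational.start(datetime(2007, 1, 17, 13, 0))   # ATM ready again
uptime = time_operational.read(datetime(2007, 1, 17, 15, 0))
print(uptime)  # prints 6:00:00
```

The hour between shutdown and restart is not counted, so the reading reflects only the time the ATM was up.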
A trigger acts like a conditional in the control flow; it is evaluated at certain points in time, and if its condition is true, it fires. Only in that case does control flow continue from the trigger. As a result, an outbound event may be sent, a timer may be started or stopped, a counter may be incremented, and so on. In our example, the shutdown due to failure trigger is evaluated each time an ATM shutdown event is delivered to the monitoring context, and it fires if the event carries a non-zero error_code (which indicates shutdown due to failure). If so, the failure count counter is incremented.
In general, a trigger definition has two logical parts:
- When-part: Indicates when the trigger is evaluated. This can be due to inbound events, metric updates, counter updates, other triggers firing, periodic points in time (for example, every 15 minutes), or any combination of these. The trigger evaluation criteria are entered in the Trigger Sources section of the editor.
- If-part: Determines whether the trigger fires. The if-part is given by a Boolean expression, which is entered in the Trigger Condition field.
A trigger can have six kinds of effects:
- Control a timer (stopwatch): start / stop / reset
- Control a counter: increment / decrement / set-to-zero
- Trigger a map execution
- Trigger an outbound event emission
- Trigger the evaluation of another trigger
- Trigger the termination of the monitoring context
Except for MC termination, these effects are defined as part of the affected model element. For example, a trigger that causes an outbound event to be sent is referenced by the outbound event definition, a trigger that increments a counter is referenced as part of the counter definition, and so forth.
There can be scenarios, in particular when triggers are evaluated periodically, where the trigger condition remains true for several evaluations but repeated firing is not desired. If so, the Trigger is repeatable field (see Figure 9) should be unchecked. The trigger will then fire only the first time its condition becomes true, and repeated true evaluations will not lead to additional signals. Only when the condition is found to be false again is the trigger considered rearmed; it will then fire the next time a true condition is encountered.
In the scenario in Figure 8, we want the trigger to fire for each ATM shutdown due to failure, even if several such shutdowns occur in sequence. This means the Trigger is repeatable field must be checked. The editor view of our trigger definition is shown in Figure 9.
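The two firing modes can be compared side by side in a small Python sketch. The `Trigger` class is a hypothetical model of the behavior described above, and the sequence of cash levels tested against the $1000 threshold is made-up sample data, not part of the article's model.

```python
from typing import Callable


class Trigger:
    """Fires when its condition is true; optionally only once until rearmed."""

    def __init__(self, condition: Callable[[int], bool], repeatable: bool) -> None:
        self.condition = condition
        self.repeatable = repeatable
        self._armed = True

    def evaluate(self, value: int) -> bool:
        if not self.condition(value):
            self._armed = True      # a false evaluation rearms the trigger
            return False
        if self.repeatable:
            return True             # fire on every true evaluation
        if self._armed:
            self._armed = False     # fire once, then wait for a false result
            return True
        return False


def low_cash(level: int) -> bool:
    return level < 1000


cash_levels = [1500, 900, 800, 1200, 700]

once = Trigger(low_cash, repeatable=False)
fired_once = [once.evaluate(level) for level in cash_levels]
# [False, True, False, False, True]

every_time = Trigger(low_cash, repeatable=True)
fired_every_time = [every_time.evaluate(level) for level in cash_levels]
# [False, True, True, False, True]
```

The non-repeatable trigger stays silent at 800 because it already fired at 900, and it fires again at 700 only because the reading of 1200 rearmed it.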
As shown in Figure 8, each time the shutdown due to failure trigger fires, the failure count counter is incremented. That is achieved by adding the trigger to its list of Counter Controls and specifying Add One as the resulting action.
The map that updates the mean time between failures metric then divides the ATM's total up-time by the number of failures counted (the map expression is shown in Figure 8 as well). In XPath 2.0 division is denoted by the div operator, not by a forward slash, and an if expression was added to prevent division by zero. In accordance with the spreadsheet paradigm, the map runs each time the counter changes. The arrow from the timer to the map is dashed, indicating a pure data dependency: The timer is read when the map is evaluated, but does not drive the map, which would be impractical; since the value of a running timer changes at every instant, a downstream map would have to run continuously.
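The guarded division in the map expression can be mirrored in a few lines of Python. This is an illustrative sketch only: the function name is hypothetical, and `timedelta` stands in for the `xsd:duration` values of the stopwatch and the metric.

```python
from datetime import timedelta


def mean_time_between_failures(time_operational, failure_count):
    """Mirror of the map expression: guard against division by zero."""
    if failure_count > 0:
        return time_operational / failure_count
    # No failures counted yet: fall back to a zero duration, like duration('PT0S').
    return timedelta(0)


print(mean_time_between_failures(timedelta(hours=300), 3))  # 4 days, 4:00:00
print(mean_time_between_failures(timedelta(hours=300), 0))  # 0:00:00
```

As in the model, the result depends on the stopwatch value at the moment of evaluation, so it only changes when something (here, a function call; in the model, a counter update or a trigger) causes a re-evaluation.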
To get more frequent updates of the mean time between failures metric (for example, every 24 hours), you could control the map by a periodic trigger with no gating condition. Control by a trigger overrides the spreadsheet-like default behavior of maps, so the metric would be recalculated every 24 hours even when there have not been any ATM failures. See examples of this technique at the end of Calculating the remaining business metrics.
Figure 9. Monitoring context definition after adding a trigger, counter, and stopwatch
The XML representation of the trigger, counter, and timer (stopwatch) definitions, and the mean time between failures metric, is shown in Listing 4.
Listing 4. Trigger, counter, and stopwatch added to control the mean time between failures metric
<mm:monitor xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:mm="http://www.ibm.com/xmlns/prod/websphere/monitoring/6.0.2/mm"
xmlns:xsd="http://www.w3.org/2001/XMLSchema-datatypes"
xsi:schemaLocation=
"http://www.ibm.com/xmlns/prod/websphere/monitoring/6.0.2/mm monitor.xsd"
displayName="ATM" id="ATM" timestamp="2007-01-17T17:05:50">
<description>monitor model to survey ATMs</description>
<monitorDetailsModel displayName="ATM" id="MDM">
<monitoringContext displayName="ATM MC" id="ATM_MC">
<description>monitoring context for an ATM</description>
<trigger displayName="shutdown due to failure" id="shutdown_due_to_failure"
isRepeatable="true">
<description>indicating shutdown due to failure</description>
<onEvent ref="ATM_shutdown" />
<gatingCondition expression="ATM_shutdown/extendedData/error_code ne 0"/>
</trigger>
<counter displayName="failure count" id="failure_count" type="xsd:integer">
<description>number of shutdowns due to failure</description>
<incrementedWhen ref="shutdown_due_to_failure"/>
</counter>
<stopwatch displayName="time operational" id="time_operational" type="xsd:duration">
<description>the total time for which this ATM has been operational</description>
<startedWhen ref="ATM_ready"/>
<stoppedWhen ref="ATM_shutdown"/>
</stopwatch>
[...]
<metric displayName="mean time between failures" id="mean_time_between_failures"
type="xsd:duration">
<description>mean time between ATM shutdowns due to device failure</description>
<map>
<outputValue>
<singleValue expression="if (failure_count gt 0)
then time_operational div failure_count else duration('PT0S')"/>
</outputValue>
</map>
</metric>
[...]
</monitoringContext>
</monitorDetailsModel>
[...]
</mm:monitor>
All new expressions are summarized in Table 7.
Table 7. Expressions
| Purpose | Target | Expression | Evaluated when |
|---|---|---|---|
| Map | mean time between failures metric | if (failure_count gt 0) then time_operational div failure_count else duration('PT0S') | failure count is updated |
| Trigger condition | shutdown due to failure trigger | ATM_shutdown/extendedData/error_code ne 0 | ATM shutdown event arrives |
Calculating the remaining business metrics
If you have followed the examples in the preceding section, you already know all elements of the monitor programming model and understand their basic behavior. In this section, you reapply the same elements and techniques to calculate the remaining business metrics, and populate and emit the outbound events. In doing this, you will learn new ways of combining the basic building blocks introduced in the preceding section in order to tackle more complex monitoring tasks.
The next two metrics we will focus on are:
- Percent down time
- Defined as one minus the ATM's fraction of up time, which is the quotient of its time operational (as measured by the stopwatch) and the elapsed time since it first came online
- Transactions per day
- Defined as the quotient of the total number of customer logins and the time the ATM has been operational, measured in days
The logic flow for both metrics is shown in Figure 10.
A periodic trigger every 24 hours drives two maps, which update these metrics once per day. Strictly speaking, the trigger is only required for the percent down time metric, which otherwise would not get refreshed. But updating the transactions per day metric every 24 hours, rather than every time a user logs in, seemed to make sense as well.
One small detail: we initialize the first time up metric with Monitor's local time rather than with the ATM ready event timestamp. This avoids any phantom down time caused by network delay of that event.
Figure 10. Update logic for transactions per day and percent down time
Table 8 summarizes the map expressions in Figure 10 and indicates when they are evaluated.
Table 8. Map expressions
| Purpose | Target | Expression | Evaluated when |
|---|---|---|---|
| Map | first time up metric | if (empty(first_time_up) and exists(ATM_ready/predefinedData/creationTime)) then current-dateTime() else first_time_up | ATM ready event arrives |
| Map | transactions per day metric | session_count div (decimal(current-dateTime() - first_time_up) div 86400) | every 24 hours trigger fires |
| Map | percent down time metric | (1 - (time_operational div (current-dateTime() - first_time_up))) * 100 | every 24 hours trigger fires |
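The two periodic map expressions above can be checked by hand with a short Python sketch. The function names and the sample values are assumptions made for illustration; `now` plays the role of `current-dateTime()`, and the constant 86400 is the number of seconds per day, as in the map expression.

```python
from datetime import datetime, timedelta

SECONDS_PER_DAY = 86400


def percent_down_time(time_operational, first_time_up, now):
    """(1 - up-time fraction) * 100, over the time since the ATM first came online."""
    elapsed = now - first_time_up
    return (1 - time_operational / elapsed) * 100


def transactions_per_day(session_count, first_time_up, now):
    """Customer logins divided by the elapsed time, measured in days."""
    elapsed_days = (now - first_time_up).total_seconds() / SECONDS_PER_DAY
    return session_count / elapsed_days


# Hypothetical sample values: 10 days since first startup, 7.5 days operational.
first_time_up = datetime(2007, 1, 1)
now = datetime(2007, 1, 11)
print(percent_down_time(timedelta(days=7, hours=12), first_time_up, now))
print(transactions_per_day(500, first_time_up, now))
```

Both functions divide by the elapsed time, so they would fail if evaluated at the very instant the ATM first came online. In the model this is unlikely to matter, because the maps are driven by the every 24 hours trigger.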
Figure 11 shows a navigator view of our monitoring context definition after adding these new elements and a detailed view of the mean time between failures metric.
Figure 11. Monitoring context definition after adding elements
Listing 5 shows the XML version of the new logic in Figure 10.
Listing 5. Update logic for transactions per day and percent down time
<mm:monitor xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:mm="http://www.ibm.com/xmlns/prod/websphere/monitoring/6.0.2/mm"
xmlns:xsd="http://www.w3.org/2001/XMLSchema-datatypes"
xsi:schemaLocation=
"http://www.ibm.com/xmlns/prod/websphere/monitoring/6.0.2/mm monitor.xsd"
displayName="ATM" id="ATM" timestamp="2007-01-17T17:05:50">
<description>monitor model to survey ATMs</description>
<monitorDetailsModel displayName="ATM" id="MDM">
<monitoringContext displayName="ATM MC" id="ATM_MC">
<description>monitoring context for an ATM</description>
<trigger displayName="every 24 hours" id="every_24_hours" isRepeatable="true">
<description>periodic trigger, firing every 24 hours</description>
<evaluationTime minutes="0" days="1" hours="0"/>
</trigger>
<inboundEvent displayName="ATM ready" id="ATM_ready"
multipleCorrelationMatches="raiseException"
noCorrelationMatches="createNewContext"
oneCorrelationMatch="deliverEvent" type="ATM_Ready">
<description>signals ATM startup / reports initial cash level</description>
<correlationPredicate expression="ATM_ready/extendedData/ATM_number eq ATM_Key"/>
</inboundEvent>
<inboundEvent displayName="ATM shutdown" id="ATM_shutdown"
multipleCorrelationMatches="raiseException" noCorrelationMatches="ignore"
oneCorrelationMatch="deliverEvent" type="ATM_Shutdown">
<description>non-zero error code if shutdown due to failure</description>
<correlationPredicate
expression="ATM_shutdown/extendedData/ATM_number eq ATM_Key"/>
</inboundEvent>
<inboundEvent displayName="customer login" id="customer_login"
multipleCorrelationMatches="raiseException" noCorrelationMatches="ignore"
oneCorrelationMatch="deliverEvent" type="Customer_Action">
<description>customer_action field is "login"</description>
<correlationPredicate
expression="customer_login/extendedData/ATM_number eq ATM_Key"/>
<filter expression='customer_login/extendedData/customer_action eq "login"'/>
</inboundEvent>
<metric displayName="percent down time" id="percent_down_time" type="xsd:decimal">
<description>
the fraction of time during which the ATM is not operational
</description>
<map>
<trigger ref="every_24_hours"/>
<outputValue>
<singleValue expression="(1 - (time_operational div
(current-dateTime() - first_time_up))) * 100"/>
</outputValue>
</map>
</metric>
<metric displayName="transactions per day" id="transactions_per_day"
type="xsd:decimal">
<description>the average number of transactions per day</description>
<map>
<trigger ref="every_24_hours"/>
<outputValue>
<singleValue expression="session_count div
(decimal(current-dateTime() - first_time_up) div 86400)"/>
</outputValue>
</map>
</metric>
<metric displayName="first time up" id="first_time_up" type="xsd:dateTime">
<description>
time of arrival of the first "ATM Ready" event
</description>
<map>
<outputValue>
<singleValue expression="if (empty(first_time_up) and
exists(ATM_ready/predefinedData/creationTime))
then current-dateTime() else first_time_up"/>
</outputValue>
</map>
</metric>
<counter displayName="session count" id="session_count" type="xsd:integer">
<description>
number of user sessions ("customer login" events)
</description>
<incrementedWhen ref="customer_login"/>
</counter>
<stopwatch displayName="time operational" id="time_operational" type="xsd:duration">
<description>the total time for which this ATM has been operational</description>
<startedWhen ref="ATM_ready"/>
<stoppedWhen ref="ATM_shutdown"/>
</stopwatch>
[...]
</monitoringContext>
</monitorDetailsModel>
[...]
</mm:monitor>
Before moving on, let's critique the calculation of the percent down time and transactions per day metrics we just discussed. We are calculating lifetime averages for these metrics, which are of limited practical use, since they make it hard to see trends. If the usage of a particular ATM suddenly rises after several years of infrequent usage, the transactions per day metric will barely change. Similarly, if outages suddenly rise for an ATM that used to operate reliably, the percent down time metric will hardly budge. To see such trends, averages should be taken over a rolling time period (for example, the last 30 days). Defining such rolling averages will be discussed in a future article.
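A quick calculation shows how little a lifetime average reacts to a recent change. The numbers below are invented for illustration, and the trailing-window function is a plain Python comparison aid, not the rolling-average technique deferred to the future article.

```python
def lifetime_average(daily_counts):
    """Average over the entire history, as the current metric definitions do."""
    return sum(daily_counts) / len(daily_counts)


def trailing_average(daily_counts, window_days):
    """Average over only the most recent window_days entries."""
    recent = daily_counts[-window_days:]
    return sum(recent) / len(recent)


# Hypothetical ATM: 1000 quiet days at 20 sessions per day, then 30 busy days at 100.
history = [20] * 1000 + [100] * 30
print(round(lifetime_average(history), 2))  # 22.33
print(trailing_average(history, 30))        # 100.0
```

Usage has increased fivefold, yet the lifetime figure moves only from 20 to about 22, which is why it is of limited use for spotting trends.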
The remaining two business metrics are future estimates obtained by extrapolation of ATM usage:
- Next refill date
- Defined as the projected date when the ATM cash level will drop below $10,000.
- Time till empty
- Defined as the projected time left before an out-of-cash situation will occur.
Both metrics will be calculated using linear regression, as shown in Figure 12. Since Monitor knows the initial cash level and the time and amount of each cash withdrawal, it can use these data points to calculate the ATM's depletion rate. Linear extrapolation into the future gives us a point in time when the ATM would be empty, and the duration between now and that future point in time is the value of the time till empty metric. The date on which the extrapolated cash level (dashed line) drops below $10,000 determines the next refill date (not shown in Figure 12).
Figure 12. Linear extrapolation of ATM cash depletion
We define a number of auxiliary metrics to support our linear regression calculation. To distinguish them easily from the business metrics, the names are prefixed by aux.
Table 9. Auxiliary metrics
| Name | Description |
|---|---|
| aux sum x | sum(x) for cash-level linear regression |
| aux sum y | sum(y) for cash-level linear regression |
| aux sum xx | sum(x^2) for cash-level linear regression |
| aux sum xy | sum(xy) for cash-level linear regression |
| aux slope | slope of cash depletion line (in cash units per second) |
| aux y intercept | y-intercept of cash depletion line (in cash units) |
| aux last refilled | timestamp of last "ATM ready" event (time of last refill) |
All metrics except the last are of type decimal. Points in time (x values in Figure 12) are expressed as a decimal number of seconds since the time the ATM was last refilled. All new maps are summarized in Table 10.
Table 10. Map expressions
| Purpose | Target | Expression | Evaluated when |
|---|---|---|---|
| Map | aux last refilled metric | ATM_ready/predefinedData/creationTime | ATM ready event arrives |
| Map | aux sum x metric | 0 | aux t0 trigger fires |
| Map | aux sum x metric | aux_sum_x + decimal(cash_withdrawal/predefinedData/creationTime - aux_last_refilled) | cash_withdrawal event arrives |
| Map | aux sum xx metric | 0 | aux t0 trigger fires |
| Map | aux sum xx metric | aux_sum_xx + decimal(cash_withdrawal/predefinedData/creationTime - aux_last_refilled) * decimal(cash_withdrawal/predefinedData/creationTime - aux_last_refilled) | cash_withdrawal event arrives |
| Map | aux sum xy metric | 0 | aux t0 trigger fires |
| Map | aux sum xy metric | aux_sum_xy + decimal(cash_withdrawal/predefinedData/creationTime - aux_last_refilled) * current_cash_level | aux t1 trigger fires |
| Map | aux sum y metric | ATM_ready/extendedData/initial_cash_level | ATM ready event arrives |
| Map | aux sum y metric | aux_sum_y + current_cash_level | aux t1 trigger fires |
| Map | aux slope metric | ((withdrawal_count + 1) * aux_sum_xy - aux_sum_x * aux_sum_y) div ((withdrawal_count + 1) * aux_sum_xx - aux_sum_x * aux_sum_x) | aux t2 trigger fires |
| Map | aux y intercept metric | (aux_sum_y - aux_slope * aux_sum_x) div (withdrawal_count + 1) | aux t3 trigger fires |
| Map | next refill date metric | current-date() + duration('P30D') | aux t0 trigger fires |
| Map | next refill date metric | date(aux_last_refilled + duration((10000 - aux_y_intercept) div aux_slope)) | aux t4 trigger fires |
| Map | time till empty metric | duration('P30D') | aux t0 trigger fires |
| Map | time till empty metric | duration(- aux_y_intercept div aux_slope - decimal(current-dateTime() - aux_last_refilled)) | aux t4 trigger fires |
The resulting logic is shown in Figure 13 and Figure 14.
Figure 13. Reset logic for next refill date and time till empty metrics
When an ATM ready event arrives, which reports the ATM's initial cash level, the:
- current cash level and aux last refilled metrics are set from the event
- withdrawal count counter is reset to zero
- aux sum... metrics are initialized
- next refill date is set to 30 days from today
- time till empty is set to 30 days
The aux t0 trigger is needed to drive the maps assigning constants, while maps that depend on event attributes will run automatically when an ATM ready event arrives. (We could have avoided the trigger by introducing an artificial dependency on the event in the assignment expressions, for example by assigning 0 * some-event-field to aux sum x, aux sum xx, and so on.)
The monitoring flow in Figure 14 looks more complex, but it really just implements the linear regression formulas in Figure 12. A chain of triggers aux t1 ... aux t4 is used to enforce the sequential execution of our maps in the order {1,2,3} - {4,5} - {6} - {7} - {8,9}. The data-driven evaluation that has worked well for all MC logic so far would lead to incorrect results in this case. For example, the aux sum y and aux sum xy metrics must only be updated after the current cash level metric has been adjusted.
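The staged updates can be followed in a Python sketch that keeps the same running sums as the auxiliary metrics. The class and method names are hypothetical, x is passed in directly as seconds since the last refill, and incrementing the withdrawal count in the first stage is an assumption about where that counter sits in the flow.

```python
class CashDepletionModel:
    """Running-sum linear regression of cash level (y) over seconds since refill (x)."""

    def __init__(self, initial_cash_level):
        # Reset logic: the refill itself is the first data point (x = 0, y = initial level).
        self.current_cash_level = initial_cash_level
        self.withdrawal_count = 0
        self.sum_x = 0.0
        self.sum_xx = 0.0
        self.sum_xy = 0.0
        self.sum_y = float(initial_cash_level)
        self.slope = None
        self.y_intercept = None

    def withdraw(self, seconds_since_refill, amount):
        x = float(seconds_since_refill)
        # Stage 1: updates driven directly by the cash withdrawal event.
        self.current_cash_level -= amount
        self.withdrawal_count += 1
        self.sum_x += x
        self.sum_xx += x * x
        # Stage 2 (aux t1): these sums must see the cash level after the withdrawal.
        self.sum_y += self.current_cash_level
        self.sum_xy += x * self.current_cash_level
        # Stage 3 (aux t2): slope over n = withdrawal_count + 1 data points.
        n = self.withdrawal_count + 1
        self.slope = (n * self.sum_xy - self.sum_x * self.sum_y) / (
            n * self.sum_xx - self.sum_x * self.sum_x)
        # Stage 4 (aux t3): y-intercept, which needs the freshly computed slope.
        self.y_intercept = (self.sum_y - self.slope * self.sum_x) / n

    def seconds_from_refill_until(self, cash_level):
        """Stage 5 (aux t4): x value at which the fitted line reaches cash_level."""
        return (cash_level - self.y_intercept) / self.slope


# Hypothetical data: refill to 50000, then 1000 withdrawn at the end of each hour.
model = CashDepletionModel(50000)
for hour in (1, 2, 3):
    model.withdraw(hour * 3600, 1000)
print(model.seconds_from_refill_until(10000))  # refill threshold, about 144000 s
print(model.seconds_from_refill_until(0))      # empty, about 180000 s
```

If stage 2 ran before stage 1, the sums would record the cash level before the withdrawal and the fitted line would be shifted, which is the ordering problem the trigger chain prevents. The sketch omits guards for degenerate input, such as a withdrawal at x = 0 or a zero slope, which would cause a division by zero.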
Figure 14. Update logic for next refill date and time till empty metrics
An editor view of the new model elements is shown in Figure 15. The XML representation of these elements and the associated logic is shown in the complete model (Listing 6a and Listing 6b) at the end of this article and is also available as a downloadable zip file.
Figure 15. Monitoring context definition with logic for next refill date and time till empty metrics
We are left with populating and emitting the three outbound events representing business alerts. Outbound event emission is always controlled by a trigger, which also drives the map populating its content. The triggers and maps for our three alerts are summarized in Table 11 and shown graphically in Figure 16.
Figure 16. Logic to set and emit the outbound events
We have introduced a new counter (invalid PIN count), a stopwatch (time since login), and a string-typed metric (aux customer account). The six maps in Figure 16 are numbered, and the numbers are referenced in Table 11. The three outbound event triggers are not repeating, which means that their trigger condition must evaluate to false before they are considered rearmed and can cause the emission of another event. Specifically, cash must be refilled before another out of cash alert can occur; the invalid PIN count must be set to zero by a customer login event before another invalid PINs alert can occur; and the time since login timer must be reset by a customer logout event before another runaway session alert can occur.
Table 11. Map expressions
| Purpose | Target | Expression | Evaluated when |
|---|---|---|---|
| Map | current cash level metric | (1) ATM_ready/extendedData/initial_cash_level | ATM ready event arrives |
| Map | current cash level metric | (2) current_cash_level - cash_withdrawal/extendedData/amount_withdrawn | cash_withdrawal event arrives |
| Map | aux customer account metric | (3) customer_login/extendedData/customer_account | customer login event arrives |
| Map | out of cash alert outbound event | (4) out_of_cash_alert/extendedData/ATM_number <-- ATM_Key; out_of_cash_alert/extendedData/remaining_cash_level <-- current_cash_level; | out of cash trigger fires |
| Map | invalid PINs alert outbound event | (5) invalid_PINs_alert/extendedData/ATM_number <-- ATM_Key; invalid_PINs_alert/extendedData/customer_account <-- aux_customer_account; invalid_PINs_alert/extendedData/number_of_attempts <-- invalid_PIN_count; | invalid PINs trigger fires |
| Map | runaway session alert outbound event | (6) runaway_session_alert/extendedData/ATM_number <-- ATM_Key; runaway_session_alert/extendedData/customer_account <-- aux_customer_account; runaway_session_alert/extendedData/session_active_since <-- current-dateTime() - time_since_login; | runaway session trigger fires |
| Trigger condition | out of cash trigger | current_cash_level lt 1000 (trigger is not repeating) | current cash level metric changes |
| Trigger condition | invalid PINs trigger | invalid_PIN_count ge 3 (trigger is not repeating) | invalid PIN count counter changes |
| Trigger condition | runaway session trigger | time_since_login gt duration('PT30M') (trigger is not repeating) | every 5 minutes |
The logic in Figure 16 is straightforward. Because it provides a good summary of the behavior of all model elements, let's walk through it step by step:
- When an ATM ready event arrives, map (1) is driven and initializes the current cash level metric. As a result, the out of cash trigger is evaluated, but will not fire (assuming the initial cash level is greater than 1000). The control flow thus ends at this point.
- When a cash_withdrawal event arrives, map (2) is driven and updates the current cash level metric. The out of cash trigger is evaluated and fires if the resulting cash level is below 1000 and this is the first time this situation is encountered (the last evaluation must have been false). If so, map (4) is driven; it populates the out of cash alert event exit point with the ATM key and the current cash level before the event is emitted.
- When a customer login event arrives, the invalid PIN count counter and time since login timer are reset. Map (3) is driven as well and sets the aux customer account metric to the account number of the customer account for this session. The invalid PINs trigger is evaluated after the counter is set to zero, but does not fire.
- When an invalid PIN event arrives, the invalid PIN count is incremented and the invalid PINs trigger is evaluated; when the count reaches 3, the trigger fires. It will not fire repeatedly, however, if additional invalid PIN entries are reported by the ATM. The trigger drives map (5), which populates the invalid PINs alert event exit point with the ATM key, the customer account number, and the number of invalid PIN attempts (always three with the current trigger settings). The event is then emitted.
- When a customer logout event arrives, the time since login timer is stopped and reset. Since control flow does not continue from a timer, no further effects occur.
- Every five minutes the runaway session trigger is evaluated to see whether the time since login timer has exceeded 30 minutes. If so, map (6) is driven, which populates the runaway session alert with the ATM key, the customer account number, and the time when this runaway session started. Triggers that test the elapsed time of a stopwatch usually must be time controlled, since they should detect a "nothing happens" situation: they often must fire when no events arrive at all, and thus the wall clock must be used to cause their periodic evaluation.
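The periodic, non-repeating evaluation of the runaway session trigger can be sketched as a small simulation. The function name, the dictionary keys, and the alignment of the five-minute checks are assumptions made for illustration; only the 30-minute limit, the 5-minute interval, and the expression for the session start time come from the model.

```python
from datetime import datetime, timedelta

RUNAWAY_LIMIT = timedelta(minutes=30)   # time_since_login gt duration('PT30M')
CHECK_INTERVAL = timedelta(minutes=5)   # periodic evaluation of the trigger


def runaway_alerts(login_time, logout_time, first_check, last_check):
    """Simulate the wall-clock checks for one session; return the alerts emitted."""
    alerts = []
    armed = True                        # the trigger is not repeating
    now = first_check
    while now <= last_check:
        in_session = login_time <= now and (logout_time is None or now < logout_time)
        # The stopwatch runs during the session and reads zero otherwise.
        time_since_login = now - login_time if in_session else timedelta(0)
        if time_since_login > RUNAWAY_LIMIT:
            if armed:
                alerts.append({
                    "detected_at": now,
                    # Same arithmetic as map (6): current time minus elapsed time.
                    "session_active_since": now - time_since_login,
                })
                armed = False
        else:
            armed = True                # a false evaluation rearms the trigger
        now += CHECK_INTERVAL
    return alerts


login = datetime(2007, 1, 17, 9, 2)
alerts = runaway_alerts(login, None, datetime(2007, 1, 17, 9, 0),
                        datetime(2007, 1, 17, 10, 0))
print(len(alerts))                        # 1
print(alerts[0]["detected_at"])           # 2007-01-17 09:35:00
print(alerts[0]["session_active_since"])  # 2007-01-17 09:02:00
```

The session crosses the 30-minute mark at 09:32, but the alert is raised at the next periodic check, 09:35. The evaluation interval therefore bounds the detection delay, and no event needs to arrive for the alert to be raised.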
A navigator view of the completed monitoring context definition is shown in Figure 17, which also shows the detailed definition of the invalid PINs alert.
Figure 17. Completed ATM MC monitoring context definition
This concludes the discussion of our monitoring context for ATM surveillance.
Listing 6a and Listing 6b show the complete XML listing of the monitoring context definition. You can also download this code in a zip file.
Listing 6a. Completed monitoring context definition for ATM surveillance
<mm:monitor xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:mm="http://www.ibm.com/xmlns/prod/websphere/monitoring/6.0.2/mm"
xmlns:xsd="http://www.w3.org/2001/XMLSchema-datatypes"
xsi:schemaLocation=
"http://www.ibm.com/xmlns/prod/websphere/monitoring/6.0.2/mm monitor.xsd"
displayName="ATM" id="ATM" timestamp="2007-01-17T17:05:50">
<description>monitor model to survey ATMs</description>
<monitorDetailsModel displayName="ATM" id="MDM">
<monitoringContext displayName="ATM MC" id="ATM_MC">
<description>monitoring context for an ATM</description>
<trigger displayName="shutdown due to failure" id="shutdown_due_to_failure"
isRepeatable="true">
<description>indicating shutdown due to failure</description>
<onEvent ref="ATM_shutdown"/>
<gatingCondition expression="ATM_shutdown/extendedData/error_code ne 0"/>
</trigger>
<trigger displayName="every 24 hours" id="every_24_hours" isRepeatable="true">
<description>periodic trigger, firing every 24 hours</description>
<evaluationTime minutes="0" days="1" hours="0"/>
</trigger>
<trigger displayName="aux t0" id="aux_t0" isRepeatable="true">
<description>
auxiliary trigger, chained to "ATM Ready" event
</description>
<onEvent ref="ATM_ready"/>
</trigger>
<trigger displayName="aux t1" id="aux_t1" isRepeatable="true">
<description>
auxiliary trigger, chained to "cash withdrawal" event
</description>
<onEvent ref="cash_withdrawal"/>
</trigger>
<trigger displayName="aux t2" id="aux_t2" isRepeatable="true">
<description>auxiliary trigger, chained to "aux t1"</description>
<onTrigger ref="aux_t1"/>
</trigger>
<trigger displayName="aux t3" id="aux_t3" isRepeatable="true">
<description>auxiliary trigger, chained to "aux t2"</description>
<onTrigger ref="aux_t2"/>
</trigger>
<trigger displayName="aux t4" id="aux_t4" isRepeatable="true">
<description>auxiliary trigger, chained to "aux t3"</description>
<onTrigger ref="aux_t3"/>
</trigger>
<trigger displayName="invalid PINs" id="invalid_PINs">
<description>detecting three successive invalid PIN entries</description>
<onValueChange ref="invalid_PIN_count"/>
<gatingCondition expression="invalid_PIN_count ge 3"/>
</trigger>
<trigger displayName="out of cash" id="out_of_cash">
<description>
detecting out-of-cash situation (current cash level below 1000)
</description>
<onValueChange ref="current_cash_level"/>
<gatingCondition expression="current_cash_level lt 1000"/>
</trigger>
<trigger displayName="runaway session" id="runaway_session">
<description>
detecting a runaway user session (time since login exceeds 30 minutes)
</description>
<evaluationTime minutes="5" days="0" hours="0"/>
<gatingCondition expression="time_since_login gt duration('PT30M')"/>
</trigger>
<inboundEvent displayName="ATM ready" id="ATM_ready"
multipleCorrelationMatches="raiseException"
noCorrelationMatches="createNewContext"
oneCorrelationMatch="deliverEvent" type="ATM_Ready">
<description>signals ATM startup / reports initial cash level</description>
<correlationPredicate expression=
"ATM_ready/extendedData/ATM_number eq ATM_Key"/>
</inboundEvent>
<inboundEvent displayName="ATM shutdown" id="ATM_shutdown"
multipleCorrelationMatches="raiseException" noCorrelationMatches="ignore"
oneCorrelationMatch="deliverEvent" type="ATM_Shutdown">
<description>non-zero error code if shutdown due to failure</description>
<correlationPredicate
expression="ATM_shutdown/extendedData/ATM_number eq ATM_Key"/>
</inboundEvent>
<inboundEvent displayName="customer login" id="customer_login"
multipleCorrelationMatches="raiseException" noCorrelationMatches="ignore"
oneCorrelationMatch="deliverEvent" type="Customer_Action">
<description>customer_action field is "login"</description>
<correlationPredicate
expression="customer_login/extendedData/ATM_number eq ATM_Key"/>
<filter expression='customer_login/extendedData/customer_action eq "login"'/>
</inboundEvent>
<inboundEvent displayName="customer logout" id="customer_logout"
multipleCorrelationMatches="raiseException" noCorrelationMatches="ignore"
oneCorrelationMatch="deliverEvent" type="Customer_Action">
<description>customer_action field is "logout"</description>
<correlationPredicate
expression="customer_logout/extendedData/ATM_number eq ATM_Key"/>
<filter expression='customer_logout/extendedData/customer_action eq "logout"'/>
</inboundEvent>
<inboundEvent displayName="invalid PIN" id="invalid_PIN"
multipleCorrelationMatches="raiseException" noCorrelationMatches="ignore"
oneCorrelationMatch="deliverEvent" type="Customer_Action">
<description>customer_action field is "invalid PIN"</description>
<correlationPredicate
expression="invalid_PIN/extendedData/ATM_number eq ATM_Key"/>
<filter expression='invalid_PIN/extendedData/customer_action eq "invalid PIN"'/>
</inboundEvent>
<inboundEvent displayName="cash withdrawal" id="cash_withdrawal"
multipleCorrelationMatches="raiseException" noCorrelationMatches="ignore"
oneCorrelationMatch="deliverEvent" type="Cash_Withdrawal">
<description>
amount withdrawn is reported as an integer number of cash units
</description>
<correlationPredicate
expression="cash_withdrawal/extendedData/ATM_number eq ATM_Key"/>
</inboundEvent>
<outboundEvent displayName="out of cash alert" id="out_of_cash_alert"
type="Out_Of_Cash_Alert">
<description>emitted when the cash level drops below $1000</description>
<map>
<trigger ref="out_of_cash"/>
<outputValue>
<assignments>
<assignment
leftValue="out_of_cash_alert/extendedData/ATM_number"
rightValue="ATM_Key"/>
<assignment
leftValue="out_of_cash_alert/extendedData/remaining_cash_level"
rightValue="current_cash_level"/>
</assignments>
</outputValue>
</map>
</outboundEvent>
<outboundEvent displayName="invalid PINs alert" id="invalid_PINs_alert"
type="Invalid_PINs_Alert">
<description>emitted when an invalid PIN is entered three times</description>
<map>
<trigger ref="invalid_PINs"/>
<outputValue>
<assignments>
<assignment
leftValue="invalid_PINs_alert/extendedData/ATM_number"
rightValue="ATM_Key"/>
<assignment
leftValue="invalid_PINs_alert/extendedData/customer_account"
rightValue="aux_customer_account"/>
<assignment
leftValue="invalid_PINs_alert/extendedData/number_of_attempts"
rightValue="invalid_PIN_count"/>
</assignments>
</outputValue>
</map>
</outboundEvent>
<outboundEvent displayName="runaway session alert" id="runaway_session_alert"
type="Runaway_Session_Alert">
<description>
emitted when a customer session takes more than 30 minutes
</description>
<map>
<trigger ref="runaway_session"/>
<outputValue>
<assignments>
<assignment
leftValue="runaway_session_alert/extendedData/ATM_number"
rightValue="ATM_Key"/>
<assignment
leftValue="runaway_session_alert/extendedData/customer_account"
rightValue="aux_customer_account"/>
<assignment
leftValue="runaway_session_alert/extendedData/session_active_since"
rightValue="current-dateTime() - time_since_login"/>
</assignments>
</outputValue>
</map>
</outboundEvent>
Listing 6b. Completed monitoring context definition for ATM surveillance cont'd
<metric displayName="ATM Key" id="ATM_Key" type="xsd:string" isPartOfKey="true">
<map>
<outputValue>
<singleValue expression="ATM_ready/extendedData/ATM_number"/>
</outputValue>
</map>
</metric>
<metric displayName="current cash level" id="current_cash_level"
type="xsd:integer">
<description>the current amount of cash in the ATM</description>
<map>
<outputValue>
<singleValue expression="ATM_ready/extendedData/initial_cash_level"/>
</outputValue>
</map>
<map>
<outputValue>
<singleValue expression=
"current_cash_level - cash_withdrawal/extendedData/amount_withdrawn"/>
</outputValue>
</map>
</metric>
<metric displayName="time till empty" id="time_till_empty" type="xsd:duration">
<description>a projected time for which the ATM
will still have cash</description>
<map>
<trigger ref="aux_t0"/>
<outputValue>
<singleValue expression="duration('P30D')"/>
</outputValue>
</map>
<map>
<trigger ref="aux_t4"/>
<outputValue>
<singleValue expression="duration(- aux_y_intercept div aux_slope -
decimal(current-dateTime() - aux_last_refilled))"/>
</outputValue>
</map>
</metric>
<metric displayName="next refill date" id="next_refill_date" type="xsd:date">
<description>a suggested date by which the ATM
should be replenished</description>
<map>
<trigger ref="aux_t0"/>
<outputValue>
<singleValue expression="current-date() + duration('P30D')"/>
</outputValue>
</map>
<map>
<trigger ref="aux_t4"/>
<outputValue>
<singleValue expression="date(aux_last_refilled +
duration((10000 - aux_y_intercept) div aux_slope))"/>
</outputValue>
</map>
</metric>
<metric displayName="mean time between failures" id="mean_time_between_failures"
type="xsd:duration">
<description>mean time between ATM shutdowns
due to device failure</description>
<map>
<trigger ref="every_24_hours"/>
<outputValue>
<singleValue expression="if (failure_count gt 0)
then time_operational div failure_count else duration('PT0S')"/>
</outputValue>
</map>
</metric>
<metric displayName="percent down time"
id="percent_down_time" type="xsd:decimal">
<description>
the fraction of time during which the ATM is not operational
</description>
<map>
<trigger ref="every_24_hours"/>
<outputValue>
<singleValue expression="(1 - (time_operational div
(current-dateTime() - first_time_up))) * 100"/>
</outputValue>
</map>
</metric>
<metric displayName="transactions per day" id="transactions_per_day"
type="xsd:decimal">
<description>the average number of transactions per day</description>
<map>
<trigger ref="every_24_hours"/>
<outputValue>
<singleValue expression="session_count div
(decimal(current-dateTime() - first_time_up) div 86400)"/>
</outputValue>
</map>
</metric>
<metric displayName="first time up" id="first_time_up" type="xsd:dateTime">
<description>time of arrival of the first
"ATM Ready" event</description>
<map>
<outputValue>
<singleValue expression="if (empty(first_time_up) and
exists(ATM_ready/predefinedData/creationTime))
then current-dateTime() else first_time_up"/>
</outputValue>
</map>
</metric>
<metric displayName="aux sum x" id="aux_sum_x" type="xsd:decimal">
<description>sum(x) for cash-level linear regression</description>
<map>
<trigger ref="aux_t0"/>
<outputValue>
<singleValue expression="0"/>
</outputValue>
</map>
<map>
<outputValue>
<singleValue expression="aux_sum_x +
decimal(cash_withdrawal/predefinedData/creationTime - aux_last_refilled)"/>
</outputValue>
</map>
</metric>
<metric displayName="aux sum y" id="aux_sum_y" type="xsd:decimal">
<description>sum(y) for cash-level linear regression</description>
<map>
<outputValue>
<singleValue expression="ATM_ready/extendedData/initial_cash_level"/>
</outputValue>
</map>
<map>
<trigger ref="aux_t1"/>
<outputValue>
<singleValue expression="aux_sum_y + current_cash_level"/>
</outputValue>
</map>
</metric>
<metric displayName="aux sum xx" id="aux_sum_xx" type="xsd:decimal">
<description>sum(x^2) for cash-level linear regression</description>
<map>
<trigger ref="aux_t0"/>
<outputValue>
<singleValue expression="0"/>
</outputValue>
</map>
<map>
<outputValue>
<singleValue expression="aux_sum_xx +
decimal(cash_withdrawal/predefinedData/creationTime - aux_last_refilled) *
decimal(cash_withdrawal/predefinedData/creationTime - aux_last_refilled)"/>
</outputValue>
</map>
</metric>
<metric displayName="aux sum xy" id="aux_sum_xy" type="xsd:decimal">
<description>sum(xy) for cash-level linear regression</description>
<map>
<trigger ref="aux_t0"/>
<outputValue>
<singleValue expression="0"/>
</outputValue>
</map>
<map>
<trigger ref="aux_t1"/>
<outputValue>
<singleValue expression="aux_sum_xy +
decimal(cash_withdrawal/predefinedData/creationTime - aux_last_refilled) *
current_cash_level"/>
</outputValue>
</map>
</metric>
<metric displayName="aux last refilled"
id="aux_last_refilled" type="xsd:dateTime">
<description>timestamp of last "ATM Ready" event
(time of last refill)</description>
<map>
<outputValue>
<singleValue expression="ATM_ready/predefinedData/creationTime"/>
</outputValue>
</map>
</metric>
<metric displayName="aux slope" id="aux_slope" type="xsd:decimal">
<description>slope of cash depletion line
(in cash units per second)</description>
<map>
<trigger ref="aux_t2"/>
<outputValue>
<singleValue expression=
"((withdrawal_count + 1) * aux_sum_xy - aux_sum_x * aux_sum_y)
div ((withdrawal_count + 1) * aux_sum_xx - aux_sum_x * aux_sum_x)"/>
</outputValue>
</map>
</metric>
<metric displayName="aux y intercept" id="aux_y_intercept" type="xsd:decimal">
<description>y-intercept of cash depletion line
(in cash units)</description>
<map>
<trigger ref="aux_t3"/>
<outputValue>
<singleValue expression=
"(aux_sum_y - aux_slope * aux_sum_x) div (withdrawal_count + 1)"/>
</outputValue>
</map>
</metric>
<metric displayName="aux customer account" id="aux_customer_account"
type="xsd:string">
<description>customer account of current session</description>
<map>
<outputValue>
<singleValue expression="customer_login/extendedData/customer_account"/>
</outputValue>
</map>
</metric>
<counter displayName="failure count" id="failure_count" type="xsd:integer">
<description>number of shutdowns due to failure</description>
<incrementedWhen ref="shutdown_due_to_failure"/>
</counter>
<counter displayName="session count" id="session_count"
type="xsd:integer">
<description>
number of user sessions ("customer login" events)
</description>
<incrementedWhen ref="customer_login"/>
</counter>
<counter displayName="withdrawal count" id="withdrawal_count"
type="xsd:integer">
<description>
number of cash withdrawals since last "ATM Ready" event
</description>
<incrementedWhen ref="cash_withdrawal"/>
<setToZeroWhen ref="ATM_ready"/>
</counter>
<counter displayName="invalid PIN count" id="invalid_PIN_count"
type="xsd:integer">
<description>number of times an invalid PIN was entered
since login</description>
<incrementedWhen ref="invalid_PIN"/>
<setToZeroWhen ref="customer_login"/>
</counter>
<stopwatch displayName="time operational" id="time_operational"
type="xsd:duration">
<description>the total time for which this ATM has been operational</description>
<startedWhen ref="ATM_ready"/>
<stoppedWhen ref="ATM_shutdown"/>
</stopwatch>
<stopwatch displayName="time since login"
id="time_since_login" type="xsd:duration">
<description>elapsed time since login</description>
<startedWhen ref="customer_login"/>
<stoppedWhen ref="customer_logout"/>
<resetWhen ref="customer_logout"/>
</stopwatch>
</monitoringContext>
</monitorDetailsModel>
</mm:monitor>
|
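The aux_sum_x, aux_sum_y, aux_sum_xx, and aux_sum_xy metrics in the listing above accumulate the running sums of a least-squares line fit, from which aux_slope and aux_y_intercept are derived. The following Python sketch mirrors those expressions outside of Monitor, to make the arithmetic easier to follow. It is an illustration only: the class and method names are hypothetical, and the guard against a zero denominator is an addition that the model's expressions do not contain. As in the model, the refill itself counts as the first data point (x = 0, y = initial cash level), which is why the number of points is the withdrawal count plus one.

```python
from dataclasses import dataclass


@dataclass
class CashRegression:
    """Running sums for a line through (seconds since refill, cash level).

    Hypothetical helper, not part of WebSphere Business Monitor.
    """
    initial_cash: float
    n: int = 1            # corresponds to withdrawal_count + 1
    sum_x: float = 0.0    # corresponds to aux_sum_x
    sum_y: float = 0.0    # corresponds to aux_sum_y
    sum_xx: float = 0.0   # corresponds to aux_sum_xx
    sum_xy: float = 0.0   # corresponds to aux_sum_xy

    def __post_init__(self):
        # The refill is the first point: x = 0, y = initial cash level.
        self.sum_y = self.initial_cash

    def add_withdrawal(self, seconds_since_refill, cash_level_after):
        """Add one (x, y) point after a cash withdrawal."""
        x, y = seconds_since_refill, cash_level_after
        self.n += 1
        self.sum_x += x
        self.sum_y += y
        self.sum_xx += x * x
        self.sum_xy += x * y

    def slope(self):
        """Cash units per second (negative while cash is being depleted)."""
        denom = self.n * self.sum_xx - self.sum_x * self.sum_x
        if denom == 0:
            # Added guard: fewer than two distinct x values, no line defined.
            raise ValueError("not enough data points for a line fit")
        return (self.n * self.sum_xy - self.sum_x * self.sum_y) / denom

    def intercept(self):
        """Fitted cash level at the time of the refill (x = 0)."""
        return (self.sum_y - self.slope() * self.sum_x) / self.n

    def seconds_from_refill_to_level(self, level):
        """Seconds after the refill at which the fitted line reaches `level`."""
        return (level - self.intercept()) / self.slope()


# Usage: refill to 50000, then three withdrawals of 1000 at 1000 s intervals.
reg = CashRegression(initial_cash=50000.0)
reg.add_withdrawal(1000.0, 49000.0)
reg.add_withdrawal(2000.0, 48000.0)
reg.add_withdrawal(3000.0, 47000.0)

now = 3000.0  # seconds elapsed since the refill
seconds_until_empty = reg.seconds_from_refill_to_level(0.0) - now
seconds_to_refill_level = reg.seconds_from_refill_to_level(10000.0)
```

For this perfectly linear sample the fit recovers a slope of -1 cash unit per second and an intercept of 50000. The time until the ATM is empty corresponds to the expression that subtracts the elapsed time from `- aux_y_intercept div aux_slope`, and the time at which the level reaches 10000 corresponds to the offset added to aux_last_refilled in the next_refill_date metric.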
Summary
In this article you constructed a monitoring context (MC) for a business activity monitoring (BAM) problem, starting from business requirements and working top down (outside-to-in). You learned about all the model elements that an MC can contain, how to use them in different scenarios, and how they behave.
Stay tuned for Part 2, which will show how to deploy this model and test it using simulated events.
Download
| Description | Name | Size | Download method |
|---|---|---|---|
| Project interchange | ar-bam1code.zip | 7KB | HTTP |
Resources
About the author |