Adding triggers for event deduplication

Processing Juniper Contrail UVEAlarms requires probe rules, custom tables, and triggers specific to Contrail data.

Configuration files

The triggers required for event deduplication are set up in the following files:

Table 1. Files for setting up deduplication

File                            Purpose
------------------------------  ----------------------------------------------------------------
message_bus_contrail.rules      The probe rules for Contrail Server-Sent Events (SSE) UVEAlarms.

message_bus_contrail.lookup     The lookup file for Contrail Server-Sent Events (SSE) UVEAlarms.

create_agg_dedup_trigger.sql    This file performs the following tasks:
                                  • Creates default_triggers.agg_deduplication_contrail.
                                  • Disables default_triggers.deduplication.

create_contrail_tables.sql      This file performs the following tasks:
                                  • Creates new columns in the alerts.status table.
                                  • Creates the nc_triggers.monitor table.
                                  • Creates the custom.Contrail_problem_events table.

create_contrail_triggers.sql    This file performs the following tasks:
                                  • Creates Contrail_triggers.custom_contrail_generic_clear.
                                  • Creates Contrail_trigger.dedup_contrail_problem_events.

The SQL files create the following new objects in the ObjectServer.

Table 2. Objects created by the SQL files

Type      Name                                             Purpose
--------  -----------------------------------------------  ----------------------------------------------
Column    ClearTally, AckTime, ClearTime, ClearedBy,       Custom columns created in the alerts.status
          RTTSSev, SuppressEscl, Int2, Int9, Int11,        table. They are used to hold values.
          TTVariable, TTFlag, TTOriginatorUser,
          TTReferredGroup

Table     nc_triggers.monitor                              This table holds the name of the monitor
                                                           trigger that is activated in the deployment.

Table     custom.Contrail_problem_events                   This table holds one entry for each Juniper
                                                           Contrail node. Rows are inserted into this
                                                           table by the agg_contrail_deduplication
                                                           trigger.

Trigger   default_triggers.agg_contrail_deduplication      This trigger updates a Juniper Contrail alarm
                                                           and inserts a new row for its Node into the
                                                           custom.Contrail_problem_events table. For the
                                                           update operation on that table, see the
                                                           Contrail_trigger.dedup_contrail_problem_events
                                                           trigger.

Trigger   Contrail_triggers.custom_contrail_generic_clear  This trigger periodically clears alarms that
                                                           are absent from the latest Juniper Contrail
                                                           update. To select the alarms to clear in the
                                                           alerts.status table, the trigger looks for
                                                           alarms from the same Node whose AlertGroup
                                                           value is not in the aggregated URL list stored
                                                           for that Node in the
                                                           custom.Contrail_problem_events table.

Trigger   Contrail_trigger.dedup_contrail_problem_events   This trigger updates the record of a Juniper
                                                           Contrail node’s alarms, preparing the
                                                           aggregated URL list for the next event
                                                           clearing operation.

When the Message Bus Probe is no longer needed in the environment, run the following files when you uninstall the probe:

Table 3. Files to remove triggers

File                            Purpose
------------------------------  ----------------------------------------------------------------------
remove_contrail_agg_dedup.sql   This file performs the following tasks:
                                  • Removes the default_triggers.agg_deduplication_contrail trigger.
                                  • Enables the default_triggers.deduplication trigger.

remove_contrail_tables.sql      This file performs the following task:
                                  • Removes the objects created by the create_contrail_tables.sql file.

remove_contrail_triggers.sql    This file performs the following task:
                                  • Removes the objects created by the create_contrail_triggers.sql file.

Juniper Contrail UVEAlarms alarm

The probe consumes alarms of type UVEAlarms.

A UVEAlarms JSON message has the following structure:

{ 
   "type":"UVEAlarms",
   "value":{ 
      "alarms":[ { alarm_1 }, { alarm_2 }, { alarm_3 } ],
      "__T":1574156963194879
   },
   "key":"ObjectVRouter:dcr01-contrail-dpdk-13"
}

A UVEAlarms JSON message contains the alarms of one node in the array field named alarms. Each alarm reports a state change of a threshold. The probe sends each alarm to the ObjectServer, where it becomes a Netcool event.

The alarms in the list (each identifiable by its threshold name) may stay the same or change with every UVEAlarms JSON update. If the list shrinks, a threshold alarm that was previously reported but is absent from the current UVEAlarms JSON has been removed or resolved in the Contrail system.

Each alarm in the alarms array has the following structure. The alarm_rules object, abbreviated here as {…}, contains more data.

{
    "severity":2,
    "alarm_rules":{…},
    "timestamp":1577837594000011,
    "ack":false,
    "token":"eyJ0aW1lc3RhbXAiOiAxNTczNDcxN2MzI1MDQ5LCAiaHR0cF9wb3J0IjogNTk5NSwgImhvc3RfaXAiOiAiMTkyLjIwIn0=",
    "type":"default-global-system-config:system-defined-phyif-bandwidth",
    "description":"Physical Bandwidth usage anomaly."
}

When all alarms of a node are cleared in the Contrail system, the resulting UVEAlarms JSON message has the following structure:

{ 
   "type":"UVEAlarms",
   "value": null,
   "key":"ObjectVRouter:dcr01-contrail-dpdk-13"
}
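To make the two message shapes concrete, the following Python sketch reads a UVEAlarms message with the standard json module and returns the node key and the alarms array, treating "value": null as an empty list. It is only an illustration, not the probe's JSON parser; the function name parse_uvealarms is hypothetical.

```python
import json
from typing import Any


def parse_uvealarms(message: str) -> tuple[str, list[dict[str, Any]]]:
    """Return (node key, alarms) from one UVEAlarms JSON message."""
    doc = json.loads(message)
    if doc.get("type") != "UVEAlarms":
        raise ValueError(f"not a UVEAlarms message: {doc.get('type')!r}")
    value = doc.get("value")
    # "value": null means the node has no alarms left in Contrail.
    alarms = value.get("alarms", []) if value else []
    return doc["key"], alarms


cleared = '{"type":"UVEAlarms","value":null,"key":"ObjectVRouter:dcr01-contrail-dpdk-13"}'
node, alarms = parse_uvealarms(cleared)
print(node, len(alarms))  # ObjectVRouter:dcr01-contrail-dpdk-13 0
```

An empty alarms list for a known node is the signal that every event previously raised for that node should eventually be cleared.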

Each UVEAlarms JSON message that the probe parses becomes one or more batches of event tokens for processing by the probe rules. For more details, see the UVEAlarms event tokens section that follows.

UVEAlarms event tokens

The probe JSON parser converts every alarm in the alarms array into a Netcool event. The token format is determined by the probe JSON parser configuration.

Each set of Netcool event tokens consists of the following alarm tokens and data tokens:

Alarm tokens

The alarm tokens come from the parent JSON fields and from the JSON level defined in messagePayload.

This subset of tokens forms the core of each Netcool event.

Table 4. Alarm tokens

Token                     Description
------------------------  ------------------------------------------------------------------------
$ack                      This indicates whether the alarm has been acknowledged.

$description              This contains a description of the alarm.

$severity                 This contains the alarm severity, which is mapped to a Netcool severity.

$timestamp                This shows the time at which the alarm occurred.

$token                    This contains the Contrail token.

$type                     This contains the alarm threshold name.

$(MSGHEADER.key)          This indicates the node from which the alarms originated.

$(MSGHEADER.type)         This contains the message type, which is always UVEAlarms.

$(MSGHEADER.value.__T)    This shows the time at which the message was generated.

$(alarm_rules.*)          This contains the Contrail alarm rules that define the threshold boundary.

Example of $(alarm_rules.*) tokens:

alarm_rules.or_list.0.and_list.0.condition.operand1: NodeStatus.process_status.state

alarm_rules.or_list.0.and_list.0.condition.operand2.json_value: "Functional"

alarm_rules.or_list.0.and_list.0.condition.operation: !=

alarm_rules.or_list.0.and_list.0.condition.variables.0: NodeStatus.process_status.module_id

alarm_rules.or_list.0.and_list.0.condition.variables.1: NodeStatus.process_status.instance_id

alarm_rules.or_list.0.and_list.0.match.0.json_operand1_value: "Non-Functional"

alarm_rules.or_list.0.and_list.0.match.0.json_variables.NodeStatus.process_status.instance_id: 0

alarm_rules.or_list.0.and_list.0.match.0.json_variables.NodeStatus.process_status.module_id: "contrail-collector"
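The dotted token names above mirror the nesting of the alarm_rules object, with list positions (such as or_list.0) used as path segments. The Python sketch below shows one way such names can be produced by flattening nested JSON. It assumes the naming scheme seen in the example; the real token format is determined by the probe's JSON parser configuration, and the flatten helper is hypothetical.

```python
from typing import Any


def flatten(obj: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts and lists into dotted token names.

    List elements use their index as a path segment, so
    {"or_list": [{"x": 1}]} becomes {"or_list.0.x": 1}.
    """
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = ((str(i), v) for i, v in enumerate(obj))
    else:
        return {prefix: obj}  # scalar: one token
    tokens: dict[str, Any] = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else key
        tokens.update(flatten(value, path))
    return tokens


rules = {"or_list": [{"and_list": [{"condition": {
    "operand1": "NodeStatus.process_status.state",
    "operation": "!=",
}}]}]}
for name, value in flatten(rules, "alarm_rules").items():
    print(f"{name}: {value}")
# alarm_rules.or_list.0.and_list.0.condition.operand1: NodeStatus.process_status.state
# alarm_rules.or_list.0.and_list.0.condition.operation: !=
```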

All data tokens

These tokens come from all fields in the UVEAlarms JSON, including the data of every member alarm.

The token names start with a header prefix (defined in headerPrefix) and contain inline indices that enumerate the values of all array elements.

For example:

MSGHEADER.value.alarms.0.description:      Process(es) reporting as non-functional 
MSGHEADER.value.alarms.1.description:      Process Failure.
MSGHEADER.value.alarms.2.description:      System Info Incomplete

The probe rules collect the threshold names from the MSGHEADER.value.alarms.<index>.type tokens into a comma-delimited list. This list is stored in the URL field of the custom.Contrail_problem_events table, where the custom_contrail_generic_clear trigger uses it to identify and clear the Netcool events of alarms that are no longer present in the UVEAlarms update.

Table 5. All data tokens

Token                                          Description
---------------------------------------------  ---------------------------------
$(MSGHEADER.value.alarms.<index>.ack)          This is the same as $ack.

$(MSGHEADER.value.alarms.<index>.description)  This is the same as $description.

$(MSGHEADER.value.alarms.<index>.severity)     This is the same as $severity.

$(MSGHEADER.value.alarms.<index>.timestamp)    This is the same as $timestamp.

$(MSGHEADER.value.alarms.<index>.token)        This is the same as $token.

$(MSGHEADER.value.alarms.<index>.type)         This is the same as $type.
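A minimal Python sketch of how the comma-delimited threshold list could be assembled from the all data tokens follows. It assumes the tokens are available as a name-to-value dictionary without the $( ) wrapper, and that each list entry is the part of the type value after the colon, matching how AlertGroup is derived. The threshold_list function is hypothetical and is not taken from message_bus_contrail.rules.

```python
import re

# Matches names such as MSGHEADER.value.alarms.3.type and captures the index.
TYPE_TOKEN = re.compile(r"MSGHEADER\.value\.alarms\.(\d+)\.type")


def threshold_list(tokens: dict[str, str]) -> str:
    """Build a comma-delimited threshold list from all data tokens."""
    found = []
    for token_name, token_value in tokens.items():
        match = TYPE_TOKEN.fullmatch(token_name)
        if match:
            # Keep only the threshold name after the first colon.
            threshold = token_value.split(":", 1)[-1]
            found.append((int(match.group(1)), threshold))
    # Sort numerically by alarm index so that index 10 follows index 2.
    return ",".join(threshold for _, threshold in sorted(found))


tokens = {
    "MSGHEADER.value.alarms.0.description": "Physical Bandwidth usage anomaly.",
    "MSGHEADER.value.alarms.0.type": "default-global-system-config:system-defined-phyif-bandwidth",
    "MSGHEADER.value.alarms.1.type": "default-global-system-config:system-defined-process-status",
}
print(threshold_list(tokens))
# system-defined-phyif-bandwidth,system-defined-process-status
```

Tokens other than .type, such as .description, are ignored, so the list depends only on which thresholds are present in the update.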

ObjectServer Database

The following alerts.status table columns are pivotal to the Event Clear operation:

Table 6. alerts.status columns used for event clear

Column        Description
------------  -------------------------------------------------------------------------------
@URL          This column stores an aggregated, comma-delimited list of the monitored
              thresholds. The values are extracted from the $(MSGHEADER.value.alarms.*.type)
              tokens.

@AlertGroup   This column stores the name of the monitored threshold. The value is the part
              of the $type token that follows the colon.

              Examples of type values:

                default-global-system-config:system-defined-address-mismatch-compute
                default-global-system-config:system-defined-address-mismatch-control
                default-global-system-config:system-defined-bgp-connectivity
                default-global-system-config:system-defined-conf-incorrect
                default-global-system-config:system-defined-core-files
                default-global-system-config:system-defined-disk-usage-critical
                default-global-system-config:system-defined-disk-usage-high
                default-global-system-config:system-defined-node-status-failure
                default-global-system-config:system-defined-package-version-mismatch
                default-global-system-config:system-defined-partial-sysinfo
                default-global-system-config:system-defined-pending-cassandra-compaction-tasks
                default-global-system-config:system-defined-phyif-bandwidth
                default-global-system-config:system-defined-process-connectivity
                default-global-system-config:system-defined-process-status
                default-global-system-config:system-defined-vrouter-interface
                default-global-system-config:system-defined-xmpp-connectivity

@Node         This column stores the origin of the alarms, taken from the parent key field,
              that is, the $(MSGHEADER.key) token.

              This value is also used as the primary key of the
              custom.Contrail_problem_events table. See the
              default_triggers.agg_contrail_deduplication trigger for the row insertion.
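The following Python sketch shows the column mapping for a single alarm: Node takes the parent key unchanged, and AlertGroup takes the part of the alarm type after the first colon. The status_fields helper is hypothetical; in the deployment, this mapping is done by the probe rules.

```python
def status_fields(message_key: str, alarm_type: str) -> dict[str, str]:
    """Map one alarm to the Node and AlertGroup columns (hypothetical helper)."""
    # Node keeps the parent "key" unchanged, including its own colon.
    # AlertGroup is the part of "type" after the first colon; a type
    # without a colon is kept whole (an assumption of this sketch).
    _, sep, threshold = alarm_type.partition(":")
    return {"Node": message_key, "AlertGroup": threshold if sep else alarm_type}


print(status_fields(
    "ObjectVRouter:dcr01-contrail-dpdk-13",
    "default-global-system-config:system-defined-phyif-bandwidth",
))
# {'Node': 'ObjectVRouter:dcr01-contrail-dpdk-13', 'AlertGroup': 'system-defined-phyif-bandwidth'}
```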

How event clear works

Each Contrail alarm processed in the ObjectServer updates both the alerts.status table and the custom.Contrail_problem_events table.

The following is a high-level description of the possible outcomes for a Contrail alarm in the alerts.status table:

  • A Netcool event is created for a node’s alarm that is new to the alerts.status table.
  • Existing Netcool events associated with alarms present in the UVEAlarms message are updated.
  • Existing Netcool events associated with alarms absent from the UVEAlarms message are cleared.

    For example, "value": null in a UVEAlarms message means that all alarms of the node have been cleared, so all corresponding Netcool events are cleared.
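The three outcomes can be expressed as set operations on a node's threshold names, as in the Python sketch below. It compares the thresholds that already have events in alerts.status with those in the current UVEAlarms message; an empty current set models "value": null. The classify function is hypothetical and only illustrates the logic, not the trigger code.

```python
def classify(existing: set[str], current: set[str]) -> dict[str, set[str]]:
    """Split a node's threshold names into the three alerts.status outcomes.

    existing: thresholds that already have Netcool events for the node.
    current:  thresholds in the latest UVEAlarms message (empty for "value": null).
    """
    return {
        "create": current - existing,  # new alarm -> new event
        "update": current & existing,  # still present -> event updated
        "clear": existing - current,   # absent -> event cleared
    }


existing = {"system-defined-process-status", "system-defined-phyif-bandwidth"}
current = {"system-defined-process-status", "system-defined-disk-usage-high"}
print(sorted(classify(existing, current)["clear"]))  # ['system-defined-phyif-bandwidth']
```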

In the Contrail use case, event clear is a periodic operation run by a trigger. A clear therefore does not appear in the Event List immediately; it takes effect in the trigger’s next cycle.

The custom.Contrail_problem_events table holds aggregated data of Contrail alarms, grouped by node. The clear trigger uses each node’s aggregated data to clear events in the alerts.status table. The crucial fields are Node, which is the origin of the alarms, and URL, which is the comma-delimited threshold list.

Each Contrail alarm results in one of the following changes to the custom.Contrail_problem_events table:

  • A new entry is created for a node that is not yet in the table.
  • The existing entry for the node is deduplicated, that is, updated with the new threshold list.

Every new or updated row is marked for the periodic clear trigger to pick up in its next active cycle.

The custom_contrail_generic_clear trigger periodically performs Event Clear by processing the recently updated rows in the custom.Contrail_problem_events table. For each row, the trigger uses the threshold list in the URL field to select the alarms of that node in the alerts.status table whose threshold name (AlertGroup column) is not in the list. The trigger then sets the severity of the selected alarms to 0, among other updates, to clear them.
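The following Python sketch models this table-level flow under simplifying assumptions: rows are kept in memory rather than in the ObjectServer, a hypothetical pending_clear flag stands in for however rows are marked for the clear trigger, and clearing only sets severity to 0 (the real trigger also changes other fields). Alert, ProblemRow, upsert_row, and clear_cycle are hypothetical names, not ObjectServer SQL.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    """Hypothetical stand-in for an alerts.status row."""
    node: str
    alert_group: str
    severity: int


@dataclass
class ProblemRow:
    """Hypothetical stand-in for a custom.Contrail_problem_events row."""
    node: str
    url: str                    # comma-delimited threshold list
    pending_clear: bool = True  # assumed "marked for the clear trigger" flag


def upsert_row(table: dict[str, ProblemRow], node: str, url: str) -> None:
    """Insert a row for a new node, or deduplicate the existing one."""
    row = table.get(node)
    if row is None:
        table[node] = ProblemRow(node, url)
    else:
        row.url = url
        row.pending_clear = True


def clear_cycle(table: dict[str, ProblemRow], alerts: list[Alert]) -> int:
    """Run one periodic clear pass; return how many alerts were cleared."""
    cleared = 0
    for row in table.values():
        if not row.pending_clear:
            continue
        thresholds = {t for t in row.url.split(",") if t}
        for alert in alerts:
            # Skipping already-cleared alerts is a choice of this sketch.
            if (alert.node == row.node and alert.severity > 0
                    and alert.alert_group not in thresholds):
                alert.severity = 0  # severity 0 marks the event as cleared
                cleared += 1
        row.pending_clear = False
    return cleared


table: dict[str, ProblemRow] = {}
alerts = [
    Alert("vr1", "system-defined-process-status", 4),
    Alert("vr1", "system-defined-phyif-bandwidth", 3),
]
upsert_row(table, "vr1", "system-defined-process-status")
print(clear_cycle(table, alerts))  # 1
```

An empty URL value, as produced when "value" is null, yields an empty threshold set, so every remaining alert for that node is cleared in the next cycle.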