Configure aggregation settings
Learn how to configure the type of time windows, their duration, and the timing mode for your cloud or hybrid deployment.
Four types of grouping are available in IBM® Netcool® Operations Insight® on Red Hat® OpenShift®: temporal grouping, temporal patterns, topological, and scope-based event grouping.
Window types
fixedFromFirst
- The window duration that is defined in this configuration is the length of the window in which events are grouped together. The time window is a fixed length, measured from the first event. As new events come in within the window of time, they are added to the group. Any events that occur after the end of this window are not included in the group.
rolling
- The window duration for a rolling window is called the quiet period. It is the amount of time after the last event at which the group membership is closed to new events. With a rolling window, the end time of the window is based on the last event in the group, so as events keep coming in, the end time of the window shifts.
All grouping types can be configured for fixed time windows instead of rolling time windows, except for super-grouping, which supports only rolling time windows. For fixed time windows, events that occur within the fixed time window are grouped. When events occur after the fixed time window, a new group is created. Any single event that occurs outside the fixed time window does not get added to the group.
For rolling windows, the window duration is referred to as the "quiet period". The group membership closes when the span of time, which is defined by the window, passes without any additional events coming in. The end time of the rolling window is based on the last event to be added to the group. As events come in, they keep getting added to the group and the end time of the window keeps rolling.
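As a worked illustration of the difference, assume a window duration of 300 seconds. With a fixedFromFirst window, if the first event in the group occurs at 10:00:00, events that occur up to 10:05:00 join the group and any later event starts a new group. With a rolling window, an event at 10:04:00 extends the group so that it stays open until 10:09:00, and the group closes only after 300 seconds pass without a new event.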
Supergroups and rolling windows
A supergroup is created when events match multiple cloud native event analytics policies. All events in the supergroup can match all of the policies, or different events can match different policies, but there must be some overlap. For example:
- Events 1 and 2 match policies a, b, and c in a valid time window.
- Events 3 and 4 match policy a in an overlapping valid time window.
- Events 5 and 6 match policy c in an overlapping valid time window.
The rolling time window for supergroups is used to determine the amount of data that the collator keeps in memory for groups. For supergroups, the time windows for the constituent groups determine when the supergroup ends. If all constituent groups within a supergroup have ended, no further groups are added to the supergroup, even if they start within the rolling time window of the supergroup.
If constituent groups are still open, the rolling time window for supergroups has no effect. If scope or temporal groups keep being created with overlapping time windows, the supergroup stays active and will stay open beyond its time window setting. However, if the scope or temporal groups expire, new groups are not added to the supergroup, even if they come in within the supergroup time window setting.
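For example, suppose that the supergroup rolling window is 1200 seconds and the last open constituent group closes at 10:00. A scope or temporal group that forms at 10:05 and matches the same policies starts a new supergroup, even though it falls within the 1200-second supergroup window. Conversely, if constituent groups keep forming with overlapping windows, the supergroup stays open beyond the 1200-second setting.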
A temporal group starts with the second event that matches the temporal policy. Thereafter, the temporal group lives for its time window setting (either rolling or fixedFromFirst). Scope-based groups start with the second event that matches the scope. Thereafter, the scope group lives for its time window setting (either rolling or fixedFromFirst). Or, if the scope-based event grouping (SBEG) policy that is used to populate the ScopeID parameter has a time period setting, the scope group lives for the QuietPeriod value on the event. In this case, the ScopeID field is prefixed with FX:.
Exception: If a relevant field is updated on an event in an old group, and the update makes the event match a new group, the events for the new group are added to the old supergroup.
Event timing
In addition to window type and duration, you can also set the timingMode parameter to explicit.
The time of an event is determined from when the event is seen by the cloud native analytics pods. Specifically, the time of an event is set as the time that the inference service looks at the event and determines whether there is a valid policy for the event. This time is called the policyTimeStamp. The deduplication pod bases the grouping on the policyTimeStamp value. However, small delays in the system might cause the inference service and deduplication pods to see events out of sequence. This situation might happen if the gateway sends the events out of order, or if the cloud native analytics pods are scaled and multiple pods are working on the events.
To overcome timing delays and out-of-sequence events, timingMode can be set to explicit. When timingMode is explicit, the deduplication pod holds the events for a small time period before they are processed. The value of
the holdOffSeconds
parameter is the amount of time that the deduplication pod waits
for late, out-of-sequence events. By default, the holdOffSeconds
value is 60
seconds, but you can change the value with the API. A value larger than 600 seconds is not
recommended. Set the holdOffSeconds
value as low as possible to account for a
reasonable delay between two events, which should be grouped together, coming from the ObjectServer
to the pods. Investigate delays longer than 10 minutes to resolve the root
cause.
When you set timingMode to explicit, group formation is delayed by the holdOffSeconds value. There is a tradeoff between reducing out-of-sequence events and delaying the formation of groups.
By holding the events for a small time period before they are processed, the deduplication pod can correctly group the events. This grouping is done based on the policyTimeStamp value and not the FirstOccurrence value. However, the FirstOccurrence value is relevant because the deduplication pod uses the FirstOccurrence value to determine whether it can hold an event.
The deduplication pod holds an event only if the event is not already late. The pod delays the processing of events that it receives until the holdOffSeconds value expires, and the group is started only at the end of the holdOffSeconds period. If an event is already older than the holdOffSeconds value (based on FirstOccurrence), then it cannot be held, because it is already old. These events are processed immediately. For example, if an event comes in with a policyTimeStamp of 16:30 and a FirstOccurrence of 16:10, then the deduplication pod processes it immediately, because 20 minutes is more than the holdOffSeconds value. The deduplication pod can hold events only for up to the holdOffSeconds number of seconds after the FirstOccurrence time.
For fresh deployments of version 1.6.10 and later, the default value for timingMode is explicit and the default value for holdOffSeconds is 60, which are the recommended settings.
Consider the following example, which assumes a five-minute fixed window, with explicit timing, and a holdOffSeconds value of 60 seconds. If event1, with a FirstOccurrence of 18:05, gets to the deduplication pod at 18:05, the deduplication pod holds the event until 18:06. If event2, with a FirstOccurrence of 18:02, gets to the deduplication pod at 18:10, then it is not delayed. The events are grouped together in the group that starts at 18:06, because the 18:06 and 18:10 times are within a five-minute window of each other.
In addition to the timingMode parameter, another parameter to consider is the useFirstOccurrenceLatenessThreshold parameter. When the useFirstOccurrenceLatenessThreshold value is set to true, the deduplication pod ignores events for which the FirstOccurrence timestamp is earlier than the current time minus the QuietPeriod value. For example, when useFirstOccurrenceLatenessThreshold is set to true and the QuietPeriod is five minutes, events with a FirstOccurrence timestamp older than five minutes ago are ignored. The QuietPeriod value is the window length. For scope-based windows, the QuietPeriod value can be set at the event level, or defaulted from the global or policy settings.
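As a worked example of the threshold, if useFirstOccurrenceLatenessThreshold is set to true, the QuietPeriod is 300 seconds, and the current time is 14:00:00, an event with a FirstOccurrence of 13:53:00 is ignored by the deduplication pod, while an event with a FirstOccurrence of 13:58:00 is processed as normal.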
For fresh deployments of version 1.6.10 and later, the default value for useFirstOccurrenceLatenessThreshold is true, which is the recommended setting. For installations of version 1.6.9 and earlier, the useFirstOccurrenceLatenessThreshold value is set to false. For upgrades from version 1.6.9 and earlier to version 1.6.10, the useFirstOccurrenceLatenessThreshold value is set to false.
Sometimes, a probe can be disconnected from the system for a time, and its alarms arrive late. Probes go into store-and-forward mode if they cannot connect to an ObjectServer. After the connection is restored, the probes replay events in chronological order. In this way, old events can be fed into the deduplication pod. To avoid grouping these old events, set useFirstOccurrenceLatenessThreshold to true.
The recommended timing settings in groupAggregationConfiguration are as follows:
"timingMode": "explicit",
"holdOffSeconds": 60,
"useFirstOccurrenceLatenessThreshold": true
Setting global aggregation defaults
The default window type is rolling, with a default duration of 1200 seconds. You can change the window type and the duration at a global level. However, a rolling window is the only supported window type for super-group aggregation. The global configuration can be changed in Swagger, by using the CNEA Aggregation Configuration API.
Edit the ibm-hdm-analytics-dev-normalizer-aggregationservice service to enable the Swagger API as follows:
- Run the following command:
oc edit deploy $(oc get deploy|grep normalizer-agg|awk '{print $1}')
- Under the containers env section, add the following lines (an example deployment excerpt is shown after this procedure):
  - name: ENABLE_SWAGGER_UI
    value: "1"
- Save the deployment.
- Next, create an external route to the normalizer-aggregation service. This step can be done either in the Red Hat OpenShift Container Platform console or from the command line. In either case, use the sample YAML file, and modify it according to your setup. The YAML is then either pasted into the Create Route YAML page, or you create a file and apply the YAML by using the oc create -f command.
Sample YAML:
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: <NOI-release-name>-agg-norm-api
  namespace: <NOI-namespace>
spec:
  host: normalizer-aggregationservice-<release name>.apps.<FQDN>
  path: /api/aggregation/
  port:
    targetPort: 5600
  tls:
    termination: edge
  to:
    kind: Service
    name: <NOI-release-name>-ibm-hdm-analytics-dev-normalizer-aggregationservice
    weight: 100
  wildcardPolicy: None
Where:
- Release name is the name that is given to your IBM Netcool Operations Insight on Red Hat OpenShift deployment.
- FQDN is the hostname where the IBM Netcool Operations Insight on Red Hat OpenShift UI is running. This name can be found by running the following command:
oc get route | grep common-ui | awk '{print $2}'
Example output:
netcool-<release name>.apps.<FQDN>
- The service name of the existing normalizer aggregation service is retrieved by running the
following command:
oc get service | grep normalizer-aggregationservice
- The Swagger UI is accessible from a browser by using a URL in the following format:
https://normalizer-aggregationservice-<release name>.apps.<FQDN>/api/aggregation/docs/aggconfig/v1/
Note: When you configure at the global level, authorization for the API is obtained from the NOIsystemauth-secret in the NOI namespace.
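For reference, after you add the ENABLE_SWAGGER_UI variable (the env step in the procedure above), the relevant part of the normalizer aggregation deployment might look similar to the following excerpt. The container name and surrounding fields shown here are assumptions and vary by deployment.
# Illustrative excerpt only: the container name and other fields depend on your deployment.
spec:
  template:
    spec:
      containers:
        - name: <release name>-ibm-hdm-analytics-dev-normalizer-aggregationservice
          env:
            - name: ENABLE_SWAGGER_UI
              value: "1"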
Set global defaults: Group and super-group aggregation global defaults can be set with the following JSON examples. In the groupAggregationConfiguration section, you can change the windowType to fixedFromFirst or leave it as rolling, and change the duration. In groupAggregationConfiguration, you can also set the useFirstOccurrenceLatenessThreshold parameter.
{
"groupAggregationConfiguration": {
"windowType": "fixedFromFirst",
"durationSeconds": 300,
"timingMode": "explicit",
"holdOffSeconds": 60,
"useFirstOccurrenceLatenessThreshold": true
},
"supergroupAggregationConfiguration": {
"windowType": "rolling",
"durationSeconds": 1200
},
"groupFinalisationConfiguration": {
"enabled": false,
"durationSeconds": 0
},
"additionalProp1": {}
}
Note: The useFirstOccurrenceLatenessThreshold parameter takes effect regardless of the timing mode. Do not change the groupFinalisationConfiguration parameters. In supergroupAggregationConfiguration, you can change the duration, but only a rolling window type is supported for super-grouping.
{
"groupAggregationConfiguration": {
"windowType": "rolling",
"durationSeconds": 1200,
"timingMode": "explicit",
"holdOffSeconds": 60,
"useFirstOccurrenceLatenessThreshold": true
},
"supergroupAggregationConfiguration": {
"windowType": "rolling",
"durationSeconds": 1200
},
"groupFinalisationConfiguration": {
"enabled": false,
"durationSeconds": 0
}
}
For more information, see CNEA Aggregation Configuration API.
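If you prefer to work from the command line instead of the Swagger UI, the same global configuration can also be read and updated with an HTTP client such as curl through the route that you created. The following sketch is illustrative only: the exact resource path must be taken from the Swagger UI, and the credential placeholders and file name are assumptions, not documented values.
# Illustrative sketch only. Replace <path-from-swagger-ui> with the resource path that is
# shown in the Swagger UI, and supply the credentials from the systemauth secret in the
# NOI namespace. The file global-aggregation-config.json is a hypothetical file that
# contains one of the JSON examples shown above.
curl -k -u '<username>:<password>' \
  -H 'Content-Type: application/json' \
  -X PUT \
  -d @global-aggregation-config.json \
  'https://normalizer-aggregationservice-<release name>.apps.<FQDN>/api/aggregation/<path-from-swagger-ui>'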
Configuring aggregation at a policy level
Policy level aggregation overrides default or global aggregation settings. To set the group aggregation configuration at a system policy or user policy level, enable the Swagger UI for the policy registry service.
Edit the ibm-hdm-analytics-dev-policyregistryservice service to enable the Swagger API as follows:
- Run the following command:
oc edit deploy $(oc get deploy|grep policyregistryservice|awk '{print $1}')
- Under the containers env settings, add the following code:
  - name: ENABLE_SWAGGER_UI
    value: "1"
Note: Three containers are defined in the policy registry service deployment YAML. The environment variable must be added to the environment section of the container named <release name>-ibm-hdm-analytics-dev-policyregistryservice.
- Search for the route for the microservice by running the following command:
oc get route|grep policyregistryservice
Example route hostname:
netcool-<release name>.apps.<FQDN>
Using the route results, you can construct the URL for accessing the Swagger UI in the following format:
For user policies:
https://<route>/api/policies/docs/policies/user/v1/
For system policies:
https://<route>/api/policies/docs/policies/system/v1/docs-api
Note: The Swagger UI at the policy level uses a different authentication method from the global section. An API key is used, and this key must be generated from the GUI. Ensure that the Policy User API is selected when you generate the API key.
To change the window type or duration, you need to add the
groupAggregationConfiguration
to the metadata section of the policy. The steps are
the same for both system and user policies.
To specify fixedFromFirst as the window type, add the following JSON to the policy metadata section and set the duration of the fixed window to the desired length (see Update the policy):
"groupAggregationConfiguration": {
"windowType": "fixedFromFirst",
"durationSeconds": 60
}
To specify rolling as the window type, add the following JSON to the policy metadata section and set the duration of the quiet period to the desired length (see Update the policy):
"groupAggregationConfiguration": {
"windowType": "fixedFromFirst",
"durationSeconds": 60
}
Update the policy:
- Perform the GET operation of the policy to get the entire JSON for the policy.
- Copy the JSON to the PUT command section.
- Add the sample JSON for the window type to the policy and perform the PUT operation.
"metadata": { "createdBy": { "entityType": "analytics", "entityId": "system", "entityMetadata": { "trainingTimestamp": "2022-09-21T06:11:40Z" } },
"lastUpdatedBy": { "entityType": "analytics", "entityId": "system" }, "lastUpdated": "2022-09-21T06:11:47.502Z", "created": "2022-09-21T06:11:47.502Z",
"statedata": { "locked": false, "state": "active", "userId": "icpadmin", "timestamp": 1663742017996 }, "model": { "trainingTimestamp": 1663742017996 },
"name": "Temporal-patterns-policy-sept21" }
Example policy metadata section after addition of policy aggregation:
"metadata": {
  "createdBy": { "entityType": "analytics", "entityId": "system", "entityMetadata": { "trainingTimestamp": "2022-09-21T06:11:40Z" } },
  "lastUpdatedBy": { "entityType": "analytics", "entityId": "system" },
  "lastUpdated": "2022-09-21T06:11:47.502Z",
  "created": "2022-09-21T06:11:47.502Z",
  "statedata": { "locked": false, "state": "active", "userId": "icpadmin", "timestamp": 1663742017996 },
  "model": { "trainingTimestamp": 1663742017996 },
  "name": "Temporal-patterns-policy-sept21",
  "groupAggregationConfiguration": { "windowType": "fixedFromFirst", "durationSeconds": 60 }
}
Event level aggregation
For scope-based grouping, you can define, at the event level, a configuration for fixed time windows. Configure the fixed time windows with the scope-based grouping policy that is used to populate the ScopeID, by specifying a QuietPeriod and selecting the Use fixed time window option. The ScopeID value is prefixed with FX: and the QuietPeriod field for the event contains the window length. This configuration allows the aggregation to identify events that have a fixed time window. If the ScopeID has no FX: prefix, the global or policy-level configuration is used, and the window duration is taken from the window duration that is set at a global or policy level.
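For example, if a scope-based grouping policy populates the ScopeID with the value site01 (an illustrative value only), specifies a QuietPeriod of 300 seconds, and has the Use fixed time window option selected, the event arrives with a ScopeID of FX:site01 and a QuietPeriod of 300, and the aggregation applies a 300-second fixed time window to that group.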