IBM Support

APM 8.1.4 old Alarms not clearing from Console

Technical Blog Post


Abstract

APM 8.1.4 old Alarms not clearing from Console

Body

Problem:

In the agent logs, some thresholds (e.g. R3_Buffer_Hitratio_Crit, R3_Private_Mode_Crit, R3_Private_Memory_Critical, R3_Buffer_Swap_Crit) were raised some weeks ago and do not disappear. On the console they look like false alarms.

 

Description:

It appears that there was an issue with the system around the times mentioned above, so the alarms show up on the APM UI with an old date (i.e. as false alarms or events). These alarms need to be cleared from MongoDB by manually running db.alarms.update.

 

This can happen if the events were originally opened when the agent managed system name was based on an old host name (or old IP address), and the subnode managed system XXX was later moved to an agent with a new name on a host with a new name (or new IP address). That causes the problem because the APM server still associates the events with the original agent node and did not clean them up after the subnodes moved to the new agent.

 

Solution:

 

You can clear the old alarms from MongoDB for the problematic instance XXX by manually running db.alarms.update with the following steps:

--------------------

1) Start the MongoDB command line:

<apm-server-home>/mongodb/bin/mongo --port 27000 alarm -u user -p mongoUsrpasswd@08

 

Note: if you customized the MongoDB password, specify your custom password in place of mongoUsrpasswd@08

 

2) At the > prompt, enter this command:

db.alarms.update( { "enriched_content.situation_thrunode" : "XXX" } , { $set: { "enriched_content.application_specific_extensions.type" : "pure"} }, { upsert: true, multi: true } )

 

3) Wait 15 minutes and then check the APM UI to confirm that the events have been cleared.

--------------------
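
Before running step 2, it can help to check how many alarm documents the filter actually matches. The command passes upsert: true, so a filter that matches nothing may insert a new document instead of doing nothing. The following is a minimal sketch in mongo shell JavaScript, not part of the documented procedure. The helper name buildThrunodeClear is hypothetical; the field names and options are copied from step 2, and XXX stands for your subnode name.

```javascript
// Hypothetical helper: builds the three arguments that step 2 passes to
// db.alarms.update for a single subnode managed system.
function buildThrunodeClear(subnodeName) {
  if (typeof subnodeName !== "string" || subnodeName.length === 0) {
    throw new Error("subnodeName must be a non-empty string");
  }
  return {
    // Same filter as step 2
    filter: { "enriched_content.situation_thrunode": subnodeName },
    // Same change as step 2: mark the matching events as "pure"
    update: { $set: { "enriched_content.application_specific_extensions.type": "pure" } },
    // Same options as step 2
    options: { upsert: true, multi: true }
  };
}

// This part only runs inside the mongo shell, where the global `db` exists.
if (typeof db !== "undefined") {
  var c = buildThrunodeClear("XXX"); // replace XXX with your subnode name
  print("matching alarms: " + db.alarms.find(c.filter).count());
  // After checking the count, run the update from step 2:
  // db.alarms.update(c.filter, c.update, c.options);
}
```

If the reported count is 0, recheck the subnode name before running the update.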

 

Additional inquiry:

If there are other old events (e.g. from March) that you need cleared, you can run these steps at the MongoDB > prompt:

a)  Run this command

> DBQuery.shellBatchSize = 50

b) Run this command

> db.alarms.find( { "threshold_name" : "your-threshold-name", "origin" : "agent-node-name", "status.update_time" : {$regex: /^2019-03(.*)/},  "status.state" : "opened",  "enriched_content.application_specific_extensions.type" : "sampled" } ) 

 

where

- your-threshold-name is the name of the threshold

- agent-node-name is the name of the agent node that you see on the APM UI Events tab

- the regex value for status.update_time assumes that the event was last updated in March 2019. You can change the year or month, or append the day to the regex, if needed
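
The regex for status.update_time can also be built from a year, month, and optional day instead of being edited by hand. This is a sketch only: the helper name updateTimePrefix is hypothetical, and appending the day as "-DD" assumes that status.update_time is stored as a string starting with YYYY-MM-DD, which the 2019-03 prefix above suggests but does not confirm. The trailing (.*) in the original pattern does not change which values match, so it is left out here.

```javascript
// Hypothetical helper: builds an anchored prefix regex such as /^2019-03/
// for use as the $regex value on status.update_time.
function updateTimePrefix(year, month, day) {
  function isWholeNumber(n) {
    return typeof n === "number" && isFinite(n) && Math.floor(n) === n;
  }
  function pad2(n) {
    return (n < 10 ? "0" : "") + n;
  }
  if (!isWholeNumber(year) || !isWholeNumber(month) || month < 1 || month > 12) {
    throw new Error("year and month (1-12) must be whole numbers");
  }
  var prefix = year + "-" + pad2(month);
  if (day !== undefined) {
    if (!isWholeNumber(day) || day < 1 || day > 31) {
      throw new Error("day must be a whole number from 1 to 31");
    }
    // Assumption: the day follows the month as "-DD" in the stored string.
    prefix += "-" + pad2(day);
  }
  return new RegExp("^" + prefix);
}

var marchPattern = updateTimePrefix(2019, 3);       // same prefix as /^2019-03(.*)/
var oneDayPattern = updateTimePrefix(2019, 3, 5);   // narrows the match to 5 March
```

Either pattern can be used in place of the literal regex in the db.alarms.find() command, for example "status.update_time" : {$regex: marchPattern}.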

 

c) Confirm that you want to clear every event returned by the db.alarms.find() command. If the db.alarms.find() command returns more events than you want to clear, open a case with IBM Support. Provide a screen capture showing which events you want to clear, together with the collectLogs.sh output, and we will help you construct the right query.

 

d) If you want to clear all of the events displayed by the db.alarms.find() command, issue this command:

db.alarms.update( { "threshold_name" : "your-threshold-name", "origin" : "agent-node-name", "status.update_time" : {$regex: /^2019-03(.*)/}, "status.state" : "opened", "enriched_content.application_specific_extensions.type" : "sampled" } , { $set: { "enriched_content.application_specific_extensions.type" : "pure"} }, { upsert: true, multi: true } )

 

where your-threshold-name, agent-node-name, and the regex value for status.update_time have the same meaning as in step b) and must match the values you used there.
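
Steps b) and d) only work as intended when both commands use exactly the same filter. One way to guarantee that is to build the filter once and pass the same object to both calls. The following is a minimal sketch in mongo shell JavaScript; the helper name buildEventFilter is hypothetical, and the field names, update, and options are copied from the commands above.

```javascript
// Hypothetical helper: builds the filter shared by the find in step b)
// and the update in step d).
function buildEventFilter(thresholdName, originNode, timePattern) {
  return {
    "threshold_name": thresholdName,
    "origin": originNode,
    "status.update_time": { $regex: timePattern },
    "status.state": "opened",
    "enriched_content.application_specific_extensions.type": "sampled"
  };
}

// This part only runs inside the mongo shell, where the global `db` exists.
if (typeof db !== "undefined") {
  // Replace the placeholders with your threshold name and agent node name.
  var eventFilter = buildEventFilter("your-threshold-name", "agent-node-name", /^2019-03/);

  // Step b): review the events that would be cleared.
  db.alarms.find(eventFilter).forEach(printjson);

  // Step d): after reviewing the output, clear exactly those events.
  // db.alarms.update(
  //   eventFilter,
  //   { $set: { "enriched_content.application_specific_extensions.type": "pure" } },
  //   { upsert: true, multi: true }
  // );
}
```

The update is left commented out so that nothing is changed until the output of the find has been reviewed as described in step c).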


UID

ibm11277434