Action recommendation
Site Reliability Engineers (SREs) are the glue that holds the application together and are essential to keeping the product up and running at all times. It is critical for them to keep the Mean Time to Repair (MTTR) as low as possible.
When an incident or issue occurs, the action framework uses the following algorithms and integrations to match actions to it.
NLP (Natural Language Processing)
When an incident or issue occurs, an event signature is created and matched against action signatures in the action catalog. A signature includes the name, description, entity type, and tags. A score is calculated and normalized.
To increase the score of an action for an event, improve the action description and add tags to the action that align more closely with the details of the event.
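The effect of a richer description and better tags can be illustrated with a minimal sketch. The product's actual NLP scoring is not described here, so this example substitutes a simple token-overlap (Jaccard) measure. The Signature class, the match_score function, and the sample events and actions are all hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Signature:
    """Hypothetical signature: name, description, entity type, and tags."""
    name: str
    description: str
    entity_type: str
    tags: list[str] = field(default_factory=list)

    def tokens(self) -> set[str]:
        # Lowercase word tokens drawn from every field of the signature.
        text = " ".join([self.name, self.description, self.entity_type, *self.tags])
        return set(re.findall(r"[a-z0-9]+", text.lower()))


def match_score(event: Signature, action: Signature) -> float:
    """Jaccard overlap of the two token sets, which always lies in [0, 1].

    This is a stand-in for the real scoring, which is not documented here.
    """
    a, b = event.tokens(), action.tokens()
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


event = Signature("High CPU usage", "CPU usage above threshold on host", "host", ["cpu"])
vague = Signature("Fix host", "Runs a script", "host")
tuned = Signature(
    "Restart high CPU process",
    "Restarts the process causing high CPU usage on host",
    "host",
    ["cpu", "usage"],
)

print(match_score(event, vague), match_score(event, tuned))
```

In this sketch, the action whose description and tags share more terms with the event receives the higher score, which mirrors the guidance above.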
Event similarity
When an incident or issue occurs, an event signature is created and matched against event signatures for the events configured in policies. A score is calculated and normalized. The action configured in the policy is given this event-to-event score.
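As a rough illustration of how an action can inherit an event-to-event score, the following sketch compares an incoming event with the events configured in policies. The policy list, the action names, and the use of Python's difflib as the similarity measure are assumptions made for this example and do not describe the product's implementation.

```python
from difflib import SequenceMatcher

# Hypothetical policies: each pairs a configured event with an action.
policies = [
    {"event": "disk space low on host", "action": "clean-tmp-files"},
    {"event": "service response time high", "action": "restart-service"},
]


def similarity(a: str, b: str) -> float:
    # SequenceMatcher.ratio() returns a value in [0, 1]; it is only a
    # stand-in for the real event-to-event measure.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def policy_action_scores(incoming_event: str) -> dict[str, float]:
    scores: dict[str, float] = {}
    for policy in policies:
        score = similarity(incoming_event, policy["event"])
        # The policy's action is given the event-to-event score.
        # Assumption: if several policies share an action, keep the best score.
        scores[policy["action"]] = max(score, scores.get(policy["action"], 0.0))
    return scores


scores = policy_action_scores("low disk space on host")
print(scores)
```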
Success rate
When an incident or issue occurs, each action is given a score based on its success rate when it was run against past occurrences of this event.
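A success-rate score can be pictured as the fraction of past runs that succeeded. The following sketch assumes a hypothetical in-memory run history keyed by event and action. How the product stores outcomes, and how it treats actions with no history, is not stated here.

```python
# Hypothetical run history: (event name, action name) -> outcomes of past runs.
history = {
    ("High CPU usage", "restart-service"): [True, True, False, True],
    ("High CPU usage", "collect-logs"): [True, False],
}


def success_rate(event: str, action: str) -> float:
    """Fraction of past runs of the action against this event that succeeded."""
    outcomes = history.get((event, action), [])
    if not outcomes:
        # Assumption: an action with no history for this event scores 0.
        return 0.0
    return sum(outcomes) / len(outcomes)


print(success_rate("High CPU usage", "restart-service"))
```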
Turbonomic
If the entity referenced by an incident or issue has IBM Turbonomic actions that are associated with it, these actions are included in the list of recommended actions and assigned High confidence. IBM Turbonomic actions are included only if you configured the integration with IBM Turbonomic. For more information, see Integrating with IBM Turbonomic.
Using recommended actions
All actions in the action catalog are scored, then grouped into a confidence category of Low, Medium, or High. Actions with Medium or High confidence are included as recommended actions. The actions are sorted based on their score.
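The grouping and filtering described above can be sketched as follows. The threshold values, the function names, and the descending sort order are assumptions chosen for illustration. The product's real cut-offs are not documented here.

```python
# Hypothetical thresholds on a normalized score in [0, 1].
MEDIUM_MIN = 0.4
HIGH_MIN = 0.7


def confidence(score: float) -> str:
    """Map a normalized score to a confidence category."""
    if score >= HIGH_MIN:
        return "High"
    if score >= MEDIUM_MIN:
        return "Medium"
    return "Low"


def recommend(scores: dict[str, float]) -> list[tuple[str, float, str]]:
    """Keep Medium and High actions, highest score first (assumed order)."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [
        (name, score, confidence(score))
        for name, score in ranked
        if confidence(score) != "Low"
    ]


catalog_scores = {"restart-service": 0.9, "collect-logs": 0.5, "resize-volume": 0.2}
for name, score, level in recommend(catalog_scores):
    print(f"{name}: {score:.2f} ({level})")
```

In this sketch, resize-volume falls into the Low category and is therefore left out of the recommendations.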
When an incident has a Probable root cause section with supporting evidence, the Recommended actions section has a Context for drop-down menu to provide context for the recommended actions. By default, Triggering event is selected in the drop-down menu. However, you can change the context to receive recommendations for the probable root cause. The Generate with watsonx AI feature uses the selected context to suggest recommended actions.
You can run recommended actions to diagnose or remediate the event by clicking Run, View, or Launch.
To create a policy from a recommended action, click the + icon and enter the details for the policy.
Intelligent remediation with watsonx
You can optionally use generative AI with watsonx to create new actions for the incident or issue. For more information, see Intelligent Remediation: live action generation with watsonx.