Action recommendation

Site Reliability Engineers (SREs) are responsible for keeping the product up and running, so it is critical for them to keep the Mean Time to Repair (MTTR) as low as possible.

When an incident or issue occurs, the action framework uses the following algorithms and integrations to match actions to that incident or issue.

NLP (Natural Language Processing)

When an incident or issue occurs, an event signature is created and matched against action signatures in the action catalog. A signature includes the name, description, entity type, and tags. A score is calculated and normalized.

To increase the score of an action for an event, improve the action description and add tags to the action that align more closely with the details of the event.
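The following minimal sketch illustrates why a richer description and matching tags raise the score. It is not the product's implementation: the scoring method is not documented here, so the sketch assumes a simple token-overlap (Jaccard) measure, and the function names and sample data are hypothetical.

```python
import re

def signature_tokens(name, description, entity_type, tags):
    """Build a set of lowercase word tokens from the four signature fields."""
    text = " ".join([name, description, entity_type, *tags])
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Token-overlap score (assumed measure), normalized to the range 0..1."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical event signature.
event = signature_tokens(
    name="High disk usage",
    description="Disk usage on host exceeds threshold",
    entity_type="host",
    tags=["disk", "storage"],
)

# The same hypothetical action, before and after its description and tags are improved.
sparse_action = signature_tokens(
    name="Clean up files",
    description="Removes old files",
    entity_type="host",
    tags=[],
)
improved_action = signature_tokens(
    name="Clean up files",
    description="Removes old files to reduce disk usage on host",
    entity_type="host",
    tags=["disk", "storage"],
)

sparse_score = jaccard(event, sparse_action)
improved_score = jaccard(event, improved_action)
print(f"sparse: {sparse_score:.2f}, improved: {improved_score:.2f}")
```

In this sketch, the improved action shares more tokens with the event signature, so its normalized score is higher than the score of the sparse action.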

Event similarity

When an incident or issue occurs, an event signature is created and matched against event signatures for the events configured in policies. A score is calculated and normalized. The action configured in the policy is given this event-to-event score.
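As a rough illustration of event-to-event scoring, the following sketch compares an incoming event with the events configured in policies and gives each policy's action the resulting score. The similarity measure (token overlap), the policy structure, and the rule of keeping the best score when an action appears in several policies are all assumptions made for this example.

```python
def tokens(text):
    """Split a signature string into a set of lowercase tokens."""
    return set(text.lower().split())

def similarity(a, b):
    """Assumed measure: shared tokens divided by all tokens, in the range 0..1."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical policies: each pairs a configured event with an action.
policies = [
    {"event": "high disk usage on host", "action": "Clean up temp files"},
    {"event": "pod restart loop in cluster", "action": "Restart deployment"},
]

def score_policy_actions(incoming_event, policies):
    """Give each policy's action the score of its event against the incoming event."""
    incoming = tokens(incoming_event)
    scores = {}
    for policy in policies:
        score = similarity(incoming, tokens(policy["event"]))
        # Assumption: if an action appears in several policies, keep its best score.
        scores[policy["action"]] = max(score, scores.get(policy["action"], 0.0))
    return scores

scores = score_policy_actions("disk usage high on host db-1", policies)
for action, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{action}: {score:.2f}")
```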

Success rate

When an incident or issue occurs, each action is given a score based on its success rate when it was run against past occurrences of this event.
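The success-rate score can be pictured with the following sketch, which assumes that the score is the fraction of successful runs among all past runs of the action for the same event. The history format and names are hypothetical, and the product might weight or smooth these values differently.

```python
from collections import defaultdict

# Hypothetical run history: (event, action, succeeded).
history = [
    ("high disk usage", "Clean up temp files", True),
    ("high disk usage", "Clean up temp files", True),
    ("high disk usage", "Clean up temp files", False),
    ("high disk usage", "Expand volume", True),
    ("service unavailable", "Restart service", False),
]

def success_rates(history, event_key):
    """Return action -> successes / total runs, for past occurrences of one event."""
    runs = defaultdict(lambda: [0, 0])  # action -> [successes, total runs]
    for event, action, succeeded in history:
        if event != event_key:
            continue  # Only runs against this event count toward the score.
        runs[action][1] += 1
        if succeeded:
            runs[action][0] += 1
    return {action: ok / total for action, (ok, total) in runs.items()}

rates = success_rates(history, "high disk usage")
for action, rate in rates.items():
    print(f"{action}: {rate:.2f}")
```

Actions that were never run against the event, such as "Restart service" in this example, receive no success-rate score in the sketch.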

Turbonomic

If the entity referenced by an incident or issue has IBM Turbonomic actions that are associated with it, these actions are included in the list of recommended actions and assigned High confidence. IBM Turbonomic actions are included only if you configured the integration with IBM Turbonomic. For more information, see Integrating with IBM Turbonomic.

Intelligent remediation with watsonx

You can optionally use generative AI with watsonx to create new actions for the incident or issue. For more information, see Intelligent Remediation: live action generation with watsonx.