Achieving operational efficiency through Instana’s Intelligent Remediation

With digital transformation all around us, application environments keep growing, and that growth brings greater complexity. Organizations are turning to observability to address performance issues proactively and efficiently, and are leveraging generative AI to gain a competitive edge in delivering exceptional user experiences. This is where Instana’s Intelligent Remediation comes in: it enhances application performance and resolves issues before they have a chance to impact customers.

Now generally available: Instana’s Intelligent Remediation

I’m happy to share that Instana’s Intelligent Remediation, announced at IBM Think 2024, is now generally available. These capabilities assist DevOps/SRE teams with over 90 actions, many of them automated, generated by watsonx.ai™, for diagnosing and resolving incidents.

Delivered as prescriptive manual steps, scripts and Ansible® action playbooks, these actions cover a wide array of technology areas including containers, Elasticsearch, Host, JVM, Kafka and Kubernetes.

Instana’s Intelligent Remediation in action

Now, let’s imagine a scenario: Instana has notified you of an incident. There’s a sudden increase in the latency of requests to your website. Instana has been observing data from thousands of resources that make up your applications: processes, containers, virtual machines, Kubernetes clusters running websites, Java applications, databases, message queuing systems and more. But something subtle has snapped, like the proverbial tree falling in an uninhabited forest with no one there to hear it. Except this time, it’s different. Instana was there.

Your website’s response latency has spiked and your Kubernetes-hosted application isn’t handling requests as fast as it was. An incident has been created, and its Automation section shows the Automation Policies and Recommended Actions related to the incident. These are part of Instana’s Automation Framework. A policy links the incident event type to an action and describes when and how to run the action. Instana has matched the incident with similar prior incidents in this environment: incidents that users resolved by running an action and then, when that was successful, creating a policy. This is how the knowledge gained by users resolving issues is retained and reused.
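To make the policy idea concrete, here is a minimal sketch in Python of how a policy might tie an incident event type to an action. This is an illustration only, not Instana’s API or data model: the `Action`, `Policy` and `Incident` classes, their fields (including `min_severity` and `run_automatically`) and the `matching_policies` function are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Action:
    """A remediation action (hypothetical model, not Instana's schema)."""
    name: str
    kind: str  # e.g. "manual", "script" or "ansible"


@dataclass(frozen=True)
class Policy:
    """Links an incident event type to an action, plus when and how to run it."""
    event_type: str          # which incident event type this policy applies to
    action: Action           # what to run
    min_severity: int        # "when": only at or above this severity (assumed field)
    run_automatically: bool  # "how": automatic run versus user-triggered (assumed field)


@dataclass(frozen=True)
class Incident:
    event_type: str
    severity: int


def matching_policies(incident: Incident, policies: List[Policy]) -> List[Policy]:
    """Return the policies whose event type and severity condition fit the incident."""
    return [
        p for p in policies
        if p.event_type == incident.event_type and incident.severity >= p.min_severity
    ]


# Example data: one policy fits a latency incident, the other does not.
scale_up = Action(name="Scale up deployment", kind="script")
restart_broker = Action(name="Restart broker", kind="ansible")
policies = [
    Policy("latency_increase", scale_up, min_severity=5, run_automatically=False),
    Policy("broker_offline", restart_broker, min_severity=1, run_automatically=True),
]
incident = Incident(event_type="latency_increase", severity=7)
matched = matching_policies(incident, policies)
```

In this sketch, a policy created after one successful fix becomes a candidate for every later incident of the same type, which is the reuse of knowledge described above.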

Until now, only the prior work and experiential knowledge of DevOps/ITOps teams could help other application stakeholders. While this is a great store of value, what happens when there are no prior cases of an incident? Enter watsonx.ai, with the power of generative AI to help seed new solutions in the form of actions tailored to the context of the incident event.

Solving the problem with Instana

Let’s go back to our Kubernetes-hosted application. There was a significant increase in the latency of website responses. An incident was created and, in the Recommended Actions section of the incident page, the user can now view an action generated by watsonx.ai. The action takes them through the steps of diagnosing the application and scaling up the Kubernetes deployment, and suggests alternative steps if the issue persists. By adding the action to a policy, the user can share these steps to help future users with the same issue. With the Automation Framework and Action Catalog, users can capture their own actions and policies and share them with other users to help resolve similar incidents in the future. Intelligent Remediation assists this process further by adding actions generated by watsonx.ai that are correlated to the incident being viewed. When I think of what the future holds as we further integrate Instana’s original machine learning (ML) capabilities with watsonx.ai, I get excited about all the future incidents that won’t occur because of Instana’s automated remediation.
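The diagnose, scale up, then fall back flow described above can be sketched in a few lines of Python. This is a hypothetical illustration of the shape of such an action, not the content that watsonx.ai generates and not an Instana or Kubernetes API: `plan_scale_up`, `ScalePlan`, the replica limit and the wording of each step are assumptions made for the example, and nothing here touches a real cluster.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScalePlan:
    """The outcome of planning: a target replica count and human-readable steps."""
    target_replicas: int
    steps: List[str]


def plan_scale_up(deployment: str, current: int, maximum: int, increment: int = 1) -> ScalePlan:
    """Plan a bounded scale-up for a deployment (illustrative only)."""
    if current < 0 or maximum < 1 or increment < 1:
        raise ValueError("current must be >= 0, maximum and increment must be >= 1")

    # Never exceed the maximum, and never plan a scale-down.
    target = max(current, min(current + increment, maximum))

    # Step 1 is always diagnosis.
    steps = [f"Check pod status and recent events for deployment '{deployment}'"]
    if target > current:
        # Step 2: scale up, then verify and fall back to alternatives if needed.
        steps.append(f"Scale deployment '{deployment}' from {current} to {target} replicas")
        steps.append("Re-check request latency; if it is still high, review resource limits and downstream dependencies")
    else:
        # No headroom left, so go straight to the alternative steps.
        steps.append(f"Deployment '{deployment}' is at or above its limit of {maximum} replicas; review resource limits and downstream dependencies")
    return ScalePlan(target_replicas=target, steps=steps)


plan = plan_scale_up("web-frontend", current=3, maximum=5)
for step in plan.steps:
    print(step)
```

Capping the target at a maximum is a deliberate choice in this sketch: an automated action that scales without a bound could turn a latency incident into a capacity or cost incident.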

Author

Jeremy Hughes

Product Manager

IBM Instana

Learn more about Instana Intelligent Remediation capabilities