
Enhancing observability with chaos engineering: Steadybit integration with Instana

28 February 2024


In today’s dynamic software landscape, maintaining high performance and reliability is crucial for businesses. Achieving this requires effective observability, and two powerful tools for the job are Steadybit and Instana®. The seamless integration of Steadybit with Instana unlocks proactive reliability engineering techniques and provides a comprehensive solution for optimizing and managing your applications.

Steadybit chaos engineering platform

Steadybit is a resilience testing platform that is designed to proactively identify weaknesses and potential failures in distributed systems. It empowers organizations to build more robust and resilient applications by simulating various failure scenarios in a controlled environment.

Instana automated observability platform

Instana is a leading observability solution. It provides real-time insights into application health, performance and dependencies, which help teams quickly detect and resolve issues to ensure optimal user experiences.

The power of integration

When these two tools join forces, the result is a comprehensive reliability solution that covers both proactive resilience testing and real-time performance monitoring. With the Instana extension (link resides outside ibm.com), Steadybit users can gain insights from Instana on their chaos engineering experiments. While an experiment runs, users can check whether Instana observed any events or incidents. They can also create an Instana maintenance window directly from the experiment in Steadybit, which avoids escalations while an experiment designed to inject faults is running.

Key benefits:

By using this new integration, you can validate whether your system works reliably. If the system does not work reliably, Instana will let you know. Using these two powerful tools together provides some key benefits:

  • Reduced time to value: Increase efficiency when implementing your observability strategy. With Steadybit’s ability to precisely model and inject faulty infrastructure conditions into any environment, Instana custom events and alerts can be fine-tuned before the system under observation is deployed to production, effectively shifting observability from a day-2 activity to a day-1 activity.
  • Enhanced reliability: Steadybit’s resilience testing allows you to identify weaknesses in your system before they impact users. Integrating this with Instana’s monitoring capabilities ensures a holistic approach to system reliability.
  • Faster issue resolution: Instana’s real-time insights combined with Steadybit’s failure injection capabilities enable teams to quickly identify, isolate and resolve issues, minimizing downtime and improving user satisfaction.
  • Continuous optimization: The integration supports a continuous feedback loop for optimizing system performance. Through Steadybit’s insights, teams can fine-tune applications based on real-world scenarios identified during resilience testing.
  • Cost-efficiency: Proactively addressing potential issues through resilience testing can result in cost savings by preventing large-scale outages and minimizing the need for reactive firefighting.

How to integrate Steadybit with Instana

1. Set up Steadybit’s Instana extension:

Begin by configuring (link resides outside ibm.com) Steadybit to communicate with your Instana instance. To do this, install Steadybit’s Instana extension next to your Steadybit agent and provide Instana’s base address and authentication details (see https://github.com/steadybit/extension-instana) (link resides outside ibm.com).
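Before deploying the extension, it can help to sanity-check the two values it needs. The following Python sketch validates a base address and an API token and builds the Authorization header in the "apiToken" scheme that the Instana REST API uses. The environment variable names are assumptions chosen for this example; confirm the exact names against the extension's README.

```python
from urllib.parse import urlparse

# Assumed variable names for illustration; check the extension README.
BASE_URL_VAR = "STEADYBIT_EXTENSION_BASE_URL"
API_TOKEN_VAR = "STEADYBIT_EXTENSION_API_TOKEN"


def instana_settings(env):
    """Validate the Instana base address and token found in `env` (a dict)."""
    base_url = env.get(BASE_URL_VAR, "").rstrip("/")
    token = env.get(API_TOKEN_VAR, "")

    parsed = urlparse(base_url)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"{BASE_URL_VAR} must be an https URL, got {base_url!r}")
    if not token:
        raise ValueError(f"{API_TOKEN_VAR} must not be empty")

    return {
        "base_url": base_url,
        "headers": {"Authorization": f"apiToken {token}"},
    }


# Hypothetical values; in practice you would pass os.environ.
settings = instana_settings({
    BASE_URL_VAR: "https://example-tenant.instana.io/",
    API_TOKEN_VAR: "dummy-token",
})
print(settings["base_url"])
```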

2. Identify critical scenarios:

Use Steadybit to simulate various failure scenarios such as network outages, service failures and latency spikes. Monitor the impact of these scenarios in real time using Instana. To make getting started as easy as possible, Steadybit provides a ready-to-run experiment in its Reliability Hub (link resides outside ibm.com). This experiment is also explained in this video (link resides outside ibm.com).

3. Automated testing and monitoring:

Integrate Steadybit into your CI/CD pipeline to automate resilience testing. This ensures that every code change is subjected to a battery of resilience tests before reaching production. To achieve this, you can use Steadybit’s API to run an experiment (link resides outside ibm.com), a GitHub action (link resides outside ibm.com) or the CLI (link resides outside ibm.com), whichever best fits your context. Check out Steadybit’s blog post “Boost your GitOps practices by integrating Chaos Engineering with Steadybit” (link resides outside ibm.com) to learn more.
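As a rough illustration of what such a pipeline gate does, the Python sketch below starts an experiment, polls until it reaches a terminal state and maps the outcome to an exit code the pipeline can act on. The two callables stand in for whichever mechanism you choose (API, GitHub action or CLI), and the state names are hypothetical placeholders, not Steadybit's documented values.

```python
import time

# Hypothetical state names; substitute the values your tooling reports.
SUCCESS_STATE = "COMPLETED"
TERMINAL_STATES = {"COMPLETED", "FAILED", "ERRORED", "CANCELED"}


def run_resilience_gate(start_experiment, get_state, timeout_s=600,
                        poll_s=5, sleep=time.sleep, clock=time.monotonic):
    """Run an experiment and return 0 (passed), 1 (failed) or 2 (timed out).

    `start_experiment()` returns an execution id and `get_state(id)`
    returns the current state string; both are supplied by the caller.
    """
    execution_id = start_experiment()
    deadline = clock() + timeout_s
    while True:
        state = get_state(execution_id)
        if state in TERMINAL_STATES:
            return 0 if state == SUCCESS_STATE else 1
        if clock() >= deadline:
            return 2
        sleep(poll_s)


# Usage with stubbed callables instead of real network calls.
states = iter(["RUNNING", "RUNNING", "COMPLETED"])
exit_code = run_resilience_gate(
    start_experiment=lambda: "exec-1",
    get_state=lambda _id: next(states),
    sleep=lambda _s: None,  # skip real waiting in this demo
)
print(exit_code)
```

In a real pipeline, the script would end with sys.exit(exit_code) so that a failed or timed-out experiment blocks the deployment step.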

4. Incident response planning:

Use the insights gained from both Steadybit and Instana to refine incident response plans. Having a well-defined strategy based on real-world scenarios improves the team’s ability to respond swiftly and effectively.

The integration of Steadybit with Instana presents a powerful synergy for organizations seeking to elevate their observability and resilience practices. By combining proactive resilience testing with real-time performance monitoring, teams can create more robust, reliable and optimized applications. This integration ultimately contributes to enhanced user experiences, reduced downtime and increased overall operational efficiency.

Learn more about Steadybit integration with Instana (link resides outside ibm.com).

 

Author

Trent Shupe

Product Marketing Manager