Date: 2024-08-05
Imagine your enterprise’s critical online services are suddenly down, and the IT operations team is working to identify the cause. Minutes turn into hours, and every second of downtime costs the company revenue and customer trust. In the rush to recover the systems, it is critical that your technical experts can isolate and resolve the real problem or, better yet, get ahead of growing issues and avoid the outage altogether.
This is where an effective cross-platform end-to-end observability strategy becomes essential, allowing organizations to gain rapid insights into the health of their applications and systems.
With services running across the hybrid cloud, including on-premises systems and multiple hyperscaler platforms, locations and regions, detecting latency and resource issues before they become critical is paramount. As the number of services underpinning an application flow increases, the environment becomes harder to manage.
For born-on-the-cloud applications, an observability approach is essential to provide a unified view of these dynamic, dispersed environments. The role of Site Reliability Engineers (SREs) is also critical in ensuring the availability of the full end-to-end application or service. Rather than relying on a narrower, per-technology view, the SRE’s application-centric focus identifies which services are performing suboptimally. This guides development teams as they investigate in detail and make fixes.
Observability depends on timely and effective telemetry signals from the underlying systems. The OpenTelemetry project is a direct, community-led response to this need and aims to address head-on the challenge of navigating increased complexity.
OpenTelemetry is a vendor-agnostic, open-source framework hosted by the Cloud Native Computing Foundation (CNCF). It aims to enable effective observability across distributed applications and systems by providing an open standard and open tools that support high-quality telemetry data from any source to any target. By building on OpenTelemetry, the telemetry capabilities across different tools and domains can be simplified, making it easier to implement end-to-end observability solutions.
OpenTelemetry’s inherent concept of signal correlation enables the linking and association of different types of signals (such as traces, metrics and logs) to gain a comprehensive insight into an application’s behavior and resources. The OpenTelemetry Semantic Conventions support the correlation of signals by defining a common set of attributes, ensuring that standardized metadata facilitates their association. This is crucial for faster detection and resolution of incidents.
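To make signal correlation concrete, here is a minimal sketch in plain Python. It does not use the OpenTelemetry SDK; the records, the `correlate_by_trace` and `signals_for_service` helpers, and the sample values are all hypothetical. It assumes only that spans and logs carry a shared trace ID and that every signal carries resource attributes named according to the Semantic Conventions, such as `service.name` and `host.name`.

```python
from collections import defaultdict

# Hypothetical, hand-written records. A real pipeline would receive these
# from OpenTelemetry SDKs or a Collector rather than from literals.
RESOURCE = {"service.name": "payments", "host.name": "node-1"}

signals = [
    {"kind": "span", "trace_id": "a1", "name": "POST /pay",
     "duration_ms": 840, "resource": RESOURCE},
    {"kind": "log", "trace_id": "a1", "body": "db timeout",
     "severity": "ERROR", "resource": RESOURCE},
    {"kind": "log", "trace_id": "b2", "body": "ok",
     "severity": "INFO", "resource": RESOURCE},
    # This metric point has no trace ID, so it can only be associated
    # with the other signals through its resource attributes.
    {"kind": "metric", "name": "http.server.request.duration",
     "value": 0.84, "resource": RESOURCE},
]


def correlate_by_trace(records):
    """Group records that share a trace ID; records without one are skipped."""
    grouped = defaultdict(list)
    for record in records:
        trace_id = record.get("trace_id")
        if trace_id is not None:
            grouped[trace_id].append(record)
    return dict(grouped)


def signals_for_service(records, service_name):
    """Select every signal whose resource names the given service."""
    return [r for r in records
            if r["resource"].get("service.name") == service_name]


by_trace = correlate_by_trace(signals)
payments_signals = signals_for_service(signals, "payments")
```

In this sketch the slow span and the error log end up together under trace `a1`, while the metric is linked to them only because all three share the same `service.name`. That second link is what a common attribute vocabulary makes possible.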
With a growing number of enterprises unlocking the value of their mainframe investments as an integral part of these hybrid cloud environments, end-to-end observability must also span the applications and data that reside on IBM Z®.
This brings both teams to the table: SREs, for whom the transition of an application flow into the mainframe domain can obscure the full observability view, and mainframe teams, with their deep knowledge and tools.
As a widely consumed open standard, OpenTelemetry provides a richer set of tools to expedite the identification of the root cause of issues. Mainframe subject matter experts, with deep mainframe-centric diagnostic tools, can apply these skills in a more targeted and effective fashion. With observability teams and SREs able to identify what is and, critically, what is not a mainframe issue, teams can focus their time more efficiently. This reduces both the risk of outages and the time needed to resolve them.
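As an illustration of how trace data can help show what is and is not a mainframe issue, the sketch below attributes the latency of one request to the services that handled it. It is a simplified model, not an OpenTelemetry API: the span records, the service names and the `self_time_by_service` helper are hypothetical, and it assumes each span's children run sequentially without overlapping.

```python
from collections import defaultdict

# Hypothetical spans from one trace. "mainframe-txn" stands in for a
# service running on the mainframe; all names and durations are made up.
spans = [
    {"span_id": "1", "parent_id": None, "service": "web-frontend",
     "duration_ms": 500},
    {"span_id": "2", "parent_id": "1", "service": "order-api",
     "duration_ms": 450},
    {"span_id": "3", "parent_id": "2", "service": "mainframe-txn",
     "duration_ms": 40},
]


def self_time_by_service(trace_spans):
    """Sum, per service, the time spent in its own spans excluding children.

    Assumes direct children of a span do not overlap in time, so their
    durations can simply be subtracted from the parent's duration.
    """
    child_total = defaultdict(int)
    for span in trace_spans:
        if span["parent_id"] is not None:
            child_total[span["parent_id"]] += span["duration_ms"]

    totals = defaultdict(int)
    for span in trace_spans:
        own_time = span["duration_ms"] - child_total[span["span_id"]]
        totals[span["service"]] += own_time
    return dict(totals)


self_times = self_time_by_service(spans)
slowest_service = max(self_times, key=self_times.get)
```

With these sample values, most of the request time sits in the hypothetical `order-api` service rather than in the mainframe transaction, which is the kind of evidence that lets mainframe experts be engaged only when their domain is actually involved.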
With IBM and our partners already starting to support OpenTelemetry in our observability and monitoring tools, adoption is widening. We are working with the OpenTelemetry community, with our vendor partners and within our products across IBM Z and IBM LinuxONE to help enable a consistent end-to-end observability experience. Our approach complements our existing operational management tools and instrumentation and focuses on providing high-quality and timely telemetry at appropriate system overhead.
The value of observability extends beyond operational efficiency. It’s about strategic foresight and competitive advantage. Business leaders are keenly interested in how observability through frameworks like OpenTelemetry can provide clarity amidst complexity and unlock the agility of their IT systems. The rewards can be significant, as these frameworks are designed to help reduce downtime, increase business agility and improve IT resource utilization.