Observability provides monitoring, logging, and tracing information. It helps administrators understand how the system behaves, troubleshoot issues, and maintain application health.
What is observability
Observability is the ability to measure and understand the internal state of your system by examining its outputs. It enables you to monitor application performance, diagnose problems, and ensure system reliability through three key pillars: monitoring, logging, and tracing. For more details, see ACM observability label-based deployment.
The observability architecture consists of three main layers:
Aggregators including Loki Stack, Prometheus, and Tempo
Middle-tier or collector components including the OpenTelemetry Collector, AlertMgr CRD, and ClusterLogForwarder
Visualization layer including UI plugins for tracing, distributed tracing, logging, console logs, and monitoring with alerts view, dashboard, and Perses dashboard.
Data flows from aggregators through the middle tier to visualization, with alert notifications and dashboards as key outputs.
Figure 1. Observability architecture
Observability components
The observability framework consists of three main components:
Monitoring
Provides real-time visibility into system health through dashboards and alerts. Monitoring includes serviceability dashboards that display panels with insights for troubleshooting issues and responding to alerts. Operational alerts notify you when the system experiences problems.
Logging
Captures and stores system events and application logs for analysis and troubleshooting. Logging supports configurable data retention, external object storage integration, and routing to external Security Information and Event Management (SIEM) tools. You can also convert logs to metrics for enhanced monitoring capabilities.
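As an illustration of what converting logs to metrics means, the following minimal Python sketch counts log lines by severity and renders the counts in the Prometheus text exposition style. It is not how the platform performs the conversion. The log layout (timestamp, level, message), the function names, and the metric name `app_log_lines_total` are all assumptions made for this example.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed log layout: "<timestamp> <LEVEL> <message>". Real layouts vary.
LEVEL_PATTERN = re.compile(r"^\S+\s+(DEBUG|INFO|WARN|ERROR)\s")


def count_by_level(lines: Iterable[str]) -> Counter:
    """Count log lines per severity level, skipping lines that do not match."""
    counts: Counter = Counter()
    for line in lines:
        match = LEVEL_PATTERN.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def to_exposition(counts: Counter, metric_name: str = "app_log_lines_total") -> str:
    """Render the counts as one labeled sample per level, sorted by level name.

    The metric name is hypothetical and chosen only for this example.
    """
    return "\n".join(
        f'{metric_name}{{level="{level}"}} {value}'
        for level, value in sorted(counts.items())
    )


if __name__ == "__main__":
    sample = [
        "2024-05-01T10:00:00Z INFO service started",
        "2024-05-01T10:00:01Z ERROR upstream timeout",
        "2024-05-01T10:00:02Z ERROR upstream timeout",
        "malformed line without a level",
    ]
    print(to_exposition(count_by_level(sample)))
```

Once log events are expressed as counters like these, they can feed the same dashboards and alerts as any other metric.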
Tracing
Tracks requests as they flow through distributed systems to identify performance bottlenecks and dependencies. Tracing supports external storage configuration. Generate and export tracing data by using OpenTelemetry automatic or manual instrumentation. For AI-powered application observability, you can integrate your application with one of the supported frameworks: Langfuse, OpenLit (Python only), or Traceloop. For additional information about integrating these frameworks, see Enabling Langfuse monitoring, Enabling OpenLit monitoring, and Enabling Traceloop monitoring.
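To make the idea of tracing concrete, the following Python sketch records nested spans that share one trace ID and then finds the slowest child of a parent span. It uses only the standard library and is a toy model, not the OpenTelemetry API. `Span`, `ToyTracer`, `handle_request`, and `slowest_child` are hypothetical names introduced for this illustration, and a real tracer would export spans to a backend instead of keeping them in memory.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Span:
    """One timed unit of work in a trace (hypothetical type, not OpenTelemetry's)."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    name: str
    start: float = 0.0
    end: float = 0.0

    @property
    def duration(self) -> float:
        return self.end - self.start


class ToyTracer:
    """Keeps finished spans in memory so they can be inspected afterwards."""

    def __init__(self) -> None:
        self.finished: List[Span] = []
        self._stack: List[Span] = []

    @contextmanager
    def span(self, name: str) -> Iterator[Span]:
        # A nested span inherits the trace ID of its parent and records the parent's span ID.
        parent = self._stack[-1] if self._stack else None
        current = Span(
            trace_id=parent.trace_id if parent else uuid.uuid4().hex,
            span_id=uuid.uuid4().hex[:16],
            parent_id=parent.span_id if parent else None,
            name=name,
            start=time.perf_counter(),
        )
        self._stack.append(current)
        try:
            yield current
        finally:
            current.end = time.perf_counter()
            self._stack.pop()
            self.finished.append(current)


def handle_request(tracer: ToyTracer) -> None:
    """Simulate one request that performs two downstream steps."""
    with tracer.span("handle_request"):
        with tracer.span("query_database"):
            time.sleep(0.01)  # stand-in for real work
        with tracer.span("render_response"):
            time.sleep(0.002)  # stand-in for real work


def slowest_child(spans: List[Span], parent_id: str) -> Optional[Span]:
    """Return the longest-running direct child of the given span, if any."""
    children = [s for s in spans if s.parent_id == parent_id]
    return max(children, key=lambda s: s.duration, default=None)


if __name__ == "__main__":
    tracer = ToyTracer()
    handle_request(tracer)
    for s in tracer.finished:
        print(f"{s.name}: parent={s.parent_id} duration={s.duration:.4f}s")
```

The parent and child links show which operations depend on which, and the per-span durations show where a request spends its time.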