Libraries
In the Resilience dimension, you can reference a library of Concert-provided requirements. The contents of each library are organized by category and include the metrics, weights, and calculation used to generate a resilience score for each application deployment.
- Foundational library
- Container Build Integrity library
- Runtime Production Readiness library
- VM Integrity Assurance library
- Java Runtime library
- Custom library
- Message queue library
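As a rough illustration of how metrics and weights might combine into a single score, the sketch below uses a weight-normalized average. This is an assumption made for illustration only: the calculation that Concert actually applies is defined by each library, and the `MetricScore` type, metric names, and values here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MetricScore:
    """One metric's score (0-100) and relative weight. Hypothetical shape, not Concert's schema."""
    name: str
    score: float
    weight: float


def weighted_resilience_score(metrics: list[MetricScore]) -> float:
    """Return the weight-normalized average of the metric scores (assumed formula)."""
    total_weight = sum(m.weight for m in metrics)
    if total_weight <= 0:
        raise ValueError("at least one metric must have a positive weight")
    return sum(m.score * m.weight for m in metrics) / total_weight


# Hypothetical metrics: availability counts three times as much as rollback readiness.
example = [
    MetricScore("availability_slo", score=90.0, weight=3.0),
    MetricScore("rollback_readiness", score=60.0, weight=1.0),
]
print(weighted_resilience_score(example))  # 82.5
```

Normalizing by the total weight keeps the result on the same 0-100 scale as the inputs, whatever weights are chosen.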
Foundational library
Introduced with Concert 1.0.5, the foundational library is a core capability within Concert's resilience feature, enabling organizations to systematically measure, score, and improve application deployment resilience. It is built around site reliability engineering (SRE) best practices that are related to application deployment performance, incident management, and release management.
Existing approaches to assessing application deployment resilience are often fragmented, tool-specific, or lack clear scoring criteria. This makes it difficult to benchmark or drive continuous improvement across teams and services. The foundational library addresses this gap by providing a unified, extensible framework for resilience measurements that integrate seamlessly with existing observability and ITSM tools, while also accommodating manual reporting where automation is not feasible.
- Availability - Tracks overall and regional application deployment availability using synthetic tests, error rates, and downtime analysis.
- Maintainability - Evaluates deployment practices, automation coverage, run book completeness, and change management hygiene.
- Observability - Measures incident detection, diagnostic, and acknowledgment times, as well as the percentage of incidents detected by monitoring.
- Recoverability - Quantifies mean time to mitigate and recover, as well as recovery time and point objectives (RTO/RPO) and rollback readiness.
- Scalability - Assesses throughput and visit rates during scaling events, ensuring performance under load.
- Usability - Focuses on user-facing metrics, such as latency and error rate SLOs, as well as blast radius management during incidents.
Each requirement is defined with clear measurement units and a detailed description, and is mapped to leading and lagging indicators sourced from automated tools, such as APM and ITSM systems, as well as manual inputs.
You can use this library to generate objective baselines and compare resilience across application deployments, business units, and time periods. It can also help provide actionable insights for targeted, continuous improvements in operational practices and technical architecture. The unified reporting simplifies resilience assessments for technical and non-technical stakeholders using data from multiple sources, ensuring the metrics are transparent and aligned with industry best practices and regulatory requirements.
Ingest metrics from APM tools, such as Instana, to gather real-time telemetry, error rates, throughput, and synthetic test results. You can also ingest resilience metrics from ITSM tools like ServiceNow for data related to incidents, change, and recovery, including time-to-detect, time-to-recover, and incident root cause analysis.
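To make metrics such as time-to-detect and time-to-recover concrete, here is a minimal sketch that derives their means from a list of incident records. The record fields (`started`, `detected`, `resolved`) and the sample data are hypothetical. A real ITSM export will have its own schema, and this is not how Concert ingests data.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records, not a real ITSM export format.
incidents = [
    {"started": "2024-05-01T10:00:00", "detected": "2024-05-01T10:05:00", "resolved": "2024-05-01T11:00:00"},
    {"started": "2024-05-03T08:00:00", "detected": "2024-05-03T08:15:00", "resolved": "2024-05-03T08:40:00"},
]


def minutes_between(start: str, end: str) -> float:
    """Return the elapsed minutes between two ISO 8601 timestamps."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds() / 60


def mean_time_to_detect(records: list[dict]) -> float:
    """Average minutes from incident start to detection."""
    return mean(minutes_between(r["started"], r["detected"]) for r in records)


def mean_time_to_recover(records: list[dict]) -> float:
    """Average minutes from incident start to resolution."""
    return mean(minutes_between(r["started"], r["resolved"]) for r in records)


print(mean_time_to_detect(incidents))   # 10.0
print(mean_time_to_recover(incidents))  # 50.0
```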
In the foundational library, each metric is categorized as either a leading indicator or a lagging indicator. Leading indicators are metrics that help predict future performance, whereas lagging indicators are measurements collected from past outcomes.
Container Build Integrity library
Introduced with Concert 1.1.0, this library contains requirements that reflect a standardized framework assessing container image quality across security, efficiency, and maintainability dimensions. It combines static analysis with build process telemetry to provide actionable improvement scores.
Hidden risks in container build pipelines lead to bloated images, security vulnerabilities, and production inefficiencies. Using this library to assess resilience can help answer questions like "Are our builds truly production-grade?"
You can use the requirements in this library (via resilience profiles) in conjunction with resilience data ingested using Concert Workflows to automate vulnerability remediation.
- Integrity - Validates that agreed-upon standards are adopted when building, deploying, and hosting application deployments.
- Scalability - Assesses throughput and visit rates during scaling events, ensuring performance under load.
- Security - Aims to protect the system and its data from unauthorized access, use, disclosure, disruption, modification, or destruction due to malicious attacks, data breaches, or various threats. It assesses the confidentiality, authenticity, and availability of data in your application deployments.
- Maintainability - Evaluates deployment practices, automation coverage, run book completeness, and change management hygiene.
- Availability - Evaluates whether containerized services are resilient and restartable during failures or outages. Metrics in this category measure conditions such as missing restart policies, which may prevent container recovery and continuity during disruptions.
Use this library to help detect secrets and other vulnerabilities at build time rather than at runtime, supporting a proactive, early approach to vulnerability management. It can also help you optimize build efficiency by reducing image size and build times through layer analysis, and enforce compliance by automating best practices from CIS benchmarks.
Runtime Production Readiness library
Introduced with Concert 1.1.0, this library contains requirements that reflect a standardized scoring framework evaluating containerized workloads across reliability, security, and Kubernetes best practices. It combines runtime behavior with deployment configurations to produce actionable readiness scores.
Fragmented container observability leads to reactive incident management, hidden reliability risk, and an inability to provide Kubernetes compliance. Using this library helps answer questions like "Are our containers truly production-ready?" through numerous critical dimensions of operational health, and helps you create automated low-code workflows to remediate poorly orchestrated deployments.
- Availability - Tracks overall and regional application deployment availability using synthetic tests, error rates, and downtime analysis.
- Integrity - Validates that agreed-upon standards are adopted when building, deploying, and hosting application deployments.
- Maintainability - Evaluates deployment practices, automation coverage, run book completeness, and change management hygiene.
- Recoverability - Quantifies mean time to mitigate and recover, as well as recovery time and point objectives (RTO/RPO) and rollback readiness.
- Scalability - Assesses throughput and visit rates during scaling events, ensuring performance under load.
- Security - Aims to protect the system and its data from unauthorized access, use, disclosure, disruption, modification, or destruction due to malicious attacks, data breaches, or various threats. It assesses the confidentiality, authenticity, and availability of data in your application deployments.
- Usability - Focuses on user-facing metrics, such as latency and error rate SLOs, as well as blast radius management during incidents.
Use this library to improve container health scoring by quantifying adherence to best practices and security controls for containerized workloads. This library also helps quantify tolerance for node failures and outages and automates verification of key security controls from market leaders' best practices.
VM Integrity Assurance library
Introduced with Concert 1.1.0, this library contains requirements that reflect a standardized scoring system for evaluating virtual machine (VM) health across security hardening, operational efficiency, and compliance adherence. It combines infrastructure telemetry with security postures and is designed to address the fragmented VM monitoring problem by providing a comprehensive approach to monitoring security, performance, and compliance metrics.
The unified scoring system provided by the library answers the question "How healthy are our VMs?" by offering quantifiable benchmarks that allow organizations to proactively identify and address issues before they become major problems. This reduces the need for reactive incident management and minimizes hidden risk.
- Integrity - Validates that agreed-upon standards are adopted when building, deploying, and hosting application deployments.
- Maintainability - Evaluates deployment practices, automation coverage, run book completeness, and change management hygiene.
- Recoverability - Quantifies mean time to mitigate and recover, as well as recovery time and point objectives (RTO/RPO) and rollback readiness.
- Scalability - Assesses throughput and visit rates during scaling events, ensuring performance under load.
- Security - Aims to protect the system and its data from unauthorized access, use, disclosure, disruption, modification, or destruction due to malicious attacks, data breaches, or various threats. It assesses the confidentiality, authenticity, and availability of data in your application deployments.
- Usability - Focuses on user-facing metrics, such as latency and error rate SLOs, as well as blast radius management during incidents.
Use this library to combine technical metrics into one actionable score that prioritizes your most critical VM risks first. You can demonstrate proof of compliance through audit trails that show continuous adherence to CIS/PCI benchmarks and track the state of VM degradation or improvement over time.
Java Runtime library
Introduced with Concert 2.0.0, this library contains requirements that reflect a standardized scoring system for evaluating Java workloads running on virtual machines (VMs). It enables organizations to gain visibility into application deployment health and risk posture without requiring agent installation or elevated access.
The scoring system supports automated discovery of Java application deployments, such as Tomcat or Java servlets, across Linux-based virtual machines accessible via SSH credentials. This includes workloads running on AWS EC2, IBM Cloud Virtual Servers, and similar environments. Application deployments are grouped across virtual machines using consistent metadata, or are inventoried individually if such metadata is absent. Once discovered, each Java application deployment is automatically evaluated using the Java Runtime library, which assigns resilience scores based on configuration and runtime characteristics.
- Scalability - Monitors thread usage and garbage collection activity to detect performance degradation under load.
- Security - Flags application deployments with insecure configurations, such as unencrypted ports, elevated root access, or weak session management practices.
- Availability - Evaluates restart frequency and failure patterns that may signal instability or recurring outages.
- Usability - Assesses error rates that may negatively impact user experience or application deployment responsiveness.
Note: The scoring criteria for the following requirements in the Java Runtime library are predefined by Concert and cannot be modified through the user interface: Sessions Exceeding Timeout, Session Activity, and Garbage Collection Activity.
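The grouping behavior described for discovered Java application deployments (grouped across virtual machines when consistent metadata exists, inventoried individually otherwise) can be pictured with the sketch below. The record shape, the `app_name` metadata key, and the host names are all hypothetical. The sketch only illustrates the idea and does not reflect Concert's discovery implementation.

```python
from collections import defaultdict


def group_deployments(discovered: list[dict], key: str = "app_name") -> dict[str, list[str]]:
    """Group discovered processes by a shared metadata value.

    Records without that metadata are inventoried individually under a
    per-host label. All field names here are illustrative assumptions.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for item in discovered:
        label = item.get("metadata", {}).get(key)
        if label:
            # Consistent metadata: the same application spans several VMs.
            groups[label].append(item["host"])
        else:
            # No metadata: treat this process as its own inventory entry.
            groups[f"{item['host']}:{item['process']}"].append(item["host"])
    return dict(groups)


# Hypothetical discovery results.
discovered = [
    {"host": "vm-a", "process": "tomcat", "metadata": {"app_name": "orders"}},
    {"host": "vm-b", "process": "tomcat", "metadata": {"app_name": "orders"}},
    {"host": "vm-c", "process": "java", "metadata": {}},
]
print(group_deployments(discovered))
```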
Message queue library
Introduced with Concert 3.0, this library contains requirements that provide a standardized framework for evaluating the health, performance, and reliability of message queue systems used in modern application deployments. Message queue services such as Kafka, RabbitMQ, and similar platforms play a critical role in enabling asynchronous communication between distributed services. However, monitoring of these systems is often fragmented and tool-specific, and it lacks consistent scoring criteria. This makes it difficult to detect bottlenecks, ensure reliable message processing, and maintain predictable system behavior across environments.
The message queue library addresses this gap by providing a unified and extensible model for assessing message queue systems using standardized requirements, metrics, and scoring thresholds. It enables organizations to evaluate message flow, consumer behavior, and queue performance in a consistent and comparable manner across platforms.
- Observability - Monitors key indicators such as message throughput, queue depth, dead-letter queue (DLQ) behavior, and consumer activity to ensure visibility into system performance and message flow.
- Availability - Evaluates the reliability of message delivery and processing to ensure that messages are consistently consumed and acknowledged without disruption.
- Scalability - Assesses how efficiently the system handles varying message loads by analyzing publish rates, consumer utilization, and queue growth patterns.
Use this library to assess the resilience posture of message queue systems by identifying bottlenecks, monitoring message flow stability, and ensuring efficient message processing across producers and consumers. It helps surface actionable insights into queue performance and reliability, enabling teams to detect issues early, optimize system behavior, and maintain stable, predictable message-driven architectures.
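As an illustration of the kind of signal the scalability category refers to, the following sketch estimates queue growth from two depth samples and compares publish and consume rates. The function names, sample format, and numbers are hypothetical. The library's actual metrics and thresholds are defined in Concert.

```python
def queue_growth_per_second(samples: list[tuple[float, int]]) -> float:
    """Estimate queue growth from (timestamp_seconds, depth) samples sorted by time.

    Uses only the first and last sample. A positive result means the backlog
    grew over the window.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, d0), (t1, d1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time window")
    return (d1 - d0) / (t1 - t0)


def is_backlog_growing(publish_rate: float, consume_rate: float) -> bool:
    """Return True when producers publish faster than consumers acknowledge."""
    return publish_rate > consume_rate


# Hypothetical readings: depth rose from 100 to 400 messages in 60 seconds.
print(queue_growth_per_second([(0.0, 100), (60.0, 400)]))         # 5.0
print(is_backlog_growing(publish_rate=120.0, consume_rate=115.0))  # True
```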
Custom library
- Availability
- Integrity
- Maintainability
- Observability
- Recoverability
- Scalability
- Security
- Usability
- Importing a library package: Upload a .zip, .tar, or .tgz file containing requirements and metrics. This is ideal for teams that maintain their own standardized scoring models or want to reuse libraries across environments.
- Creating a library manually: Define the library name, description, categories, requirements, target scores, thresholds, and metrics directly through the UI.
Once a custom library is created, it becomes available for use when building resilience profiles, which in turn are applied during posture assessment plans to generate application deployment resilience scores.
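If you maintain library definitions in a directory, a package for import could be assembled with standard tooling. The sketch below builds a .tgz archive using Python's `tarfile` module. The file names (`requirements.json`, `metrics.json`) and the directory layout are placeholders chosen for illustration. The structure that Concert expects inside the package is not described here, so check the product documentation before uploading.

```python
import tarfile
import tempfile
from pathlib import Path


def build_library_package(source_dir: Path, output_path: Path) -> list[str]:
    """Pack every file under source_dir into a gzip-compressed tar archive.

    Returns the member names so that the contents can be checked before upload.
    """
    with tarfile.open(output_path, "w:gz") as archive:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the library root.
                archive.add(path, arcname=path.relative_to(source_dir).as_posix())
    with tarfile.open(output_path, "r:gz") as archive:
        return archive.getnames()


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "my-library"
    src.mkdir()
    # Placeholder files: the real package layout is an assumption here.
    (src / "requirements.json").write_text("{}")
    (src / "metrics.json").write_text("{}")
    names = build_library_package(src, Path(tmp) / "my-library.tgz")
    print(names)
```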