Libraries

In the Resilience dimension, you can reference a library of Concert-provided requirements. The contents of each library are organized by category and include the metrics, weights, and calculation used to generate a resilience score for each application deployment.

Go to Dimensions > Resilience and click the Libraries tab to view the prebuilt resilience libraries and any custom libraries.

Foundational library
Container Build Integrity library
Runtime Production Readiness library
VM Integrity Assurance library
Java Runtime library
Custom library
Message queue library

When you create a resilience profile, you will reference requirements defined in one or more libraries to define which requirements to apply when assessing the resilience of your application deployment.

Note: The requirements in each library assess aggregated, deployment-level data for the application deployment, not each individual component or entity of that deployment. For example, a requirement may assess the average image size, the number of non-HTTPS services, or the percentage of services without CPU limits. Assessing aggregated data provides an overall resilience check for the application deployment, as opposed to other monitoring tools that assess only individual entity metrics.
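
The difference between entity-level and aggregated data can be illustrated with a minimal sketch. The Service record, its field names, and the aggregate function below are hypothetical and exist only for illustration; they are not part of Concert or its APIs.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Service:
    """Hypothetical per-entity record; field names are assumptions."""
    name: str
    image_size_mb: float
    uses_https: bool
    has_cpu_limit: bool


def aggregate(services: list[Service]) -> dict[str, float]:
    """Roll per-service facts up into deployment-level values."""
    total = len(services)
    if total == 0:
        raise ValueError("at least one service is required")
    without_cpu_limit = sum(1 for s in services if not s.has_cpu_limit)
    return {
        "average_image_size_mb": mean(s.image_size_mb for s in services),
        "non_https_services": sum(1 for s in services if not s.uses_https),
        "pct_without_cpu_limits": 100.0 * without_cpu_limit / total,
    }


services = [
    Service("web", 200.0, True, True),
    Service("api", 400.0, False, True),
    Service("worker", 600.0, True, False),
    Service("cache", 400.0, True, False),
]
summary = aggregate(services)
print(summary)
```

A requirement would then be assessed against a value in summary, such as the percentage of services without CPU limits, rather than against any single service.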

Foundational library

Introduced with Concert 1.0.5, the foundational library is a core capability within Concert's resilience feature, enabling organizations to systematically measure, score, and improve application deployment resilience. It is built around site reliability engineering (SRE) best practices that are related to application deployment performance, incident management, and release management.

Existing approaches to assessing application deployment resilience are often fragmented, tool-specific, or lacking in clear scoring criteria. This makes it difficult to benchmark or drive continuous improvement across teams and services. The foundational library addresses this gap by providing a unified, extensible framework for resilience measurement that integrates with existing observability and ITSM tools, while also accommodating manual reporting where automation is not feasible.

The foundational library provides a comprehensive, standardized set of over 22 requirements that span the full spectrum of operational resilience, grouped into six key categories: Availability, Maintainability, Observability, Recoverability, Scalability, and Usability.

Availability - Tracks overall and regional application deployment availability using synthetic tests, error rates, and downtime analysis.
Maintainability - Evaluates deployment practices, automation coverage, run book completeness, and change management hygiene.
Observability - Measures incident detection, diagnostic, and acknowledgment times, as well as the percentage of incidents detected by monitoring.
Recoverability - Quantifies mean time to mitigate and recover, as well as recovery time and point objectives (RTO/RPO) and rollback readiness.
Scalability - Assesses throughput and visit rates during scaling events, ensuring performance under load.
Usability - Focuses on user-facing metrics, such as latency and error rate SLOs, as well as blast radius management during incidents.

Each requirement is defined with clear measurement units and a detailed description, and is mapped to leading and lagging indicators sourced from automated tools, such as APM and ITSM systems, as well as from manual inputs.

You can use this library to generate objective baselines and compare resilience across application deployments, business units, and time periods. It can also help provide actionable insights for targeted, continuous improvements in operational practices and technical architecture. The unified reporting simplifies resilience assessments for technical and non-technical stakeholders using data from multiple sources, ensuring the metrics are transparent and aligned with industry best practices and regulatory requirements.

Ingest metrics from APM tools, such as Instana, to gather real-time telemetry, error rates, throughput, and synthetic test results. You can also ingest resilience metrics from ITSM tools like ServiceNow for data related to incidents, change, and recovery, including time-to-detect, time-to-recover, and incident root cause analysis.
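
As a rough illustration of the incident-derived metrics mentioned above, the following sketch computes a mean time to detect and a mean time to recover from a list of incident records. The record layout (started, detected, recovered) and the helper name are assumptions made for this example; they do not reflect the ServiceNow schema or Concert's ingestion format.

```python
from datetime import datetime, timedelta


def mean_minutes(durations: list[timedelta]) -> float:
    """Average a list of durations and express the result in minutes."""
    if not durations:
        raise ValueError("at least one duration is required")
    return sum(d.total_seconds() for d in durations) / len(durations) / 60.0


# Hypothetical incident records, as they might be exported from an ITSM tool.
incidents = [
    {
        "started": datetime(2025, 1, 1, 10, 0),
        "detected": datetime(2025, 1, 1, 10, 5),
        "recovered": datetime(2025, 1, 1, 10, 45),
    },
    {
        "started": datetime(2025, 1, 8, 14, 0),
        "detected": datetime(2025, 1, 8, 14, 15),
        "recovered": datetime(2025, 1, 8, 15, 15),
    },
]

mean_time_to_detect = mean_minutes([i["detected"] - i["started"] for i in incidents])
mean_time_to_recover = mean_minutes([i["recovered"] - i["started"] for i in incidents])
print(mean_time_to_detect, mean_time_to_recover)
```

Both values are measured from past outcomes, so they are lagging indicators.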

In the foundational library, each metric is categorized as either a leading indicator or a lagging indicator. Leading indicators are metrics that help predict future performance, whereas lagging indicators are measurements collected from past outcomes.

Container Build Integrity library

Introduced with Concert 1.1.0, this library contains requirements that reflect a standardized framework assessing container image quality across security, efficiency, and maintainability dimensions. It combines static analysis with build process telemetry to provide actionable improvement scores.

Hidden risks in container build pipelines lead to bloated images, security vulnerabilities, and production inefficiencies. Using this library to assess resilience can help answer questions like "Are our builds truly production-grade?" You can use the requirements in this library (via resilience profiles) in conjunction with resilience data ingested using Concert Workflows to automate vulnerability remediation.

The Container Build Integrity library provides a set of 14 requirements across the following categories for assessing container image quality:

Integrity - Validates that agreed-upon standards are adopted when building, deploying, and hosting application deployments.
Scalability - Assesses throughput and visit rates during scaling events, ensuring performance under load.
Security - Aims to protect the system and its data from unauthorized access, use, disclosure, disruption, modification, or destruction due to malicious attacks, data breaches, or various threats. It assesses the confidentiality, authenticity, and availability of data in your application deployments.
Maintainability - Evaluates deployment practices, automation coverage, run book completeness, and change management hygiene.
Availability - Evaluates whether containerized services are resilient and restartable during failures or outages. Metrics in this category measure conditions such as missing restart policies, which may prevent container recovery and continuity during disruptions.

Use this library to help detect secrets and other vulnerabilities at build time rather than at runtime, supporting a proactive, early approach to vulnerability management. It can also help you optimize build efficiency by reducing image size and build times through layer analysis, and enforce compliance by automating best practices from CIS benchmarks.

Runtime Production Readiness library

Introduced with Concert 1.1.0, this library contains requirements that reflect a standardized scoring framework evaluating containerized workloads across reliability, security, and Kubernetes best practices. It combines runtime behavior with deployment configurations to produce actionable readiness scores.

Fragmented container observability leads to reactive incident management, hidden reliability risk, and an inability to demonstrate Kubernetes compliance. Using this library helps answer questions like "Are our containers truly production-ready?" across numerous critical dimensions of operational health, and helps you create automated low-code workflows to remediate poorly orchestrated deployments.

The Runtime Production Readiness library provides a set of 64 requirements across the following categories to assess the reliability and security of your containerized workloads, as well as their adherence to Kubernetes best practices:

Availability - Tracks overall and regional application deployment availability using synthetic tests, error rates, and downtime analysis.
Integrity - Validates that agreed-upon standards are adopted when building, deploying, and hosting application deployments.
Maintainability - Evaluates deployment practices, automation coverage, run book completeness, and change management hygiene.
Recoverability - Quantifies mean time to mitigate and recover, as well as recovery time and point objectives (RTO/RPO) and rollback readiness.
Scalability - Assesses throughput and visit rates during scaling events, ensuring performance under load.
Security - Aims to protect the system and its data from unauthorized access, use, disclosure, disruption, modification, or destruction due to malicious attacks, data breaches, or various threats. It assesses the confidentiality, authenticity, and availability of data in your application deployments.
Usability - Focuses on user-facing metrics, such as latency and error rate SLOs, as well as blast radius management during incidents.

Use this library to improve container health scoring by quantifying adherence to best practices and security controls for containerized workloads. This library also helps quantify tolerance for node failures and outages and automates verification of key security controls from market leaders' best practices.

VM Integrity Assurance library

Introduced with Concert 1.1.0, this library contains requirements that reflect a standardized scoring system for evaluating virtual machine (VM) health across security hardening, operational efficiency, and compliance adherence. It combines infrastructure telemetry with security postures and is designed to address the fragmented VM monitoring problem by providing a comprehensive approach to monitoring security, performance, and compliance metrics.

The unified scoring system provided by the library answers the question "How healthy are our VMs?" by offering quantifiable benchmarks that allow organizations to proactively identify and address issues before they become major problems. This reduces the need for reactive incident management and minimizes hidden risk.

The VM Integrity Assurance library provides a set of 15 requirements across the following categories to assess VM health as it relates to security, efficiency, and compliance:

Integrity - Validates that agreed-upon standards are adopted when building, deploying, and hosting application deployments.
Maintainability - Evaluates deployment practices, automation coverage, run book completeness, and change management hygiene.
Recoverability - Quantifies mean time to mitigate and recover, as well as recovery time and point objectives (RTO/RPO) and rollback readiness.
Scalability - Assesses throughput and visit rates during scaling events, ensuring performance under load.
Security - Aims to protect the system and its data from unauthorized access, use, disclosure, disruption, modification, or destruction due to malicious attacks, data breaches, or various threats. It assesses the confidentiality, authenticity, and availability of data in your application deployments.
Usability - Focuses on user-facing metrics, such as latency and error rate SLOs, as well as blast radius management during incidents.

Use this library to combine technical metrics into one actionable score that prioritizes your most critical VM risks first. You can demonstrate proof of compliance through audit trails that show continuous adherence to CIS/PCI benchmarks and track the state of VM degradation or improvement over time.

Java Runtime library

Introduced with Concert 2.0.0, this library contains requirements that reflect a standardized scoring system for evaluating Java workloads running on virtual machines (VMs). It enables organizations to gain visibility into application deployment health and risk posture without requiring agent installation or elevated access.

The scoring system supports automated discovery of Java application deployments, such as Tomcat or Java servlets, across Linux-based virtual machines that are accessible via SSH credentials. This includes workloads running on AWS EC2, IBM Cloud Virtual Servers, and similar environments. Application deployments are grouped across virtual machines using consistent metadata, or are inventoried individually if such metadata is absent. Once discovered, each Java application deployment is automatically evaluated using the Java Runtime library, which assigns resilience scores based on configuration and runtime characteristics.
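
The grouping behavior described above can be pictured with a short sketch. This is not Concert's discovery logic; the record fields (host, process, app_label) and the function name are hypothetical, and the fallback key format is an assumption chosen for the example.

```python
from collections import defaultdict


def group_discoveries(discoveries: list[dict], key: str = "app_label") -> dict[str, list[str]]:
    """Group discovered Java instances by a shared metadata value.

    Instances without that metadata are inventoried individually,
    under a key built from their host and process name.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for item in discoveries:
        label = item.get(key)
        if label:
            groups[label].append(item["host"])
        else:
            # No shared metadata: keep this instance as its own entry.
            groups[f"{item['host']}:{item['process']}"].append(item["host"])
    return dict(groups)


# Hypothetical discovery results from three VMs.
discoveries = [
    {"host": "vm-1", "process": "tomcat", "app_label": "orders"},
    {"host": "vm-2", "process": "tomcat", "app_label": "orders"},
    {"host": "vm-3", "process": "tomcat"},
]
grouped = group_discoveries(discoveries)
print(grouped)
```

In this example, the two instances that share the same label form one application deployment that spans two VMs, while the unlabeled instance is listed on its own.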

The Java Runtime library provides a set of 11 requirements across the following four categories to assess application deployment health, stability, and risk posture:

Scalability - Monitors thread usage and garbage collection activity to detect performance degradation under load.
Security - Flags application deployments with insecure configurations, such as unencrypted ports, elevated root access, or weak session management practices.
Availability - Evaluates restart frequency and failure patterns that may signal instability or recurring outages.
Usability - Assesses error rates that may negatively impact user experience or application deployment responsiveness.
Note: The scoring criteria for the following requirements in the Java Runtime library are predefined by Concert and cannot be modified through the user interface: Sessions Exceeding Timeout, Session Activity, and Garbage Collection Activity.

Use this library to evaluate the resilience posture of Java workloads discovered through automated VM scans. It surfaces valuable insights into application deployment health and risk posture, helping teams prioritize and address critical resilience gaps across Java-based services.

Message queue library

Introduced with Concert 3.0, this library contains requirements that provide a standardized framework for evaluating the health, performance, and reliability of message queue systems used in modern application deployments. Message queue services such as Kafka, RabbitMQ, and similar platforms play a critical role in enabling asynchronous communication between distributed services. However, monitoring of these systems is often fragmented and tool-specific, and it lacks consistent scoring criteria. This makes it difficult to detect bottlenecks, ensure reliable message processing, and maintain predictable system behavior across environments.

The message queue library addresses this gap by providing a unified and extensible model for assessing message queue systems using standardized requirements, metrics, and scoring thresholds. It enables organizations to evaluate message flow, consumer behavior, and queue performance in a consistent and comparable manner across platforms.

This library is designed as a generic framework and is not tied to a specific message queue implementation. It can be applied to any message queue service and extended further by creating custom libraries for platform-specific requirements. The Message queue library provides a set of 7 requirements across the following categories to assess message queue system health and performance:

Observability - Monitors key indicators such as message throughput, queue depth, dead-letter queue (DLQ) behavior, and consumer activity to ensure visibility into system performance and message flow.
Availability - Evaluates the reliability of message delivery and processing to ensure that messages are consistently consumed and acknowledged without disruption.
Scalability - Assesses how efficiently the system handles varying message loads by analyzing publish rates, consumer utilization, and queue growth patterns.

Each metric contributes to evaluating key aspects of message queue behavior, including throughput, backlog management, consumer efficiency, and message processing reliability. Based on these evaluations, Concert generates contextual remediation actions through AI-powered analysis, providing requirement-specific recommendations to help address performance issues and improve system reliability.
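
To make these aspects concrete, the following sketch derives a few platform-neutral values from one observation window of a queue. The QueueSample fields, the evaluate function, and the derived names are all hypothetical; they are not the metric definitions or thresholds that the message queue library uses.

```python
from dataclasses import dataclass


@dataclass
class QueueSample:
    """Hypothetical counters for one observation window of a queue."""
    published_per_sec: float
    consumed_per_sec: float
    depth_start: int
    depth_end: int
    dead_lettered: int
    delivered: int


def evaluate(sample: QueueSample, window_sec: float) -> dict[str, float]:
    """Derive backlog, consumer, and reliability figures from raw counters."""
    handled = sample.delivered + sample.dead_lettered
    return {
        # Positive growth means the backlog is building up.
        "queue_growth_per_sec": (sample.depth_end - sample.depth_start) / window_sec,
        # Share of handled messages that ended up in the DLQ.
        "dlq_pct": 100.0 * sample.dead_lettered / handled if handled else 0.0,
        # Below 1.0 means consumers are not keeping up with producers.
        "consume_to_publish_ratio": (
            sample.consumed_per_sec / sample.published_per_sec
            if sample.published_per_sec
            else 1.0
        ),
    }


sample = QueueSample(
    published_per_sec=100.0,
    consumed_per_sec=80.0,
    depth_start=1000,
    depth_end=2200,
    dead_lettered=5,
    delivered=995,
)
result = evaluate(sample, window_sec=60.0)
print(result)
```

Because the inputs are generic counters, the same calculation applies whether the samples come from Kafka, RabbitMQ, or another message queue service.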

Note: The message queue library is not currently integrated with auto-discovery workflows. You can create a resilience posture manually and run assessments to evaluate message queue systems.

Use this library to assess the resilience posture of message queue systems by identifying bottlenecks, monitoring message flow stability, and ensuring efficient message processing across producers and consumers. It helps surface actionable insights into queue performance and reliability, enabling teams to detect issues early, optimize system behavior, and maintain stable, predictable message-driven architectures.

Custom library

Introduced with Concert 2.2, custom libraries let you define your own requirements and metrics when the default libraries do not fully represent your operational needs. This capability provides complete flexibility for modeling resilience scoring frameworks that align with internal standards, regulatory expectations, or environment-specific behaviors. A custom library follows the same structural model as the default libraries. Requirements are grouped under standard resilience categories:

Availability
Integrity
Maintainability
Observability
Recoverability
Scalability
Security
Usability

Within each category, teams can create any number of requirements and add the metrics that best represent the resilience indicators they want to measure. You can create a custom library in two ways:

Importing a library package: Upload a .zip, .tar, or .tgz file containing requirements and metrics. This is ideal for teams that maintain their own standardized scoring models or want to reuse libraries across environments.
Creating a library manually: Define the library name, description, categories, requirements, target scores, thresholds, and metrics directly through the UI.

Once a custom library is created, it becomes available for use when building resilience profiles, which in turn are applied during posture assessment plans to generate application deployment resilience scores.
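
For the import option, a library directory can be bundled into a .tgz archive with standard tooling. The sketch below uses Python's tarfile module. The internal layout of a library package is not described in this topic, so the directory name and the requirements.json placeholder file are purely hypothetical.

```python
import tarfile
import tempfile
from pathlib import Path


def package_library(source_dir: Path, archive_path: Path) -> list[str]:
    """Bundle source_dir into a gzip-compressed tar archive and list its members."""
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname keeps paths in the archive relative to the library folder.
        archive.add(source_dir, arcname=source_dir.name)
    with tarfile.open(archive_path, "r:gz") as archive:
        return sorted(archive.getnames())


with tempfile.TemporaryDirectory() as tmp:
    library_dir = Path(tmp) / "my-library"
    library_dir.mkdir()
    # Placeholder content only; the real package format is an assumption here.
    (library_dir / "requirements.json").write_text("{}")
    members = package_library(library_dir, Path(tmp) / "my-library.tgz")

print(members)
```

Keeping the package under version control and rebuilding the archive from it is one way to reuse the same library across environments.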

Custom libraries are particularly useful when organizations need to incorporate proprietary or domain-specific indicators, define internal SLO or SLA thresholds that differ from Concert’s defaults, or support application deployments that run on specialized platforms not fully covered by the standard libraries. They also enable teams to bring their own metrics through Concert’s ingestion APIs, allowing greater flexibility and alignment with environment-specific resilience practices.