Overview
In the context of software development and IT systems, a resilient application is a system that can recover quickly from unexpected disruptions or failures without impacting productive use. Resilience encompasses several characteristics that enable services to maintain functionality and integrity under challenging conditions. To define and achieve application resilience, organizations must consider many non-functional requirements (NFRs): constraints or qualities that a system must have, beyond its functional requirements, to be considered resilient.
While many organizations implement processes or policies to improve their resilience, it is difficult to fully understand your resilience posture or to scale application resilience. Improving resilience requires an organization to continuously plan, define, configure, operate, evaluate, and act to address issues. Concert operationalizes the journey of achieving solution resilience and makes it easier to assess siloed data and drive outcomes.
Using the Resilience dimension, you can select the NFRs that matter most to your organization from an NFR library, define resilience goals (profiles and postures) for each application or environment, and provide input metrics to track progress over time. NFRs are grouped into the following NFR types:
- Availability - This NFR type is closely tied to resilience in that it refers to the proportion of time a system is operational and accessible for use. You can use several strategies to achieve high availability, such as implementing multi-level redundancy (failover), fault tolerance, load balancing, regularly scheduled maintenance, monitoring, and disaster recovery planning.
- Maintainability - This NFR type refers to how easily a system can be modified to fix faults, improve performance, or adapt to changes. High maintainability is critical to reducing long-term costs, improving system agility, and ensuring the system can evolve with business needs. It is achieved using strategies like modular design, high-quality code and documentation, automated testing, continuous integration and deployment (CI/CD), version control, and regular refactoring.
- Observability - This NFR type refers to how well you can understand the internal states of a system based on its external outputs to ensure it is operating efficiently, reliably, and securely. High observability is critical for debugging and troubleshooting efficiently, performance and security monitoring, and capacity planning. It is achieved using strategies like comprehensive logging, metrics collection and analysis, distributed tracing, monitoring tools, alerting, and regular review and improvement.
- Recoverability - This NFR type refers to a system's ability to return to a fully operational state after a disruption or failure. High recoverability is important to business continuity, data integrity, and user satisfaction because it ensures your application can recover from incidents while minimizing downtime and data loss. It is achieved through strategies like backing up data regularly and having a tested restore process in place, establishing system redundancy, automating the recovery process, testing regularly, and planning incident response.
- Scalability - This NFR type refers to a system's ability to handle increased load efficiently without a significant degradation of performance or an inordinate increase in resources. High scalability is important because it accommodates growth and keeps costs efficient while preserving application performance even under heavy load. It is achieved using common strategies such as horizontal and vertical scaling, load balancing, a microservices architecture, and caching.
- Usability - This NFR type refers to how user-friendly and intuitive an application or system is. Usability is critical to user adoption and engagement and is a direct reflection of how well you understand and empathize with your users. There are many ways to improve the usability of your application, such as following a user-centered design approach and ensuring that all user interfaces and resources are simple, intuitive, consistent, and accessible.
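The Availability definition in the list above (the proportion of time a system is operational and accessible) can be made concrete with a small calculation. The following is a minimal sketch and is not part of Concert; the function name `availability` and the sample figures are hypothetical and chosen only for illustration.

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Return the fraction of observed time the system was operational."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return uptime_hours / total


# Hypothetical example: a 30-day month (720 hours) with 3.6 hours of downtime.
monthly = availability(uptime_hours=716.4, downtime_hours=3.6)
print(f"{monthly:.3%}")  # prints 99.500%
```

A ratio like this is one example of the kind of measurement that could be supplied as an input metric for an availability-related NFR.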
Based on the assessment, Concert generates a resilience score for each NFR and NFR type applied during the assessment, as well as an overall assessment score indicating the resilience posture of the given application or environment. The overall resilience score for each application is based on the completeness and quality of resilience data (input metrics) available to assess against the specified NFRs. Each NFR type is scored based on several NFRs, each of which is scored based on several input metrics. Each input metric is scored on a standardized 100-point scale based on standard scoring levels.
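The scoring hierarchy described above (input metrics roll up into NFRs, NFRs into NFR types, and NFR types into an overall score) can be sketched as follows. This is an illustration only: the source does not state how Concert combines scores, so the unweighted mean used here is an assumption, and the NFR names, metric values, and function names are hypothetical.

```python
from statistics import mean

# Hypothetical assessment data: NFR type -> NFR -> input metric scores.
# Each input metric is already expressed on the 100-point scale.
assessment = {
    "Availability": {
        "Redundancy": [80, 60],
        "Monitoring": [100],
    },
    "Recoverability": {
        "Backups": [40, 60, 80],
    },
}


def score_nfr(metric_scores):
    """Score one NFR from its input metrics (assumed: unweighted mean)."""
    for score in metric_scores:
        if not 0 <= score <= 100:
            raise ValueError(f"metric score out of range: {score}")
    return mean(metric_scores)


def score_nfr_type(nfrs):
    """Score one NFR type from its NFRs (assumed: unweighted mean)."""
    return mean(score_nfr(metrics) for metrics in nfrs.values())


def score_overall(types):
    """Overall score across all NFR types (assumed: unweighted mean)."""
    return mean(score_nfr_type(nfrs) for nfrs in types.values())


for name, nfrs in assessment.items():
    print(name, score_nfr_type(nfrs))
print("Overall", score_overall(assessment))
```

With the sample data, Availability scores 85, Recoverability scores 60, and the overall score is 72.5. A weighted scheme would follow the same structure with `mean` swapped for a weighted average.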
As you track and act on the defined NFRs based on Concert resilience assessments, you improve the resilience posture of each application and of your organization as a whole. This helps maintain application availability and performance despite unexpected or challenging events, from something as small as a code change request or patch to large events such as cyberattacks, natural disasters, or economic disruptions.