Availability requirements
Concert's foundational library for assessing resilience includes a category of requirements related to the availability of a system and its ability to recover quickly from unexpected events.
You can improve your resilience posture by ensuring that your application or service has an established service level agreement (SLA) that is clearly documented in the service description. The SLA should define availability and state how it is measured, using precise terminology.
This category includes the following requirements:
- Availability goal for the overall application
This requirement defines a target percentage of application availability over a period of time. Typically, this percentage is based on the SLA in which you contractually commit to providing a highly available service.
- Availability goal by region
This requirement defines a target percentage of application availability for a specific region.
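To make the idea of a target percentage concrete, the following minimal sketch shows how an availability percentage and the corresponding downtime allowance can be calculated for the overall application and for a single region. It is illustrative arithmetic only, not Concert's API or input format: the names `AvailabilityWindow`, `allowed_downtime_minutes`, and `meets_target`, the scope labels, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass

MINUTES_PER_DAY = 24 * 60


@dataclass
class AvailabilityWindow:
    """Observed downtime for one scope over a measurement period.

    Hypothetical structure for illustration; not a Concert data format.
    """
    scope: str               # for example "overall" or a region name
    period_days: int         # length of the measurement period
    downtime_minutes: float  # total unplanned downtime in that period

    def availability_percent(self) -> float:
        """Return the percentage of the period in which the scope was available."""
        total_minutes = self.period_days * MINUTES_PER_DAY
        return 100.0 * (total_minutes - self.downtime_minutes) / total_minutes


def allowed_downtime_minutes(target_percent: float, period_days: int) -> float:
    """Return the downtime budget implied by a target availability percentage."""
    total_minutes = period_days * MINUTES_PER_DAY
    return total_minutes * (1.0 - target_percent / 100.0)


def meets_target(window: AvailabilityWindow, target_percent: float) -> bool:
    """Check whether the observed availability reaches the target."""
    return window.availability_percent() >= target_percent


if __name__ == "__main__":
    target = 99.9  # assumed SLA target, in percent
    windows = [
        AvailabilityWindow(scope="overall", period_days=30, downtime_minutes=30),
        AvailabilityWindow(scope="example-region", period_days=30, downtime_minutes=90),
    ]
    budget = allowed_downtime_minutes(target, period_days=30)
    print(f"Downtime budget at {target}% over 30 days: {budget:.1f} minutes")
    for w in windows:
        status = "meets" if meets_target(w, target) else "misses"
        print(f"{w.scope}: {w.availability_percent():.3f}% ({status} target)")
```

Expressing the target as a downtime budget can make an SLA easier to reason about, because a percentage that looks close to 100 still corresponds to a specific number of minutes per period.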
For each requirement, you provide several metrics, which are measurements used to assess the application's compliance with that requirement. Some metrics are entered manually, whereas others can be ingested automatically from connected systems.