Performance

IBM Well-Architected Framework
Overview

The performance pillar focuses on the design, development, validation, and operation of solutions that meet their non-functional requirements for performance (typically associated with response times), capacity (supported load magnitudes, user base, and achieved throughputs), and scalability (the ability to organically accommodate variable demand and increasing loads). Unlike a 'traditional' computing environment composed of fixed-capacity infrastructure, a hybrid cloud environment enables solutions to dynamically scale their capacity and resource consumption up and down as demand waxes and wanes, provided the solutions are architected to take advantage of these capabilities.

Performance analysis also improves the user experience through evidence-based improvements to the product design, and helps achieve business goals through built-in scalability and capacity.

Principles

Gather user expectations at the product definition stage, quantify them as business requirements, and use these as the basis for subsequent product architecture and design.

Components within a well-architected solution can be scaled independently, for example by adding another instance of a service, but adding or removing components can have knock-on effects in other parts of the solution; for example, adding another web server to service a spike in web traffic may require more message queues to communicate with back-end services. Knowing scaling dependencies beforehand helps in understanding the solution's operating behavior and avoiding resource exhaustion through over-scaling of a single resource.

Well-architected hybrid cloud solutions take advantage of multi-platform architectures and scaling and bursting strategies to optimize performance as well as security, operating cost, and end-user experience; for example, run workloads on on-premises infrastructure with strong performance guarantees and fixed operating costs, and burst or scale to a public cloud service provider during times of peak workload.

Moving data is expensive. Well-architected solutions take advantage of the portability and mobility of containerized workloads to place services as close as possible to the data they consume.

Solutions must choose the right platform and resources to maximize the value of their architectures. A hybrid cloud solution is able to span multiple clouds, including on-premises infrastructure, which gives architects the freedom to select the optimal resource mix to meet the performance needs of their solution.

Solution Design Practices

Performance must be 'built in' to a solution at the outset of its design. Leaving performance considerations to the end of the solution's design, or worse, its implementation, often results in sub-optimal performance that can't be fixed without revisiting large portions of the solution's architecture. The Solution Design practices help architects to create solutions that are highly performant, and to avoid design approaches that can limit solution performance.

Solutions should be designed to grow or reduce their processing capacity by adding or removing discrete units (servers, services, network interfaces, and so on) rather than changing the capacity of existing units, for example by adding more CPUs to a server. To achieve this, solutions should adopt the following architecture principles:

  • Stateless components are components that don't retain client or session state (for example, a user identity, or the data inputs provided on an earlier call) between interactions, eliminating dependencies between clients and any one specific instance of a component. This lack of dependency between components and their consumers means the solution can scale up and down by adding or removing component instances without impact to the consumers of the components' services (a minimal sketch follows this list). A good example of a stateless component is a cashier at the grocery store; as long as there is at least one cashier available, shoppers can check out and pay for their groceries, and cashiers can be added and removed as the volume of shoppers demands. The converse of this is if shoppers were assigned a specific cashier at the beginning of their shopping trip. If the assigned cashier becomes bogged down or, worse, unavailable, the shoppers have to wait or start all over, and the overall performance of the grocery store (measured in shoppers per hour) suffers.

  • Avoid long-running tasks. If a solution must support long-running tasks (for example, performing a complex scientific calculation), the task should be designed to support scaling in or out through an interruption and checkpoint facility that enables the solution to shut down and restart the task as resources are added to and removed from the solution.

  • Data at the ends. Stateless components can, in theory, scale infinitely and are reusable between clients. Highly performant solutions push state, that is, user and application data, to the client application and databases at the ends of the solution architecture and keep no state in the intervening architecture layers.
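To make these principles concrete, here is a minimal, hypothetical sketch (the `CartService` classes and the in-memory `datastore` are illustrative stand-ins, not part of any specific product). The stateful variant ties each shopper to one instance; the stateless variant keeps the data 'at the ends' in a shared datastore, so any instance can serve any request:

```python
# Hypothetical sketch contrasting stateful and stateless service designs.

# Stateful: the instance remembers each shopper's cart between calls, so every
# request from that shopper must be routed back to this same instance, and the
# cart is lost if the instance fails.
class StatefulCartService:
    def __init__(self):
        self._carts = {}                  # session state held in process memory

    def add_item(self, shopper_id: str, item: str) -> None:
        self._carts.setdefault(shopper_id, []).append(item)

    def checkout(self, shopper_id: str) -> list:
        return self._carts.pop(shopper_id, [])


# Stateless: every call looks up the state it needs from a shared datastore
# "at the ends" of the architecture. Any instance can serve any shopper, so
# instances can be added or removed freely as demand changes.
class StatelessCartService:
    def __init__(self, datastore: dict):
        self._datastore = datastore       # stand-in for a shared database or cache

    def add_item(self, shopper_id: str, item: str) -> None:
        cart = self._datastore.get(shopper_id, [])
        cart.append(item)
        self._datastore[shopper_id] = cart

    def checkout(self, shopper_id: str) -> list:
        return self._datastore.pop(shopper_id, [])
```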

Resources:
Stateful vs Stateless

Designing solutions as a set of highly cohesive, loosely coupled components enables each component to be scaled independently, relative to the demand for the service it provides. Architecture approaches such as service-oriented architecture and microservices embed this practice as a core design principle, that is, a set of highly cohesive services communicating through high-level, loosely coupled APIs.

Moving data between the components of a solution is often the most time-consuming element of a transaction. Components should be designed to optimize the frequency and volume of communications for the bandwidth available. For example, an application that makes repeated calls to retrieve individual values from a database might perform 'well enough' when deployed on a local network, but might lag when the database component is relocated to a cloud service provider.

The representational state transfer (REST) architectural style that's commonly used in web-based applications is a good example of the type of balance that's exhibited by a well-architected solution; the full representative state of a resource is transferred as a JSON, XML, or other document that balances the amount of information that's transferred against the high latency of a web-based interaction.
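As a hypothetical illustration of this balance (the `fetch_price` and `fetch_prices` callables below are placeholders for a remote data service, not a real API), compare a 'chatty' interaction with a batched one that transfers a fuller representation in a single round trip:

```python
# Hypothetical sketch: chatty vs. batched access to a remote data service,
# assuming each remote round trip carries roughly 50 ms of network latency.

def chatty_total(fetch_price, product_ids):
    # One round trip per product: 1,000 products x 50 ms is ~50 seconds of
    # pure latency, regardless of how fast the database itself is.
    return sum(fetch_price(pid) for pid in product_ids)

def batched_total(fetch_prices, product_ids):
    # A single round trip that returns the full representation (for example,
    # one REST call returning a JSON document): ~50 ms of latency, at the
    # cost of a larger, but still modest, response payload.
    prices = fetch_prices(product_ids)
    return sum(prices[pid] for pid in product_ids)
```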

Caching helps to limit the demand on resources and services that produce data. Consider using caching for long-lived, relatively static data, and/or data that's 'expensive' to produce. Well-architected solutions implement caching mechanisms at all layers of the solution's architecture, placing caches as logically close as possible to the consumer to limit consumer-cache communications and improve overall response time.

Architects need to keep in mind that caching can be overdone. A poorly designed caching mechanism or an overly large cache can adversely affect the solution's overall performance. Architects should evaluate the caching type and strategy and then measure the cache's effectiveness during performance testing and analysis.
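A minimal sketch of one common strategy, a time-to-live (TTL) cache placed close to the consumer, is shown below; the TTL value and the wrapped `loader` function are illustrative assumptions, and a production solution would typically use a purpose-built in-process or shared cache:

```python
import time

# Minimal TTL cache sketch: remembers 'expensive' values for a limited time so
# repeated requests don't hit the backing service.
class TTLCache:
    def __init__(self, loader, ttl_seconds: float = 300.0):
        self._loader = loader             # 'expensive' function that produces the data
        self._ttl = ttl_seconds
        self._entries = {}                # key -> (value, expiry timestamp)

    def get(self, key):
        value, expires_at = self._entries.get(key, (None, 0.0))
        if time.monotonic() < expires_at:
            return value                  # cache hit: no call to the backing service
        value = self._loader(key)         # cache miss: fetch and remember
        self._entries[key] = (value, time.monotonic() + self._ttl)
        return value
```

Measuring the hit rate of such a cache during performance testing is how its effectiveness, or the overhead of an oversized cache, is verified in practice.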

Asynchronous messaging using message queues, callback models, or other means enables solutions both to scale efficiently and to degrade gracefully under load if resources are exhausted. Well-architected solutions take advantage of asynchronous communication, particularly message queues, to give end users a responsive user experience and to avoid 'losing' user requests if a component fails. The same mechanism can also be used to interconnect systems that have different service levels or operating hours; for example, connecting a 24x7 web application to a 9-to-5 system of record through a message queue enables the web application to accept end-user requests even when the system of record is unavailable.
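The decoupling can be sketched with an in-process queue, as below; this is only an illustration, and a production solution would use a durable message broker rather than Python's in-memory `queue`:

```python
import queue
import threading

# Minimal sketch: a queue decouples a responsive front end from a back-end
# system of record that may be slow or temporarily unavailable.
request_queue = queue.Queue()             # stand-in for a durable message queue

def accept_user_request(payload) -> str:
    request_queue.put(payload)            # enqueue and return immediately
    return "accepted"                     # the user gets a fast acknowledgement

def back_end_worker(process) -> None:
    # Runs whenever the system of record is available and drains the backlog.
    while True:
        payload = request_queue.get()
        process(payload)                  # e.g. write to the system of record
        request_queue.task_done()

# Example wiring: start a worker thread and accept requests at any time.
threading.Thread(target=back_end_worker, args=(print,), daemon=True).start()
print(accept_user_request({"order": 42}))
```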

Solutions change over time and their performance can change along with them. Building in performance instrumentation that allows development, testing, and operations teams to non-intrusively collect an application's performance metrics supports evidence-driven development and testing of a robust product. The instrumentation also helps with functional testing and defect analysis, and is an invaluable aid in maintaining the performance of a solution and pinpointing the sources of performance problems in production. Configurable, non-intrusive instrumentation also supports product monitoring, ensuring the observability of the solution in operation and thus supporting the DevOps and SRE teams.
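As a minimal sketch of non-intrusive instrumentation (the decorator and the `originate_loan` function are illustrative, and a real solution would typically export such measurements to an APM or monitoring tool rather than only logging them):

```python
import functools
import logging
import time

logger = logging.getLogger("perf")

def timed(func):
    """Record how long each call takes without changing the function's behavior."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            logger.info("%s took %.1f ms", func.__name__, elapsed_ms)
    return wrapper

@timed
def originate_loan(application):
    ...   # business logic unchanged; timing is collected transparently
```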

Planning and Testing Practices

Performance planning, testing, and analysis are a set of practices that are applied to IT solutions to ensure the solution's quality and its ability to achieve the expected business outcomes.

The analysis usually applies to quality attributes such as performance, capacity, scalability, and some aspects of availability, business continuity, and sustainability. It includes identifying and quantifying the quality-related business requirements, and designing and executing tests to obtain specific metrics that reflect how the solution performs against expectations such as response times, throughputs, or supported loads.

In a wider sense, the performance scope also includes analysis of a solution's capacity (the total units of work the solution can service) and its scalability (how well it responds to changes in demand). Performance analysis also proves that the product remains functional and stable under extreme operating conditions. The goal of performance analysis isn't just to capture a picture of the solution's performance, but to pinpoint bottlenecks and collaborate with stakeholders to improve the solution's quality and usability.

Considering the complex and holistic nature of product performance and capacity management, it should span the different phases of the SDLC, from product design to operations support and SRE. This ensures proper customer requirement management, early issue detection, and quick response to production incidents.

From the business perspective, the performance of the solution as a whole is what matters, and this must be reflected in holistic non-functional requirements. However, for unit-level performance tests, early performance tests that implement the shift-left paradigm, and root-cause analysis of performance issues, one may need to specify low-level requirements that limit the duration of individual calls, network latency, and so on.

High-level performance requirements are typically provided at the process or transaction level, for example "The loan origination process should complete in less than 2 minutes", without regard to how the performance of the individual process steps and subprocesses contributes to the final result. Creating a performance budget that allocates targets to each step within the process at the development stage provides measurable targets for feature development teams, helps to identify potential problem areas, and helps to focus performance optimization and remediation efforts where they are most beneficial.

Performance budgets should consider all the solution's layers, from hardware all the way to application code. Leaving out any one of these layers risks the solution being unable to meet user expectations.
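A performance budget can be as simple as a table of per-step allocations checked against measured times; the sketch below uses hypothetical step names and allocations for the 2-minute loan origination example above:

```python
# Hypothetical performance budget for a 2-minute loan origination process.
budget_ms = {
    "submit_application":  5_000,
    "credit_check":       30_000,
    "risk_scoring":       45_000,
    "decision_and_reply": 40_000,         # total: 120,000 ms = 2 minutes
}

def over_budget(measured_ms: dict) -> list:
    """Return the steps whose measured times exceed their budget allocation."""
    return [step for step, limit in budget_ms.items()
            if measured_ms.get(step, 0.0) > limit]

# Example: risk scoring is over budget and becomes the optimization focus.
print(over_budget({"submit_application": 3_200, "credit_check": 28_000,
                   "risk_scoring": 51_000, "decision_and_reply": 22_000}))
```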

When testing performance, the proof is in the pudding; that is, only by testing the end-to-end solution can we be assured that it meets its requirements. This reflects the holistic nature of performance analysis. Testing individual components (for example, the database, middleware, and so on) provides valuable insight into the performance of a solution and helps to pinpoint performance bottlenecks. But component testing alone isn't enough, because the interactions between components can lead to unexpected bottlenecks or other impediments that result in suboptimal performance.

Focusing on user perception means focusing mainly on perceived response times and the overall responsiveness of the user interface. Capacity degradation is usually invisible to users until product performance is impacted. A 1968 study found that there are three distinct orders of magnitude of response time in human-computer interactions:

  • A response time of 100 ms is perceived as instantaneous. Humans have an average reaction time of about 250 ms, so anything less than this is perceived as very fast or instantaneous.
  • Response times of 1 second or less are generally fast enough for users to feel that they're not being slowed down by the system's performance.
  • Response times over 10 seconds cause users to lose attention completely.

From this it was posited that a response time of 2 seconds would be ideal, and thus 2 seconds or slightly more is a good response time target for hybrid cloud solutions, where possible. Of course, users' expectations depend on what they're doing; for example, no one expects a button-push animation to last 2 seconds, but 2-3 seconds is a good general target for user-facing applications.

Extending from this principle, solutions should incorporate user response time testing as part of their quality assurance cycle by using user interface testing tools. While API call latencies are important for overall product performance, user-perceived performance is the key to attracting and retaining the user base.

Users often have different expectations of what constitutes 'good' performance. For example, a 'power user' who uses an application several times a day has very different performance expectations than someone who uses the same application maybe once a month. Users are also often challenged to quantify what 'good' performance means to them, often getting stuck on requirements such as "fast enough" (which is a hard target to meet). Individual perceptions also differ: for some users, a 10-second login time is acceptable (especially if it is a one-time event), while for others it might be far too slow (especially if login is a frequent part of the workflow).

To help quantify and manage user expectations, it is recommended to:

  • Create non-functional requirements with good knowledge of the user base and its typical application usage patterns.

  • Contact users early in the solution design cycle to understand which features they use often and expect to be highly responsive, and which features they use less frequently and can therefore tolerate slower response times.
  • Use percentiles rather than simple averages or medians to define response time thresholds (a percentile sketch follows this list). This allows for the inevitable random variation in product responsiveness and ensures that a few outliers don't cause the overall product acceptance to fail.

  • Include realistic response time testing and feedback in early solution releases and previews. Performance testing left to the end of a solution's development can result in teams being unable to address performance concerns without 'undoing' significant parts of the solution's architecture. Fixing performance issues late in the development cycle is expensive.

  • Make sure the UI design includes elements such as spinners and status bars, so the user is aware that the application is alive and working. This helps avoid unnecessary frustration over the product's perceived slowness.

  • If necessary, conduct research on similar and 'best-in-class' solutions, analyze industry trends and publications, and interview subject matter experts to develop appropriate response time and capacity targets.
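As a minimal sketch of the percentile-based approach (the response time samples below are illustrative, not real measurements):

```python
# Evaluate a response time target against a percentile of measured samples
# rather than the average, so a few outliers don't dominate the result.
def percentile(samples, pct):
    ordered = sorted(samples)
    index = min(len(ordered) - 1, round(pct / 100.0 * (len(ordered) - 1)))
    return ordered[index]

login_times_s = [0.7, 0.8, 0.9, 0.9, 1.0, 1.1, 1.2, 6.5]   # one slow outlier

print(sum(login_times_s) / len(login_times_s))   # mean ~1.6 s, inflated by the outlier
print(percentile(login_times_s, 90))             # p90 = 1.2 s, the typical experience

# A requirement such as "90% of logins complete within 2 seconds" passes here,
# whereas an average-based threshold of 1.5 seconds would fail on one outlier.
```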

Monitor and communicate performance boundaries to clients.

Misuse or misconfiguration of a product or solution component can be a source of poor performance and a negative user experience. To avoid this situation, architects should:

  • Be aware of performance constraints within the solution and proactively communicate them to users. For example, if a solution uses a slow or low-bandwidth communication channel, the architect should make end users aware that downloads of high-resolution images will suffer.

  • Enable systems to detect and communicate when requests are outside of the solution's design parameters where possible. For example, have the system actively warn users when they attempt to download large files over a slow channel.
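A minimal sketch of such a check is shown below; the thresholds and bandwidth figures are illustrative assumptions:

```python
# Warn the user when a request falls outside the solution's design parameters,
# here a large download over a low-bandwidth channel.
def download_warning(file_size_bytes: int, bandwidth_bytes_per_s: float,
                     acceptable_wait_s: float = 10.0):
    estimated_s = file_size_bytes / bandwidth_bytes_per_s
    if estimated_s > acceptable_wait_s:
        return (f"This download is estimated to take about {estimated_s:.0f} "
                f"seconds over the current connection. Continue?")
    return None                           # within design parameters, no warning needed

# A 200 MB image over a ~1 MB/s link is roughly 200 seconds: warn the user.
print(download_warning(200_000_000, 1_000_000))
```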

A common approach to performance testing is to test that the solution meets its response time targets at the expected maximum load; the assumption being that if the solution performs well at the maximum expected load, it will also perform well at lower loads. The challenge with this approach is that it provides only one data point per tested peak load, making it almost a pass/fail exercise.

A well-architected solution is tested using an exploratory approach to assess its responsiveness across a range of loads varying in size, mix of user types, and mix of functions tested. This provides the solution team with valuable, multi-faceted information on how the solution components interact to affect performance, on potential bottlenecks, and on how to scale the solution to address smaller or greater workloads.

This approach also avoids additional testing if the expected load targets change and performance metrics are needed under different load conditions. Interpolation or extrapolation of the measured dependency of performance on load magnitude allows performance metrics to be calculated for any load within the spectrum of the initially tested loads (from zero to the breakpoint magnitudes, and beyond).
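For example, a first estimate for an untested load level can be interpolated from the measured points, as in this sketch (the measurements are illustrative, not real benchmark data):

```python
# Interpolate response time for a load level that wasn't tested directly,
# using measurements taken at incrementing load steps.
measured = [                              # (concurrent users, p90 response time in seconds)
    (100, 0.6), (200, 0.7), (400, 0.9), (800, 1.6), (1200, 3.1),
]

def interpolate_response_time(load: float) -> float:
    for (x0, y0), (x1, y1) in zip(measured, measured[1:]):
        if x0 <= load <= x1:
            return y0 + (y1 - y0) * (load - x0) / (x1 - x0)
    raise ValueError("load outside the measured range; extrapolate with care")

# If the business redefines the expected peak as 600 users, no re-test is
# needed for a first estimate: roughly 1.25 seconds at the 90th percentile.
print(interpolate_response_time(600))
```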

The typical approach to performance testing follows the simplified pattern "measure response times at given loads/throughputs". This approach answers the question of whether the response times of key transactions at peak magnitudes satisfy the existing service level agreements (SLAs), and typically the tested magnitudes are limited to "low", "peak", and "stress". This can answer questions about response times at typical loads, but it doesn't give the full picture of the system's performance in all possible situations.

The more advanced *Exploratory Performance Testing* approach targets the creation of a *Performance Snapshot* of the tested solution. The snapshot includes a comprehensive set of performance metrics gathered across the widest supported spectrum of loads, from single-user measurements to post-breakpoint loads (where possible, short of crashing the system). This includes the transactions' response times, transaction and data throughputs, and resource consumption data gathered at incrementing load conditions. By "transaction" we mean a finite piece of work performed by the system, from macro transactions like Login or Update Account, to individual sub-transactions (such as the authentication call within the Login transaction), to simple HTTP hits.

The comprehensive set of performance data, the Performance Snapshot, includes the above performance metrics for: the low-load "linear" range, where individually processed threads don't feel each other's presence and response times don't increase with increasing load; the "non-linear" range, where response times grow as the load increases; the saturation points, where throughputs stop increasing with the incrementing load and reach saturation levels; and the post-breakpoint range, where performance declines after the throughputs reach their maximum and the response times exceed SLA levels.

Covering the full range of loads usually requires no more effort from the testing teams than testing only at the "peak" and "stress" magnitudes plus the endurance tests, because the same test scripts are used (and creating those scripts is usually the main effort). The advantages of creating such a performance snapshot are as follows:

  • There is no need to spend time and effort trying to guess the test setup that produces the 'right' transaction throughput ("peak" or "stress"); you simply increment the load and cover *all* the supported load magnitudes and throughputs.

  • One can establish at what loads the response times begin to grow, where they exceed the SLA levels, and where the throughputs reach maximum levels.

  • The system's capacity can be measured directly, for example as the load magnitude at which the breakpoint condition is reached: the response time exceeds the SLA, the throughput reaches its maximum, system resource usage enters a specified 'red zone' (for example, CPU usage reaches 90%), or the system crashes, whichever happens first (a detection sketch follows this list). This removes the need for the typical guesswork-based approaches to capacity analysis and planning.

  • If the expected production conditions change (for example, the business redefines the expected average and peak loads), there's no need to re-run the performance tests: the performance metrics for the new load magnitudes can simply be obtained by interpolation or extrapolation of the existing test results.

  • Covering the whole spectrum of loads instead of a few predefined magnitudes ensures a full view of the system's performance and avoids missing issues that only appear when the product is pushed to extreme conditions.

  • Ensuring that the tests reach the system's performance breakpoints means we know where the bottlenecks are and which link, component, or layer will fail first as loads go up. This provides useful, evidence-based feedback to the architecture and design teams for improving the product's robustness and performance.
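A minimal sketch of breakpoint detection over exploratory test results follows; the SLA, the CPU 'red zone', and the measurements are illustrative assumptions:

```python
# Find the first tested load at which any breakpoint condition is met.
SLA_RESPONSE_S = 2.0
CPU_RED_ZONE = 0.90

results = [       # (concurrent users, p90 response time in s, peak CPU utilization)
    (200, 0.7, 0.35), (400, 0.9, 0.55), (800, 1.6, 0.78),
    (1200, 2.4, 0.91), (1600, 4.8, 0.97),
]

def breakpoint_load():
    """Return the first tested load where a breakpoint condition is reached."""
    for users, response_s, cpu in results:
        if response_s > SLA_RESPONSE_S or cpu >= CPU_RED_ZONE:
            return users
    return None                           # no breakpoint reached in the tested range

print(breakpoint_load())                  # 1200: both the SLA and the CPU red zone are hit
```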

There are several types of performance tests that can be run on a solution. A well-architected solution makes use of them all.

  • Manual benchmarking is what its name implies: manually running solution functions to get a first-hand feel for how the solution responds to a user.
  • Calibration tests are tests that are conducted to compare the results of automated testing tools against other sources such as manual tests, or built-in performance metrics, to validate the correctness of the test scripts and results of the testing tool.
  • Soak tests, or endurance tests, exercise the solution under load for an extended time to ensure that the solution is stable under load, doesn't deteriorate over time, can be reliably operated, and exhibits the expected resource consumption (that is, no memory leaks).
  • Peak tests test the solution under the expected peak workload, for example, the busiest day of the year, to assure solution stability and to gather key measures such as response time and resource consumption under the maximum expected load.
  • Stress and spike tests are used to test the solution at multiples (for example, 2x or 3x) of the expected peak load over a short interval (stress tests) or at even higher loads for a very short time (spike tests). These help to identify bottlenecks within the solution and help the solution team identify the solution's scaling dependencies.
  • Variable-load tests, or breakpoint tests, are used to test the solution under a range of loads to understand the solution's performance at varying load magnitudes and to help the solution team discover trends and resource limits within the solution. These tests also allow the team to register the product's breakpoints, measure the solution's capacity, and detect the product's weakest components (those most likely to fail).

Aim to get the most out of the variety of performance data acquired in tests:

  • Apply creative, exploratory ways of analyzing the test results. Use *what-if* approaches to explore additional scenarios.
  • Map performance results to environments of different size and architecture to predict the solution's performance on different platforms and deployments.
  • Catch any inconsistencies in the obtained performance data by detecting unexpected patterns and trends (for example, *increased* load producing *reduced* transaction rates).
  • Use modeling techniques to link the measured system capacity to the corresponding size of the user base, based on the assumed usage patterns (one such technique is sketched after this list).
  • Always remember to assess the mutual performance impact of different workflows that can be executed concurrently in production.
  • Use different sources of performance data: from pilot projects, to unit and functional tests, to DevOps and SRE golden signals. Any data has value when it comes to performance analysis.
  • Share the results of the analysis with stakeholders often. This often yields useful input from others, raises awareness of the progress of performance testing and analysis, helps others better understand such a *technical* area as performance and capacity analysis, and improves the visibility of the overall non-functional validation effort.
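One common modeling technique for linking capacity to a user base is Little's Law; the law itself is standard, but the workload figures below are illustrative assumptions, not measurements from any specific product:

```python
# Relate measured capacity to a supportable user base using Little's Law:
# concurrent users N = throughput X * (response time R + think time Z).
max_throughput_tps = 250                  # transactions/second at the measured saturation point
avg_response_s = 1.2                      # average response time near that load
avg_think_time_s = 30.0                   # assumed pause between a user's transactions

supported_concurrent_users = max_throughput_tps * (avg_response_s + avg_think_time_s)
print(round(supported_concurrent_users))  # ~7,800 concurrent users

# With an assumed concurrency ratio (say 10% of registered users active at peak),
# this maps the measured capacity to a total user base of roughly 78,000 users.
print(round(supported_concurrent_users / 0.10))
```
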
Resources

Spectrum Scale on IBM Cloud performance

Demonstrates how to set up a high-performance storage subsystem on IBM Cloud using IBM Spectrum Scale…

What is performance instrumentation?

"What is application performance management" describes how application performance management (APM) tools…

IBM Cloud Application Performance

Performance Management uses agents and data collectors to collect data on the monitored hosts. Agents and data collecto…
