It is recognized in the Agile community that agile projects focus even more heavily on functional aspects than waterfall projects do, which often leaves non-functional aspects as an afterthought.
The aim of the implementation described here is to alert developers when their code check-in has had a significant performance impact on the build. It also produces regular performance measurements, which allow performance to be tracked over time and allow analysis tools to detect more subtle performance degradation. The following tangible benefits are achievable with this implementation:
Cost savings in detecting, analyzing, and fixing the vast majority of performance defects.
- Narrowing down where the bug was introduced, which makes analysis more efficient.
- Reducing the number of performance defects reaching performance test environments, where detection is more costly. Defects detected at this stage often block performance testers.
- Quick defect resolution, shortening the feedback loop from performance bug detection to analysis, fix, and retest.
Improved code quality
- Making the impact of poorly performing code visible to developers, which fosters a stronger code quality ethos
This implementation fits well within a pipeline delivery/continuous integration model in an agile project.
Role of the agile build lifecycle: an example project
In this example project, code is developed using an agile methodology with two-week sprints. Refer to the developerWorks knowledge path, "Agile software development," for a greater understanding of the agile methodology. Developers check in code several times a day, which creates many different builds of the project's code. Every hour, the continuous integration server builds and promotes code containing the check-ins of a few developers.
The concept behind pipeline delivery is that each check-in could be a shippable product, safe to deploy into a production environment. Achieving continuous pipeline delivery relies on a rigorous promotional model containing various automated test and validation phases between check-in and deployment, as shown in Figure 1.
Figure 1. Agile build lifecycle - pipeline delivery (continuous integration)
Deploying a developer's check-in straight into a production environment without any human validation is unlikely ever to be completely safe. However, with automated functional and non-functional testing and analysis, you can get very close and minimize the manual work required to ensure a release. Automated non-functional testing also improves the visibility of performance and security regressions. Because it detects newly introduced performance and security threats hourly, in line with the project's build model, those threats surface far sooner than they would under an exploratory testing model, which typically runs less often than once per release.
It is common to see rafts of unit tests on complex integration projects that check newly checked-in code for functional regression. Full end-to-end functional testing is less common, but still widespread. Automated non-functional testing is much less common. The following section discusses how non-functional tests, more specifically performance tests, are automated and integrated into the agile build lifecycle.
Method for integrating performance testing into the agile build lifecycle
In this example, the following open source tools are used or integrated within the agile build process:
- Git: Source code version control tool
- Apache Maven: Build automation tool
- Jenkins: Continuous integration tool
- JUnit: Unit testing framework
- Apache JMeter: Performance testing tool
- JMeter-Maven plug-in: Automates JMeter tests in Maven
- Jenkins xUnit plug-in: Validates builds based on unit test results
- Jenkins Performance plug-in: Reports JMeter results in Jenkins, plots performance trends, and fails builds when error thresholds are exceeded
- performance-nfr-compliance-plugin: Bespoke Maven plug-in for analyzing performance compliance
Jenkins is used to build, deploy, and promote the code (see Figure 1 for the pipeline delivery process). Every hour, Jenkins schedules the Maven build of the project code. After the code is built and unit tested, Jenkins begins the promotional steps that the build must complete to be considered stable and ready for deployment. First, environment checks are performed, and the code is then deployed to one or more development environments. End-to-end functional tests are then triggered. After the functional tests, non-functional tests are run against the build, as shown in Figure 2.
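The promotion sequence above can be pictured as an ordered list of gates in which the first failing gate stops promotion. The following is a minimal Python sketch of that idea, not Jenkins configuration: the `Stage` type, the `promote` function, and the stage names are hypothetical, and each gate's check is stubbed with a lambda.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    """One hypothetical promotion gate; run() returns True if the gate passed."""
    name: str
    run: Callable[[], bool]


def promote(build_id: str, stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run the gates in order and stop at the first failure.

    Returns whether the build was promoted and the names of the gates it passed.
    """
    passed: List[str] = []
    for stage in stages:
        if not stage.run():
            print(f"{build_id}: '{stage.name}' failed, build not promoted")
            return False, passed
        passed.append(stage.name)
    print(f"{build_id}: all gates passed, build promoted")
    return True, passed


if __name__ == "__main__":
    # Stubbed gates in the order described above; the last one simulates a regression.
    pipeline = [
        Stage("build and unit tests", lambda: True),
        Stage("environment checks", lambda: True),
        Stage("deploy to development environment", lambda: True),
        Stage("end-to-end functional tests", lambda: True),
        Stage("performance-test", lambda: False),
    ]
    promote("build-42", pipeline)
```

One consequence of this ordering is that the comparatively expensive performance run is only spent on builds that have already passed the functional gates.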
Figure 2. Automated performance compliance testing within the build pipeline
Jenkins triggers the "performance-test" job, the focus of this article. This job is a Maven project containing the plug-in configuration for the JMeter-Maven plug-in and the performance-nfr-compliance-plugin, together with the JMeter test scripts. The JMeter-Maven plug-in is configured not to fail the build on request failures or errors, because doing so would cause the Maven process to exit and prevent Jenkins from analyzing the results of the performance test. Each JMeter test script replicates a user journey that is tested in the performance test environment, to which the build is deployed after Jenkins promotes it. Additional scripts have been written to cover commonly executed functionality that existing user journeys did not yet exercise.
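The point of not failing on request errors is that the verdict should come from the analysis step, not from the test runner's exit code. The same idea, shown outside Maven, as a minimal Python sketch: `run_without_failing` is a hypothetical helper that records a non-zero exit code instead of aborting, so later analysis steps still run. A Python child process stands in for the load test.

```python
import subprocess
import sys
from typing import List


def run_without_failing(cmd: List[str]) -> int:
    """Run a test command without aborting the job on a non-zero exit code.

    check=False stops subprocess.run from raising CalledProcessError, so the
    caller can continue to the analysis steps and decide the verdict there.
    """
    completed = subprocess.run(cmd, check=False)
    if completed.returncode != 0:
        print(f"test command exited with {completed.returncode}; continuing to analysis")
    return completed.returncode


if __name__ == "__main__":
    # Stand-in for a load test that reports request errors via its exit code.
    exit_code = run_without_failing([sys.executable, "-c", "import sys; sys.exit(3)"])
    print(f"analysis runs regardless (runner exit code {exit_code})")
```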
The JMeter-Maven plug-in configuration exposes JMeter parameters, such as the number of threads (users) and the ramp-up time, as Maven parameters, which can then be set through build parameters in Jenkins. The JMeter-Maven plug-in runs the performance tests one at a time and measures the corresponding response times, which are stored in the standard JMeter JTL results format. An example implementation of the JMeter-Maven plug-in is available on GitHub. See Resources for more information.
In this example, a user journey (thread group) has 60 threads and a ramp-up time of 60 seconds. Each thread group has a corresponding set-up thread, which starts first to prime the environment with new users in the state required by the actual test thread. This approach reduces "test flakiness" (inconsistent test failures that are not the result of a bug) because it does not rely on existing user data in the development environment.
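The following minimal Python sketch illustrates two points from the paragraphs above: passing the thread count and ramp-up time in as build parameters, and how JMeter spreads thread starts across the ramp-up period. It swaps the JMeter-Maven plug-in for the JMeter command line (`-n` for non-GUI mode, `-t` for the test plan, `-l` for the results file, `-J` to set a JMeter property). The parameter names `THREADS` and `RAMP_UP`, the property names `threads` and `rampup`, and the assumption that the test plan reads them with `${__P(threads,60)}` and `${__P(rampup,60)}` are all hypothetical.

```python
import os
from typing import List, Mapping


def jmeter_command(test_plan: str, results_file: str, params: Mapping[str, str]) -> List[str]:
    """Build a non-GUI JMeter command from build parameters.

    Jenkins exposes build parameters to a job as environment variables, so
    `params` would typically be os.environ. THREADS and RAMP_UP are
    hypothetical parameter names; the defaults match the example (60 and 60 s).
    """
    threads = int(params.get("THREADS", "60"))
    ramp_up = int(params.get("RAMP_UP", "60"))
    return [
        "jmeter", "-n",          # non-GUI mode
        "-t", test_plan,         # test plan (.jmx)
        "-l", results_file,      # JTL results file
        f"-Jthreads={threads}",  # assumed to be read in the plan via __P(threads,60)
        f"-Jrampup={ramp_up}",   # assumed to be read in the plan via __P(rampup,60)
    ]


def thread_start_offsets(threads: int, ramp_up_seconds: float) -> List[float]:
    """Start time of each thread: JMeter starts threads evenly across the ramp-up."""
    interval = ramp_up_seconds / threads
    return [i * interval for i in range(threads)]


if __name__ == "__main__":
    print(" ".join(jmeter_command("login-journey.jmx", "results.jtl", os.environ)))
    offsets = thread_start_offsets(60, 60)
    print(offsets[:3], offsets[-1])  # one new thread per second; the last starts at 59 s
```

With 60 threads and a 60-second ramp-up, a new thread starts every second and the last one starts at 59 seconds, so the full load is reached just before the one-minute mark.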
After the JMeter tests complete, the performance-nfr-compliance-plugin calculates the 95th and 50th percentiles for each of the required HTTP requests from the JMeter results file and measures compliance against the thresholds specified in the POM configuration. For example, take a test with a sample of 100, where the login request has a 1000 ms threshold applied to the 95th percentile. If the login request takes 500 ms for 94 users and 1005 ms for 6 users, the 95th percentile is 1005 ms, and the request is marked as failed. This evaluation generates the compliance result as an XML report in JUnit format. It also produces an HTML page showing a RAG (red, amber, green) status for the test set, with pass or fail marked against each percentile criterion and against the test as a whole.
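The compliance check can be sketched in Python as follows. This is not the bespoke plug-in's code: it assumes a CSV-format JTL file with a header row containing the standard `label` and `elapsed` columns, it uses the nearest-rank percentile method (the article does not say which method the plug-in uses), and the function names and threshold layout are hypothetical. It reproduces the login example: 94 samples at 500 ms and 6 at 1005 ms give a 95th percentile of 1005 ms.

```python
import csv
import math
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple

# (label, percentile) -> maximum allowed response time in ms
Thresholds = Mapping[Tuple[str, int], float]
# (label, percentile, measured ms, limit ms, passed)
Result = Tuple[str, int, float, float, bool]


def nearest_rank_percentile(values: List[float], pct: int) -> float:
    """Nearest-rank percentile: the value at 1-based rank ceil(pct/100 * n)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def elapsed_by_label(jtl_lines: Iterable[str]) -> Dict[str, List[float]]:
    """Group response times (the JTL 'elapsed' column, in ms) by sampler label.

    jtl_lines can be an open CSV JTL file or any iterable of CSV lines.
    """
    samples: Dict[str, List[float]] = defaultdict(list)
    for row in csv.DictReader(jtl_lines):
        samples[row["label"]].append(float(row["elapsed"]))
    return samples


def check_compliance(samples: Dict[str, List[float]], thresholds: Thresholds) -> List[Result]:
    results: List[Result] = []
    for (label, pct), limit in thresholds.items():
        values = samples.get(label, [])
        if not values:
            # A required request with no samples counts as non-compliant.
            results.append((label, pct, float("nan"), limit, False))
            continue
        measured = nearest_rank_percentile(values, pct)
        results.append((label, pct, measured, limit, measured <= limit))
    return results


def to_junit_xml(results: List[Result]) -> str:
    """One JUnit test case per (request, percentile) criterion."""
    failures = sum(1 for r in results if not r[4])
    suite = ET.Element("testsuite", name="performance-nfr-compliance",
                       tests=str(len(results)), failures=str(failures))
    for label, pct, measured, limit, passed in results:
        case = ET.SubElement(suite, "testcase", classname=label, name=f"p{pct}")
        if not passed:
            ET.SubElement(case, "failure",
                          message=f"p{pct} = {measured:.0f} ms exceeds {limit:.0f} ms")
    return ET.tostring(suite, encoding="unicode")
```

In practice you would pass an open results file, for example `with open("results.jtl", newline="") as f: samples = elapsed_by_label(f)`, and write the returned XML where the CI server's test-report publisher expects it.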
The Jenkins xUnit plug-in fails the build if the JUnit test report produced by the performance-nfr-compliance-plugin has a percentage of JUnit failures (criterion non-compliances) above the failure threshold. The Jenkins Performance plug-in fails the build if the JMeter results have a percentage of sample failures or errors (for example, a 404 response code) above its threshold. Checking both helps you determine whether a build failed because of a genuine performance issue. Non-performance issues that can cause test failures include tests that are out of date with the current version of the code, and environment issues. The xUnit test failure and JMeter sample failure thresholds are specified in the Jenkins job configuration.
The Jenkins Performance plug-in then plots the graphs shown in Figure 3 for each build. You can use these graphs to observe degradation or improvement in performance from build to build.
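The two threshold checks can be sketched together in Python. In the article they are separate Jenkins plug-ins, each configured in the job; this sketch only computes the same two percentages and adds a hypothetical triage order that looks at the sample error rate first, since a high error rate points at scripts or the environment rather than at performance. The function names and threshold values are assumptions, and the JTL input is assumed to be CSV with a `success` column.

```python
import csv
import xml.etree.ElementTree as ET
from typing import Iterable


def junit_failure_percentage(junit_xml: str) -> float:
    """Percentage of JUnit test cases (criteria) that failed or errored."""
    cases = list(ET.fromstring(junit_xml).iter("testcase"))
    if not cases:
        return 0.0
    failed = sum(1 for c in cases
                 if c.find("failure") is not None or c.find("error") is not None)
    return 100.0 * failed / len(cases)


def sample_error_percentage(jtl_lines: Iterable[str]) -> float:
    """Percentage of JMeter samples whose 'success' column is not 'true'."""
    rows = list(csv.DictReader(jtl_lines))
    if not rows:
        return 0.0
    errors = sum(1 for r in rows if r["success"].strip().lower() != "true")
    return 100.0 * errors / len(rows)


def build_verdict(criteria_failed_pct: float, sample_error_pct: float,
                  criteria_threshold_pct: float, error_threshold_pct: float) -> str:
    """Hypothetical triage: report sample errors first, then performance criteria."""
    if sample_error_pct > error_threshold_pct:
        return "FAILED: sample error rate above threshold; check scripts and environment"
    if criteria_failed_pct > criteria_threshold_pct:
        return "FAILED: performance criteria not met"
    return "PASSED"
```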
Figure 3. Jenkins Performance plug-in graph
Frequent performance checks such as those described in this implementation provide large quantities of valuable performance data. This data should also be analyzed periodically, less often than the builds run, to detect and understand more subtle performance issues.
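One possible form of that periodic analysis, sketched in Python: compare the median 95th percentile of the most recent builds with the median of the builds before them, and flag a sustained increase that no single build was large enough to fail on. The window size, the 10% tolerance, and the function name are assumptions, and the input is assumed to be one request's 95th percentile per build, oldest first.

```python
from statistics import median
from typing import Sequence


def detect_gradual_slowdown(p95_per_build: Sequence[float],
                            window: int = 10, tolerance: float = 0.10) -> bool:
    """True if the median p95 of the latest `window` builds exceeds the median
    of the preceding `window` builds by more than `tolerance` (a fraction)."""
    if len(p95_per_build) < 2 * window:
        return False  # not enough history to compare yet
    history = list(p95_per_build)
    baseline = median(history[-2 * window:-window])
    recent = median(history[-window:])
    return recent > baseline * (1 + tolerance)
```

Using medians rather than means keeps a single noisy build from triggering the alert.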
General guidance for implementation
Similar custom implementations can be achieved using different tools. There are equivalent JMeter plug-ins for other build tools, such as Ant, and performance plug-ins for other continuous integration servers, such as JetBrains TeamCity.
Running non-functional tests as a promotion step in the build pipeline means that they run at the frequency of the continuous integration (CI) code builds, alongside other intensive functional and non-functional tests. These tests must therefore be fairly quick and as free of flakiness as possible. Ensure that your implementation accounts for the following:
Non-production-like environment: Ensure that the percentile thresholds, number of users, and number of concurrent tests account for the tests running in an environment that is not production-like.
Build failure frequency: The implementation loses its value if it breaks the build too often. Ensure that the response-time percentile thresholds are not set too close to the expected results. Tighten the thresholds only after manually analyzing the response times for each build, so that a build failure reliably means you have detected a performance degradation. Similarly, be lenient when setting the xUnit and Jenkins Performance plug-in thresholds in the continuous integration tool.
Ensure that you have set appropriate HTTP client time-outs in JMeter to prevent the build from hanging, and handle errors appropriately.
Test flakiness: Frequent code changes can cause end-to-end tests to fail often. When a build is marked as failed, you want to quickly rule out a failure of the JMeter test script caused by a change to application functionality. Examples of application changes that may cause your JMeter script to fail include URL changes or changes to the implementation of tokens or sessions. Invest some time in hardening your scripts by adding assertions or post-processors that validate the response from the server.
Depending on the vigilance of your developers, a 200 response code might not always indicate a successful response in the context of the user journey you are measuring. Where you detect a failure, update the sample result accordingly and write custom messages to the JMeter log that will help you debug quickly when you are under pressure to turn the build green.
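The "Build failure frequency" advice (start lenient, then tighten after reviewing response times) can be made repeatable with a small helper. This Python sketch is one hypothetical way to do it: take the worst 95th percentile observed over recent builds you have reviewed, add headroom for the variability of a non-production-like environment, and round up. The 25% headroom and 50 ms rounding step are assumptions to tune; reducing the headroom over time is one way to tighten the thresholds.

```python
import math
from typing import Sequence


def calibrate_threshold(p95_per_build: Sequence[float],
                        headroom: float = 0.25, step_ms: int = 50) -> int:
    """Suggest a p95 threshold (ms): worst reviewed p95 plus headroom, rounded up."""
    if not p95_per_build:
        raise ValueError("need at least one reviewed build to calibrate from")
    worst = max(p95_per_build)
    return math.ceil(worst * (1 + headroom) / step_ms) * step_ms
```

For example, reviewed builds with 95th percentiles of 620, 700, and 655 ms give a suggested threshold of 900 ms (700 × 1.25 = 875, rounded up to the next multiple of 50).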
The value of this approach lies in drawing attention to performance issues that can be detected during continuous integration almost immediately after they are introduced, rather than much later in the lifecycle, at the end of a sprint or release, or even later. Narrowing the scope of the introduction or exposure of a performance degradation to a few commits, rather than an entire release, makes discovery, analysis, and resolution far more efficient and less costly.
As such, the techniques described here should be regarded as a complementary activity to traditional performance testing in the context of an overall Performance Engineering Strategy for the project. IBM's PEMM methodology describes this wider context in Chapter 2 of Performance Modeling and Engineering. See Resources for more information.
Resources
- Read about the Apache Maven project.
- Learn about Jenkins.
- Explore JUnit.
- Learn about Apache JMeter.
- Read about the Jenkins xUnit plug-in.
- Read about the Jenkins Performance plug-in.
- See the example JMeter-Maven plug-in implementation on GitHub.