Continuous integration is a hot topic in the software development world. Although often linked with Agile processes, such as Extreme Programming, the concept and practices are nothing new -- just a variation on well-known software development best practices. In this article, I will discuss what continuous integration is, where it fits into the software development lifecycle, and how it can be realized with a supporting toolset.
What is continuous integration?
Continuous integration is, first and foremost, a process backed by a set of tools. Martin Fowler and Matthew Foemmel define it as1:
... a fully automated and reproducible build, including testing, that runs many times a day. This allows each developer to integrate daily, thus reducing integration problems.
To illustrate that continuous integration is really nothing new, here is a generic definition that Grady Booch supplied in the second edition of his book, Object-Oriented Analysis and Design with Applications2:
The macro process of object-oriented development is one of "continuous integration"... At regular intervals, the process of "continuous integration" yields executable releases that grow in functionality at every release... It is through these milestones that management can measure progress and quality, and hence anticipate, identify, and then actively attack risks on an ongoing basis.
To me, perhaps a more tangible and complete definition of continuous integration would be:
- A central repository for all members of a team, containing:
  - the latest code (at least)
  - the latest executables
- An automated process for building and testing all project assets that:
  - can be run many times a day
  - is self-sufficient
It is best to think of continuous integration as a mindset or statement of intent that allows you to reduce risk by frequently integrating incremental software development changes. Basically, it represents the realization and refinement of a common software development best practice: the daily build and smoke test.
Continuous integration in context
Normally, you can apply three types of build profiles to any system or application, as illustrated in Figure 1 below.
Figure 1: Three types of system builds
A private build is carried out manually, by individual developers in their own workspaces, to build and unit test the components they are developing. An integration build, often manual, is carried out by an integrator in a central workspace. In continuous integration, to facilitate frequent change integration, this build is by necessity automatic and executed by a tool (such as CruiseControl).3 Usually, results of continuous integration builds are not deployed to customers or production environments; instead, an additional release build is executed manually and with greater care. For example, release builds often require that you identify exactly the inputs to the build (e.g., change requests, defects), and execute the build in a separate, controlled environment (i.e., a separate workspace). This allows you to continue integration builds at the same time and to record the release build's exact contents for auditing and traceability purposes.
Build processes usually have a number of stages (see Figure 2) that may execute seamlessly but have different objectives. At the beginning, there is usually some form of input identification to the build to define the scope of what must be built (continuous integration assumes that all inputs -- or all checked-in files -- will be built). Next, there is some form of script, defining how the build inputs are to be combined together to produce the desired output. Common examples are makefile for Make and build.xml for Ant. This build script is then executed, either manually or automatically. Finally, the build results are reported, either through a simple log file or a more detailed mechanism. Obviously, any build process also needs an environment in which to run. This is usually the supporting software configuration management (SCM) environment, which provides software versions, baselines, and tracking.
Figure 2: Stages in a build process
Often, each of these stages is realized by a tool or combination of tools. Figure 2 shows a common set of tools for the Java domain.
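The stages described above can be sketched in a few lines of Python. This is only an illustration of the ordering (identify inputs, execute the script, report), not how Ant, Make, or CruiseControl are implemented. The names run_build, BuildReport, all_checked_in_files, and compile_everything are hypothetical and do not come from any of those tools.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BuildReport:
    inputs: List[str]
    succeeded: bool
    log: List[str]


def run_build(identify_inputs: Callable[[], List[str]],
              build_script: Callable[[List[str]], bool]) -> BuildReport:
    """Run the stages in order: identify inputs, execute the script, report."""
    log: List[str] = []

    # Stage 1: input identification (for continuous integration,
    # this would be every checked-in file).
    inputs = identify_inputs()
    log.append(f"identified {len(inputs)} input(s)")

    # Stages 2 and 3: the script defines how the inputs are combined;
    # here it is executed.
    try:
        succeeded = build_script(inputs)
    except Exception as exc:
        # Assumption: a script that crashes counts as a failed build.
        succeeded = False
        log.append(f"build script raised: {exc!r}")

    # Stage 4: report the result.
    log.append("BUILD SUCCESSFUL" if succeeded else "BUILD FAILED")
    return BuildReport(inputs=inputs, succeeded=succeeded, log=log)


def all_checked_in_files() -> List[str]:
    # Hypothetical stand-in for a query against the SCM environment.
    return ["src/Account.java", "src/Teller.java"]


def compile_everything(inputs: List[str]) -> bool:
    # Hypothetical stand-in for invoking a real build script.
    return all(path.endswith(".java") for path in inputs)


report = run_build(all_checked_in_files, compile_everything)
print(report.log)
```

Passing the two stages in as functions keeps the sketch independent of any particular SCM or build tool; in practice each would be realized by one of the tools shown in Figure 2.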
Realizing continuous integration with CruiseControl
To achieve continuous integration, you will need some tools, including a software configuration management tool, such as Rational ClearCase; a build tool, such as Ant, Make, ClearCase clearmake,4 custom build script, or the like; and some tool to automatically execute and report on the results of your build. A common tool for this purpose is CruiseControl. The infrastructure for a typical CruiseControl implementation is illustrated in Figure 3 below.
Figure 3: Infrastructure for a typical CruiseControl implementation
Figure 3 shows a number of entities:
- Developer desktop, where the developer makes code changes.
- SCM repository, which stores the team's collective changes.
- Integration build server, which contains the CruiseControl application and where the build is executed.
- Development application or Web server, if the application being developed can be deployed to a Web server or runtime container.
By default, CruiseControl monitors the SCM repository for changes; with Rational ClearCase, it checks for check-ins on a branch or stream (usually the project's integration stream). If changes are found, then, on a schedule, CruiseControl builds your application via your existing Ant or Maven scripts, runs your JUnit test suite, and reports on the results of your build. It can send these results automatically to a set of users via email (as shown in Figure 4) and/or hold them centrally on a build-results Web site. Optionally, CruiseControl can also deploy your application and carry out any additional scripted task you require.
Figure 4: Email with CruiseControl build results
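To make the default process concrete, here is a minimal Python sketch of one scheduled cycle: check for changes, build and test only if something new was checked in, then publish the result. It is a simplified model for illustration, not CruiseControl's actual code. The function integration_cycle, the change identifiers, and the record callback are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple


def integration_cycle(last_built: Optional[str],
                      latest_checked_in: str,
                      build_and_test: Callable[[], bool],
                      publish: Callable[[str, bool], None]) -> Optional[str]:
    """Run one scheduled cycle and return the identifier of the last built change."""
    # Nothing new has been checked in since the last build: skip this cycle.
    if latest_checked_in == last_built:
        return last_built

    # Changes found: build, run the tests, and publish the result
    # (for example by email or on a build-results Web site).
    passed = build_and_test()
    publish(latest_checked_in, passed)
    return latest_checked_in


published: List[Tuple[str, bool]] = []


def record(change_id: str, passed: bool) -> None:
    # Hypothetical publisher that just remembers what it was told.
    published.append((change_id, passed))


state = None
state = integration_cycle(state, "change-101", lambda: True, record)   # builds
state = integration_cycle(state, "change-101", lambda: True, record)   # skipped
state = integration_cycle(state, "change-102", lambda: False, record)  # builds, fails
print(published)  # [('change-101', True), ('change-102', False)]
```

Note that a failed build is still recorded as built in this sketch, so the next cycle waits for a new check-in (presumably the fix) rather than rebuilding the same broken state.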
CruiseControl is configured through a central XML file, usually called config.xml,5 which, among other things, lets you define your build schedule -- probably the most important definition in continuous integration. Essentially, you must answer the question, "How long can I wait to be informed of integration errors?" There is no pre-defined schedule; you need to find the "project rhythm" that suits you best, based on how long the build takes, how often developers can deliver, and so on. Continuous integration can mean building many times a day, every twenty minutes, every hour, or even just once a day (as part of a nightly build). However, with CruiseControl, if nothing has been checked in since the last build, then the schedule will not normally force the build.
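One way to reason about the schedule is to estimate the worst-case delay between a check-in and its build result. The Python sketch below uses a deliberately simple model, which is an assumption of this example rather than a description of CruiseControl's scheduler: a check-in that lands just after a build has started must wait for that build to finish, then for the schedule interval to elapse, and then for its own build. The function name and the sample numbers are illustrative only.

```python
def worst_case_feedback(interval_min: int, build_min: int) -> int:
    """Minutes from check-in to build result in the worst case of this model."""
    wait_for_running_build = build_min  # check-in lands just after a build started
    wait_for_next_check = interval_min  # idle time before changes are looked for again
    own_build = build_min               # the build that includes the check-in
    return wait_for_running_build + wait_for_next_check + own_build


# Compare two candidate schedules for a build that takes 10 minutes.
for interval in (20, 60):
    print(interval, worst_case_feedback(interval, build_min=10))
# 20 40
# 60 80
```

Under these assumptions, a twenty-minute schedule with a ten-minute build means a developer could wait up to forty minutes to hear about an integration error, which gives a starting point for deciding whether that rhythm suits the project.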
Also important is your definition of when a build becomes successful. When it compiles? When all the unit tests have run? When it has been deployed? In truth, your criteria should be a combination of these milestones; certainly the build should have compiled and the unit tests executed successfully. However, with continuous integration, you can view every failure as a success, because it will have exposed a potential problem -- and early enough to do something about it! If you do see build errors, then the best practice is to fix them right away (usually the developer who broke the build will do this). A broken build means that you cannot successfully integrate more developer changes until you fix the problems.
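As a sketch of how such combined criteria might be written down, the following Python function treats a build as successful only if it compiled and all unit tests ran and passed, with deployment as an optional extra condition. The BuildOutcome fields and the rule that a build with zero tests does not count as successful are assumptions made for this example; your own criteria may differ.

```python
from dataclasses import dataclass


@dataclass
class BuildOutcome:
    compiled: bool
    tests_run: int
    tests_failed: int
    deployed: bool = False


def is_successful(outcome: BuildOutcome, require_deployment: bool = False) -> bool:
    """Apply the combined success criteria to one build outcome."""
    if not outcome.compiled:
        return False
    # Assumption: a build that ran no tests has not proven anything.
    if outcome.tests_run == 0 or outcome.tests_failed > 0:
        return False
    if require_deployment and not outcome.deployed:
        return False
    return True


print(is_successful(BuildOutcome(compiled=True, tests_run=120, tests_failed=0)))  # True
print(is_successful(BuildOutcome(compiled=True, tests_run=120, tests_failed=3)))  # False
```

Writing the criteria as one explicit function makes it easy for the team to agree on, and later tighten, what "successful" means.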
The quality of your unit tests also plays a critically important role in continuous integration. For every code module, you should have a set of unit tests that exercises each of its methods. If you follow the practice of test-driven development, you will actually develop your tests before you develop your code. In combination with continuous integration, this practice can be very powerful and productive. In fact, it is the mainstay of many Agile development methods.
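The idea of a test set that exercises each method of a module can be illustrated with a small example. The article's toolset uses JUnit; the sketch below uses Python's standard unittest module instead, purely to keep the example self-contained. The Account class and its rules are hypothetical. In a test-driven style, the AccountTest cases would be written first and the Account methods filled in until they pass.

```python
import unittest


class Account:
    """Hypothetical code module under test."""

    def __init__(self, balance: int = 0) -> None:
        self.balance = balance

    def deposit(self, amount: int) -> None:
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.balance += amount

    def withdraw(self, amount: int) -> None:
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


class AccountTest(unittest.TestCase):
    # Each public method gets a normal-case test and an error-case test.

    def test_deposit_increases_balance(self):
        account = Account(100)
        account.deposit(50)
        self.assertEqual(account.balance, 150)

    def test_deposit_rejects_non_positive_amount(self):
        with self.assertRaises(ValueError):
            Account(100).deposit(0)

    def test_withdraw_decreases_balance(self):
        account = Account(100)
        account.withdraw(30)
        self.assertEqual(account.balance, 70)

    def test_withdraw_rejects_overdraft(self):
        with self.assertRaises(ValueError):
            Account(100).withdraw(101)


# Run the suite programmatically, as an automated build would.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(AccountTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The result object reports how many tests ran and whether any failed, which is the kind of information an integration build can use to decide whether it succeeded.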
Realizing continuous integration with IBM Rational ClearCase and IBM Rational ClearQuest
The product capabilities in the IBM Rational SCM toolset give you many different ways to realize continuous integration. For example, the Unified Change Management (UCM) capability in IBM Rational ClearCase can be configured to use separate streams for your integration and release builds. To enable this, use your default project integration stream to execute CruiseControl integration builds, and then create a child stream off the integration stream to act as your release stream (as in the RatlBank_Rel stream in Figure 5). The release stream can then be "seeded" by rebasing against a baseline created as a result of a successful CruiseControl integration build. To isolate the integration build process, you should also create a specific Rational ClearCase view onto the integration stream, to be used solely for the purpose of executing the continuous integration build.
Figure 5: UCM stream hierarchy
It is standard practice in UCM to have a stream for each developer or group of developers. This allows developers to work in isolation and then subsequently execute a UCM rebase and delivery -- resulting in a controlled integration process. However, this process makes the following assumptions:
- That there is always an integration baseline to rebase against. This might not actually be true with continuous integration, as the builds are carried out many times a day.
- That developers are always able to successfully merge and commit their changes in the integration stream without being affected by deliveries from other developers.6 With continuous integration, there is greater potential for conflict between deliveries.
- That the build and unit tests can be run quickly and completely in the developer's environment. However, what if a complete unit-test suite takes an hour to run? Or what if some tests require that the developer deploy the application to a Web or application server environment first?
This is where the automated build, testing, and deployment capabilities of CruiseControl come in. To enable continuous integration, developers can use the capabilities of UCM to quickly rebase and deliver to the integration area, maybe carrying out some quick tests. Then, if they are using CruiseControl, they can carry on with development, knowing that their changes will be comprehensively built and tested in a scheduled build -- and that any failures will be reported immediately.
Continuous integration in practice
Continuous integration has proven value for small-to-medium-sized projects; as long as you have a comprehensive unit-test validation suite, you will be able to reap the benefits of short integration cycles. For large projects, the jury is still out, although such efforts can certainly benefit from a program-based approach with multiple continuous integration streams, one for each system or component.
Continuous integration also works best with single, mainline development projects -- those that involve development of a single latest code stream, with maintenance streams and a patch stream at the most. This is the approach used for the majority of open-source development projects. However, using continuous integration when you are developing multiple releases in parallel can be problematic. In these environments, branches or streams tend to have long lives; and if you need to integrate changes into multiple releases, this may cause more sophisticated integration problems. However, with a little extra thought and planning, it is still possible to use continuous integration, CruiseControl, and Rational ClearCase effectively.
One caution: On many continuous integration projects, developers mistakenly view change management as overhead to be avoided, because waiting for certain change requests to be assigned, analyzed, or authorized can slow down the development process. However, development teams cannot work effectively without a mechanism for tracking not only the versions in their builds but also the reasons they carried out the builds in the first place. With the flexibility of the UCM activity-based development model, you can use a combination of CruiseControl and the IBM Rational ClearCase and ClearQuest tools to automatically track these reasons -- or activities -- without significantly impacting developer productivity.
Continuous integration is a pragmatic approach that helps you heed the old adage, "Integrate early, and integrate often." When done properly, continuous integration can significantly increase developer productivity, in part by providing team members with immediate feedback via email if the build breaks. You can also use continuous integration tools for traditional development -- to implement an automated nightly build process, for example. There are really no limits to the level of automation you can achieve. Once you have an automated build and testing solution, you can automate baselining, reporting, deployment, and many other aspects of your software build and release lifecycle. However, be warned that if you are working on a crucial release build for a customer or live environment, you may want to step out of your continuous integration loop to do closer checking and achieve a finer degree of control. As I have described in this article, this is essentially the difference between an integration build and a release build.
1 See the page devoted to continuous integration on Martin Fowler's Web site: http://www.martinfowler.com/articles/continuousIntegration.html
2 Published by Addison-Wesley, 1993.
3 CruiseControl is an open source framework for a continuous build process. It includes, but is not limited to, plug-ins for email notification, Ant, and various source control tools. A Web interface is provided to view the details of current and previous builds. See http://cruisecontrol.sourceforge.net/
4 Rational ClearCase includes the build tools omake and clearmake, which provide advanced build auditing, build avoidance, and build distribution capabilities. However, in the Java domain, where Ant is the de-facto standard, they are rarely used for continuous integration.
5 See http://www.buildmeister.com/articles/cruisecontrol-overview.php for detailed information on how to create this file.
6 In recognition of this fact, there is a trigger in the ClearCase product manuals to enable serial delivery. However, this can slow down the development process and goes against the principles of continuous integration.