Version everything

Learn why all parts of a software system should use version control


This content is part of the series: Agile DevOps


Version everything. Yes, everything: infrastructure, configuration, application code, and your database. If you do, you have a single source of truth that enables you to view the software system — and everything it takes to create the software — as a holistic unit. Teams that version everything aren't constantly trying to figure out which version of the application code goes with which database and which version of the software application works with which environment. Source files that make up the software system aren't on shared servers, hidden in folders on a laptop, or embedded in a nonversioned database.

If you version everything, any authorized team member should be capable of re-creating any version of the software system — the application code, configuration, infrastructure, and data — at any point in time. You should be able to create the entire software system using only nonbinary artifacts (with the exception of libraries that you do not modify) that are committed to a version-control repository (such as Subversion, Git, CVS, or Rational ClearCase, to name a few examples).

In my experience, the idea of versioning everything is simple to understand, but I rarely see it fully applied. Sure, you'll see versioning of the application code, some of the configuration and, perhaps, the data. Some teams use dependency repositories and tools (Nexus, for example) to manage libraries that they use in developing the software. Other teams use a combination of shared drives and version-control systems. However, it's less usual to see companies version all of the configuration, all of their dependent components (for example, from package repositories such as yum, apt-get, and rpm), and all of the scripts required to create the database and the data that makes up the database.

To determine whether you're versioning everything, the simple question to ask is, "Can I recreate a specific version of the complete software system — with infrastructure, data, software, and configuration — by running one command that gets a specific revision from my version-control system?" If you cannot, you're not versioning everything.
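As a rough sketch of what that "one command" might look like, the function below clones a repository, pins it to a given revision, and hands control to a single entry-point script. The names recreate_system and create-system.sh are hypothetical and do not come from this series; the sketch assumes the repository contains such a script and that it builds the infrastructure, database, configuration, and application from the checked-out sources.

```shell
#!/bin/bash
# Sketch only: recreate_system and create-system.sh are hypothetical names.

# Usage: recreate_system <repository-url> <revision> <target-directory>
recreate_system() {
  local repo_url=$1 revision=$2 target_dir=$3

  # Fetch the history, then pin the working tree to the requested revision.
  git clone --quiet "$repo_url" "$target_dir" || return 1
  git -C "$target_dir" checkout --quiet "$revision" || return 1

  # Assumed entry point, versioned with everything else, that creates the
  # environment, database, configuration, and application from source.
  (cd "$target_dir" && ./create-system.sh)
}
```

If a team cannot write an entry point like this without adding manual steps, that is usually a sign that some source artifact is still outside version control.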

The key prerequisite to versioning everything is that all source artifacts must be in a scripted form. This goes for the infrastructure, the data, configuration, and the application code. The only exception is for libraries and packages — JAR files and RPM packages, for example — that you use but never modify. After all source artifacts are scripted, you can easily version them.

In this article, you'll learn how each type of software artifact can be described in code, along with effective approaches to using these artifacts. The code listings in this article are examples of how each component is defined as a script for execution and versioning. They are not meant to show how to write and run each type of script. Other articles in this Agile DevOps series and the Automation for the people series provide more detailed examples for the components described in this article.

Application code

Application code is probably the most obvious part of the software system that must be versioned. The code in Listing 1 is a method from a simple Java class (called UserServiceImpl) that calls a data-access object to get some data:

Listing 1. Java application code
public Collection findAllStates() {
    // Delegate the lookup to the data-access object.
    UserDao userData = new UserDaoImpl();
    Collection states = userData.findAllStates(UserDao.ALL_STATES);
    return states;
}

Figure 1 illustrates committing a new application code source file (the one in Listing 1) to the Git version-control repository hosted at GitHub:

Figure 1. Commands for committing and pushing new source code file to a Git repository
Committing new application source file by (1) marking the file for addition with git add; (2) committing the code using a git command and comment: git commit -m 'added user service impl class'; (3) running the git push command to push the code to the master repository.

All of the application code required to create your software application should be committed to a version-control repository. You will use the same process for any other source file — infrastructure code, data, or configuration.
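The three steps that Figure 1 describes can be written out as a small shell function. This is a minimal sketch: the function name commit_source_file is hypothetical, and pushing to a remote named origin is an assumption (the figure says only that the code is pushed to the master repository).

```shell
#!/bin/bash
# Sketch of the add / commit / push sequence described in Figure 1.
# commit_source_file is a hypothetical helper name.

# Usage: commit_source_file <path-to-file> <commit-message>
commit_source_file() {
  local file=$1 message=$2

  # (1) Mark the file for addition.
  git add "$file" || return 1
  # (2) Commit the change with a comment.
  git commit -m "$message" || return 1
  # (3) Push the current branch; the remote name "origin" is assumed.
  git push origin HEAD
}
```

Called from inside a working copy, it might be used as commit_source_file UserServiceImpl.java 'added user service impl class', where the file path is illustrative. The same call works unchanged for a manifest, a DDL script, or a properties file.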


Infrastructure

Because you can define infrastructure as code just as you do your application source files (see "Agile DevOps: Infrastructure automation"), you can version your infrastructure in a version-control system. These scripts might have designations such as manifests, modules, and cookbooks, but they are all text-based scripts that can be executed to create environments.

If the best practice is to define your infrastructure in code, what do people typically do? It's a mixed bag of "works of art" in which environments are manually configured each and every time, or it's a mixture of manual steps and running automated scripts. Each of these approaches results in a bottleneck, because an engineer is required to run through the steps each time. To remedy this, some will diligently describe each and every step in a set of written instructions. The problem there is that instructions might be wrong or miss some steps, or the operator running through the steps might not follow them correctly. The only solution is to fully describe your infrastructure in code that can be executed through a single command.

For example, the Puppet manifest in Listing 2 describes steps for installing a PostgreSQL database server in code. This code can be executed from the command line or through a Continuous Integration (CI) server.

Listing 2. Puppet manifest describing the installation of PostgreSQL
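class postgresql {
  # Install the PostgreSQL server package.
  package { "postgresql8-server":
    ensure => installed,
  }
  # Initialize the data directory once, after the package is installed.
  exec { "initdb":
    unless => "[ -d /var/lib/postgresql/data ]",
    command => "service postgresql initdb",
    require => Package["postgresql8-server"]
  }
}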

The entire manifest downloads, installs, and runs the server. By using additional manifests, you can describe your entire environment in scripts. These scripts can be checked into your version-control repository so that every revision to your infrastructure is tracked, improving change management.


Configuration

Configuration defines the information that varies across environments. Examples include directory and file locations, host names, IP addresses, and server ports, as shown in Listing 3. Scripts use this configuration when creating environments, running builds and deployments, and running tests:

Listing 3. Configuration defined in a properties file

The code in Listing 4 is a Ruby script that loads configuration items into a NoSQL database:

Listing 4. Writing dynamic configuration items to a NoSQL database
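AWS::SimpleDB.consistent_reads do
  # sdb is assumed to be an AWS::SimpleDB client created earlier in the script.
  domain = sdb.domains["stacks"]
  item = domain.items["#{opts[:itemname]}"]
  # Store each key=value line of the file as an attribute of the item.
  file.each_line do |line|
    key, value = line.split '='
    item.attributes.set(
      "#{key}" => "#{value}")
  end
end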

Because all of the configuration in Listing 4 (such as IP addresses, domain names, and machine images) can be obtained dynamically, none of the configuration is hard-coded. You might not be able to make all your configuration dynamic, but when using the cloud, you can drastically reduce the amount of hard-coded configuration that is often the bane of most software delivery systems.
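For configuration that does stay in a versioned properties file, a build or deployment script might read it along the lines of the sketch below. The function name get_property is hypothetical, and the sketch assumes simple key=value lines with no spaces around the equals sign; blank lines and lines starting with # are skipped.

```shell
#!/bin/bash
# Sketch only: get_property is a hypothetical helper, and the file is assumed
# to hold simple key=value lines (blank lines and # comments are skipped).

# Usage: get_property <properties-file> <key>
get_property() {
  local file=$1 wanted=$2 key value

  # Split each line on the first "="; the rest of the line becomes the value.
  # The trailing test also handles a last line that has no newline.
  while IFS='=' read -r key value || [ -n "$key" ]; do
    case $key in
      ''|'#'*) continue ;;   # skip blank lines and comments
    esac
    if [ "$key" = "$wanted" ]; then
      printf '%s\n' "$value"
      return 0
    fi
  done < "$file"

  return 1   # key not found
}
```

Because the properties file is plain text, it can sit in the same repository as the scripts that read it, so a given revision always pairs the scripts with the configuration they expect.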


Data

The structure of a relational database can be defined in Data Definition Language (DDL) scripts. This includes the creation of the database, tables, procedures, and so on: everything except the data. You define the data in Data Manipulation Language (DML) scripts, including insert, update, and delete statements.

The partial DDL script shown in Listing 5 performs the steps for creating the database:

Listing 5. DDL for creating database tables
ALTER TABLE public.hibernate_sequence OWNER TO cd_user;

CREATE TABLE note (
    id bigint NOT NULL,
    version bigint NOT NULL,
    cd_id bigint NOT NULL,
    note character varying(10000) NOT NULL,
    note_date_time timestamp without time zone
);
ALTER TABLE public.note OWNER TO cd_user;

Listing 6 shows a portion of a Liquibase XML script. Liquibase is an open source tool that provides a domain-specific language (DSL) for database change management.

Listing 6. Liquibase script for adding a column to an existing table
<changeSet id="9" author="jayne">
  <addColumn tableName="distributor">
    <column name="phonenumber" type="varchar(255)"/>
  </addColumn>
</changeSet>

You can define your database creation, data, and changes in scripts. These scripts are run as part of your build process, and they are all versioned in your version-control repository.

Build and deployment

A build compiles and packages all of the source files into a distribution. For the application code, this distribution is often a binary, such as a WAR file. A build might run infrastructure scripts to create an environment. This environment might be a virtual instance or an image that can define an instance. A build might also produce a database from the database scripts. Builds use the configuration defined in configuration files or databases. Listing 7 shows a portion of a Maven build script that defines directories and configuration for a build:

Listing 7. Partial listing of a build script in Maven

Several vendors supply so-called "deployment automation tools." It's a bit of a misnomer: these tools are likely to orchestrate the deployment, not automate it. They help you to describe the steps and order of the deployment, but the actual deployment happens through a series of scripts, manual processes, or both. You rarely find tools that support the versioning of the deployment artifacts and workflow. Although several of these tools provide internal versioning, this is of little use when you're looking for a single revision of your software system unless you always use this tool, and even then it segregates the versioning of the deployment from that of the other source artifacts. It doesn't need to be this way, even if you are using one of these vendor tools. An alternate approach is to describe your entire deployment in a deployment-automation DSL such as Capistrano. This way, you can version your deployment, and the orchestration tool simply executes the deployment script. You should be able to run the entire deployment with one command, and the automated deployment should be coupled with automated tests.

Capistrano is a DSL for describing deployments on multiple platforms. With Capistrano, you can define tasks such as stopping servers, copying files, and applying a workflow for deployments across multiple nodes and environment roles. Listing 8 shows a portion of a Capistrano script:

Listing 8. Partial deployment script in Capistrano
namespace :deploy do
  task :setup do
    run "sudo chown -R tomcat:tomcat #{deploy_to}"
    run "sudo service httpd stop"
    run "sudo service tomcat6 stop"
  end
end


Using a DSL is an effective way to describe your deployments. Because all the steps for deployment are scripts and are not tightly coupled to a proprietary tool, the method aligns well with teams that are implementing a continuous-delivery pipeline. After your deployments are defined as scripts, they can be versioned the same as any other component of your delivery pipeline.


Internal systems

There's a growing consensus among progressive teams that versioning the infrastructure, configuration, data, and application code is desirable. One additional component you can also version is your internal systems, for example the configuration that defines your CI environments. The question to ask is: What happens to your software system if the environments used for creating parts of your software-delivery system are no longer working? After you fully script the environments for your software system, you can fully script the environments for creating your software-delivery system with infrastructure-automation tools. This infrastructure code and the changes to your CI configuration (such as CI job configuration) are versioned. An example of versioning Jenkins server and job configurations is shown in Listing 9:

Listing 9. Simple bash script for versioning Jenkins server configuration changes
#!/bin/bash -v

# Change into your jenkins home.
cd /usr/share/tomcat6/.jenkins

# Add any new conf files, jobs, users, and content.
git add *.xml jobs/*/config.xml plugins/*.hpi .gitignore

# Ignore things we don't care about
cat > .gitignore <<EOF
# (ignore patterns go here)
EOF

# Remove anything from git that no longer exists in jenkins.
git status --porcelain | grep '^ D ' | awk '{print $2;}' | xargs -r git rm

# And finally, commit and push
git commit -m 'Automated commit of jenkins configuration' -a
git push

Just as you can with any other part of your software system, you can version this script in your version-control repository.

Version this

In this article, you learned that everything can be and is versioned when developers and operations teams collaborate in creating a continuous-delivery platform capable of releasing software at any point in time. You saw that when all the infrastructure, data, and application resources are versioned, you can also version the components that make up the systems that perform the software delivery for your software systems. After every resource for the software systems you develop for users and the internal systems you use in getting these software systems to users is scripted, everything can be versioned.

In the next article, you'll learn about dynamic configuration management: an approach to eliminating the use of static environment-specific properties for configuration.
