Agile DevOps: Version everything

Learn why all parts of a software system should use version control

Which types of software-system artifacts should you version? In this Agile DevOps installment, DevOps expert Paul Duvall recommends that DevOps teams version application code, infrastructure, configuration, data, and even internal system artifacts to gain the capacity to deliver software to users quickly and often.

Paul Duvall (paul.duvall@stelligent.com), CTO, Stelligent

Paul Duvall is the CTO of Stelligent. A featured speaker at many leading software conferences, he has worked in virtually every role on software projects: developer, project manager, architect, and tester. He is the principal author of Continuous Integration: Improving Software Quality and Reducing Risk (Addison-Wesley, 2007) and a 2008 Jolt Award Winner. He is also the author of Startup@Cloud and DevOps in the Cloud LiveLessons (Pearson Education, June 2012). He's contributed to several other books as well. Paul authored the 20-article Automation for the people series on developerWorks. He is passionate about getting high-quality software to users quicker and more often through continuous delivery and the cloud. Read his blog at Stelligent.com.



27 November 2012

Also available in Russian, Japanese, and Portuguese.

About this series

Developers can learn a lot from operations, and operations can learn a lot from developers. This series of articles is dedicated to exploring the practical uses of applying an operations mindset to development, and vice versa — and of considering software products as holistic entities that can be delivered with more agility and frequency than ever before.

Version everything. Yes, everything: infrastructure, configuration, application code, and your database. If you do, you have a single source of truth that enables you to view the software system — and everything it takes to create the software — as a holistic unit. Teams that version everything aren't constantly trying to figure out which version of the application code goes with which database and which version of the software application works with which environment. Source files that make up the software system aren't on shared servers, hidden in folders on a laptop, or embedded in a nonversioned database.

If you version everything, any authorized team member should be capable of re-creating any version of the software system — the application code, configuration, infrastructure, and data — at any point in time. You should be able to create the entire software system using only nonbinary artifacts (with the exception of libraries that you do not modify) that are committed to a version-control repository (such as Subversion, Git, CVS, or Rational ClearCase, to name a few examples).

In my experience, the idea of versioning everything is simple to understand, but I rarely see it fully applied. Sure, you'll see versioning of the application code, some of the configuration, and perhaps the data. Some teams use dependency repositories and tools (Nexus, for example) to manage the libraries they use in developing the software. Other teams use a combination of shared drives and version-control systems. However, it's less common to see companies version all of their configuration, all of their dependent components (for example, packages installed through package managers such as yum, apt-get, and rpm), and all of the scripts required to create the database and the data it contains.

To determine whether you're versioning everything, the simple question to ask is, "Can I recreate a specific version of the complete software system — with infrastructure, data, software, and configuration — by running one command that gets a specific revision from my version-control system?" If you cannot, you're not versioning everything.

Multiple repositories

For legitimate reasons, you might need to use multiple source repositories or dependency-management repositories and maintain configuration items in a versioned database. An effective technique in this case is to apply a logical version to the multiple repositories as a way to generate the "bill of materials" you can use to identify each revision of the software system.
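To make the "bill of materials" idea concrete, here is a minimal Java sketch. It assumes you can already look up the exact revision of each repository (a Git commit, a Subversion revision, and so on); the BillOfMaterials class, the repository names, and the revision strings are hypothetical. It pins each repository to one revision under a single logical version and renders the result as text that can itself be committed.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical "bill of materials": one logical version mapped to exact repository revisions. */
final class BillOfMaterials {
    private final String logicalVersion;
    // TreeMap keeps repositories sorted, so the rendered text is stable from run to run.
    private final Map<String, String> revisions = new TreeMap<>();

    BillOfMaterials(String logicalVersion) {
        this.logicalVersion = logicalVersion;
    }

    /** Pins one repository to one exact revision; a repository may be pinned only once. */
    BillOfMaterials pin(String repository, String revision) {
        if (revisions.putIfAbsent(repository, revision) != null) {
            throw new IllegalArgumentException("Repository already pinned: " + repository);
        }
        return this;
    }

    /** Returns the pinned revision, or null if the repository is not part of this version. */
    String revisionOf(String repository) {
        return revisions.get(repository);
    }

    /** Renders a properties-style file: the logical version first, then one line per repository. */
    String render() {
        StringBuilder out = new StringBuilder("logical.version=" + logicalVersion + "\n");
        revisions.forEach((repo, rev) -> out.append(repo).append('=').append(rev).append('\n'));
        return out.toString();
    }
}
```

For example, pinning app, infrastructure, and database repositories under release-1.4.0 and calling render() produces a logical.version line followed by one line per repository, sorted by name. Sorting keeps the file stable, so a diff between two releases shows only the revisions that changed.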

The key prerequisite to versioning everything is that all source artifacts must be in a scripted form. This goes for the infrastructure, the data, configuration, and the application code. The only exception is for libraries and packages — JAR files and RPM packages, for example — that you use but never modify. After all source artifacts are scripted, you can easily version them.

In this article, you'll learn how each type of software artifact can be described in code, along with effective approaches to using these scripts. The code listings are examples of how each component is defined as a script that can be executed and versioned; they are not meant to show how to write and run each type of script. Other articles in this Agile DevOps series and the Automation for the people series provide more detailed examples for the components described in this article.

Application code

Application code is probably the most obvious part of the software system that must be versioned. The code in Listing 1 is a portion of a simple Java class (called UserServiceImpl) whose findAllStates() method asks a data-access object for a collection of states:

Listing 1. Java application code
...
public Collection findAllStates() {
    UserDao userData = new UserDaoImpl();
    Collection states = userData.findAllStates(UserDao.ALL_STATES);
    return states;
}
...

Figure 1 illustrates committing a new application source file (UserServiceImpl.java, from Listing 1) to a Git repository hosted at GitHub:

Figure 1. Commands for committing and pushing new source code file to a Git repository
Committing a new application source file: (1) mark the file for addition with git add UserServiceImpl.java; (2) commit it with a message: git commit -m 'added user service impl class'; (3) push the commit to the remote repository with git push.

All of the application code required to create your software application should be committed to a version-control repository. You will use the same process for any other source file — infrastructure code, data, or configuration.


Infrastructure

Because you can define infrastructure as code just as you do your application source files (see "Agile DevOps: Infrastructure automation"), you can version your infrastructure in a version-control system. Depending on the tool, these scripts are called manifests, modules, cookbooks, and so on, but they are all text-based scripts that can be executed to create environments.

If the best practice is to define your infrastructure in code, what do people typically do? It's a mixed bag of "works of art" in which environments are manually configured each and every time, or it's a mixture of manual steps and running automated scripts. Each of these approaches results in a bottleneck, because an engineer is required to run through the steps each time. To remedy this, some will diligently describe each and every step in a set of written instructions. The problem there is that instructions might be wrong or miss some steps, or the operator running through the steps might not follow them correctly. The only solution is to fully describe your infrastructure in code that can be executed through a single command.

For example, the Puppet manifest in Listing 2 describes steps for installing a PostgreSQL database server in code. This code can be executed from the command line or through a Continuous Integration (CI) server.

Listing 2. Puppet manifest describing the installation of PostgreSQL
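class postgresql {

  package { "postgresql8-server":
    ensure => installed,
  }

  # Initialize the data directory only if it doesn't exist yet.
  # Puppet requires either fully qualified commands or a search path.
  exec { "initdb":
    path    => ["/bin", "/usr/bin", "/sbin", "/usr/sbin"],
    unless  => "[ -d /var/lib/postgresql/data ]",
    command => "service postgresql initdb",
    require => Package["postgresql8-server"],
  }
...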

The entire manifest downloads, installs, and runs the server. By using additional manifests, you can describe your entire environment in scripts. These scripts can be checked into your version-control repository so that every revision to your infrastructure is tracked, improving change management.
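The unless attribute in Listing 2 is what makes the manifest safe to run again and again. A hypothetical Java sketch of the same guard-then-act pattern follows; the IdempotentInit class and its INITIALIZED marker file are illustrative stand-ins, and the real work (service postgresql initdb) is replaced by creating a directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical illustration of an idempotent infrastructure step. */
final class IdempotentInit {
    /** Returns true if the initialization ran, or false if the guard skipped it. */
    static boolean ensureInitialized(Path dataDir) {
        // Guard: plays the role of unless => "[ -d /var/lib/postgresql/data ]" in Listing 2.
        if (Files.isDirectory(dataDir)) {
            return false;
        }
        try {
            // Stand-in for "service postgresql initdb": create the directory and a
            // hypothetical marker file so the effect of the step can be observed.
            Files.createDirectories(dataDir);
            Files.writeString(dataDir.resolve("INITIALIZED"), "initialized\n");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }
}
```

Calling ensureInitialized twice on the same path returns true the first time and false the second. That property is what lets a CI server rerun infrastructure scripts on every commit without damaging an environment that is already configured.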


Configuration

Configuration defines the information that varies across environments. Examples include directory and file locations, host names, IP addresses, and server ports, as shown in Listing 3. Scripts use this configuration when creating environments, running builds and deployments, and running tests:

Listing 3. Configuration defined in a properties file
jboss.home=/usr/local/jboss
jboss.server.hostname=jenkins.example.com
jboss.server.port=8080
jboss.server.name=default
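
A script or application can read these values instead of hard-coding them. A minimal sketch using the JDK's java.util.Properties follows; the JBossConfig class is hypothetical, and each environment is assumed to have its own versioned copy of the file. Missing keys fail fast rather than silently falling back to a default.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical typed view of the environment-specific values in Listing 3. */
final class JBossConfig {
    final String home;
    final String hostname;
    final int port;
    final String serverName;

    private JBossConfig(Properties p) {
        home = require(p, "jboss.home");
        hostname = require(p, "jboss.server.hostname");
        port = Integer.parseInt(require(p, "jboss.server.port"));
        serverName = require(p, "jboss.server.name");
    }

    /** Parses properties text, for example from the versioned file for one environment. */
    static JBossConfig load(Reader source) {
        Properties p = new Properties();
        try {
            p.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new JBossConfig(p);
    }

    /** Returns the trimmed value for a key, failing if the key is absent. */
    private static String require(Properties p, String key) {
        String value = p.getProperty(key);
        if (value == null) {
            throw new IllegalStateException("Missing configuration key: " + key);
        }
        return value.trim();
    }
}
```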

The code in Listing 4 is a Ruby script that reads key=value lines from a file and stores them as attributes of an item in Amazon SimpleDB, a NoSQL database:

Listing 4. Writing dynamic configuration items to a NoSQL database
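AWS::SimpleDB.consistent_reads do
  # sdb, opts, and file are set up earlier in the script (not shown).
  domain = sdb.domains["stacks"]
  item = domain.items[opts[:itemname].to_s]

  file.each_line do |line|
    line = line.strip                      # drop the trailing newline
    next if line.empty? || line.start_with?("#")
    # Split on the first "=" only, so a value may itself contain "=".
    key, value = line.split("=", 2)
    item.attributes.set(key => value)
  end
end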

Because all of the configuration in Listing 4 (such as IP addresses, domain names, and machine images) can be obtained dynamically, none of it is hard-coded. You might not be able to make all of your configuration dynamic, but when you use the cloud, you can drastically reduce the amount of hard-coded configuration that is the bane of so many software delivery systems.
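Where some values can be discovered and others cannot, one option is to layer the two. The following hypothetical Java sketch (the LayeredConfig class, the key names, and the lookup function are all assumptions) consults a dynamic source first, for example a lookup against the store that Listing 4 populates, and falls back to a small map of static, versioned defaults.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical two-layer configuration lookup: dynamic values win, static defaults fill gaps. */
final class LayeredConfig {
    private final Function<String, Optional<String>> dynamicSource;
    private final Map<String, String> staticDefaults;

    LayeredConfig(Function<String, Optional<String>> dynamicSource, Map<String, String> staticDefaults) {
        this.dynamicSource = dynamicSource;
        this.staticDefaults = Map.copyOf(staticDefaults); // immutable snapshot of the versioned defaults
    }

    /** Returns the dynamic value if present, else the static default, else fails. */
    String get(String key) {
        return dynamicSource.apply(key)
                .or(() -> Optional.ofNullable(staticDefaults.get(key)))
                .orElseThrow(() -> new IllegalStateException("No value for " + key));
    }
}
```

Because the dynamic lookup is passed in as a function, the same class can be exercised in a unit test with an in-memory map and pointed at a real configuration store in production.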


Data

Isn't scripting and versioning a lot of work?

When I recently described the concept of scripting the database, its data, and its changes, someone responded, "That sounds like too much work!" My response was that it's much more work, and much riskier, to maintain a database whose state you never truly know. When a database is a "black box," the only people who can maintain it are a few DBAs on the project who rely on their memory to keep it running, because no data changes are versioned.

The structure of a relational database can be defined in Data Definition Language (DDL) scripts. This includes the creation of the database, tables, procedures, and so on — everything except the data. You define the data in Data Manipulation Language (DML) scripts, including insert, update, and delete statements.

The partial DDL script shown in Listing 5 creates database objects, in this case a sequence and a table:

Listing 5. DDL for creating database tables
CREATE SEQUENCE hibernate_sequence START WITH 1 INCREMENT BY 1 NO MINVALUE
    NO MAXVALUE CACHE 1;
ALTER TABLE public.hibernate_sequence OWNER TO cd_user;

CREATE TABLE note (
    id bigint NOT NULL,
    version bigint NOT NULL,
    cd_id bigint NOT NULL,
    note character varying(10000) NOT NULL,
    note_date_time timestamp without time zone NOT NULL
);
ALTER TABLE public.note OWNER TO cd_user;
...

Listing 6 shows a portion of a Liquibase XML script. Liquibase is an open source tool for database change management; its XML change-log format acts as a domain-specific language (DSL) for describing database changes.

Listing 6. Liquibase script for adding a column to an existing table
<changeSet id="9" author="jayne">
  <addColumn tableName="distributor">
    <column name="phonenumber" type="varchar(255)"/>
  </addColumn> 
</changeSet>
...

You can define your database creation, data, and changes in scripts. These scripts are run as part of your build process, and they are all versioned in your version-control repository.
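Change-management tools make this repeatable by recording which changes have already been applied; Liquibase, for example, keeps that record in a DATABASECHANGELOG table. The Java sketch below is a simplified, in-memory illustration of the idea, not Liquibase's implementation: the ChangeLogRunner class is hypothetical, changes are identified by id alone (Liquibase also uses the author and the change-log file), and the SQL executor is passed in as a function.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

/** Hypothetical, simplified change-log runner: ordered changes, each applied at most once. */
final class ChangeLogRunner {
    private final Map<String, String> changes = new LinkedHashMap<>(); // change id -> SQL, in declaration order
    private final Set<String> applied = new LinkedHashSet<>();         // stand-in for a tracking table

    ChangeLogRunner add(String id, String sql) {
        if (changes.putIfAbsent(id, sql) != null) {
            throw new IllegalArgumentException("Duplicate change id: " + id);
        }
        return this;
    }

    /** Executes pending changes in order and returns the ids that ran on this call. */
    List<String> update(Consumer<String> executeSql) {
        List<String> ran = new ArrayList<>();
        for (Map.Entry<String, String> change : changes.entrySet()) {
            if (applied.contains(change.getKey())) {
                continue; // already applied on an earlier run
            }
            executeSql.accept(change.getValue());
            applied.add(change.getKey());
            ran.add(change.getKey());
        }
        return ran;
    }
}
```

Running update() a second time executes nothing, and adding a new change id later executes only that change. This is what allows the build to apply the database scripts on every run.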


Build and deployment

A build compiles and packages all of the source files into a distribution. For the application code, this distribution is often a binary, such as a WAR file. A build might run infrastructure scripts to create an environment. This environment might be a virtual instance or an image that can define an instance. A build might also produce a database from the database scripts. Builds use the configuration defined in configuration files or databases. Listing 7 shows a portion of a Maven build script that defines directories and configuration for a build:

Listing 7. Partial listing of a build script in Maven
...
<build>
  <finalName>embeddedTomcatSample</finalName>
  <plugins>
      <plugin>
          <groupId>org.codehaus.mojo</groupId>
          <artifactId>appassembler-maven-plugin</artifactId>
          <version>1.1.1</version>
          <configuration>
              <assembleDirectory>target</assembleDirectory>
              <programs>
                  <program>
                      <mainClass>launch.Main</mainClass>
                      <name>webapp</name>
                  </program>
              </programs>
          </configuration>
          <executions>
              <execution>
                  <phase>package</phase>
                  <goals>
                      <goal>assemble</goal>
                  </goals>
              </execution>
          </executions>
      </plugin>
  </plugins>
</build>
...

Several vendors supply so-called "deployment automation tools." The name is a bit of a misnomer: these tools are more likely to orchestrate the deployment than to automate it. They help you describe the steps and order of the deployment, but the actual deployment is still carried out through a series of scripts and/or manual processes. You rarely find tools that support versioning the deployment artifacts and workflow. Although several of these tools provide internal versioning, it is of little use when you're looking for a single revision of your software system: it helps only if you always use that tool, and even then it keeps the versioning of your deployment separate from that of your other source artifacts.

It doesn't need to be this way, even if you are using one of these vendor tools. An alternative approach is to describe your entire deployment in a deployment-automation DSL such as Capistrano, so that your deployment is versioned. You should be able to run the entire deployment with one command, and the automated deployment should be coupled with automated tests. If you use an orchestration tool, it simply executes the deployment script.

Capistrano is a DSL for describing deployments on multiple platforms. With Capistrano, you can define tasks such as stopping servers, copying files, and applying a workflow for deployments across multiple nodes and environment roles. Listing 8 shows a portion of a Capistrano script:

Listing 8. Partial deployment script in Capistrano
namespace :deploy do
  task :setup do
    run "sudo chown -R tomcat:tomcat #{deploy_to}"
    run "sudo service httpd stop"
    run "sudo service tomcat6 stop"
  end

...

Using a DSL is an effective way to describe your deployments. Because all the steps for deployment are scripts and are not tightly coupled to a proprietary tool, the method aligns well with teams that are implementing a continuous-delivery pipeline. After your deployments are defined as scripts, they can be versioned the same as any other component of your delivery pipeline.
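The shape of such a script can be sketched in Java: an ordered list of named steps behind a single entry point that stops at the first failure, with a post-deployment smoke test treated as just another step. The Deployment class and the step names are hypothetical; in practice each step would call out to a versioned script such as the Capistrano tasks in Listing 8.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Hypothetical deployment: ordered named steps, run by one entry point, failing fast. */
final class Deployment {
    /** One named step; the action returns true on success. */
    private static final class Step {
        final String name;
        final BooleanSupplier action;

        Step(String name, BooleanSupplier action) {
            this.name = name;
            this.action = action;
        }
    }

    private final List<Step> steps = new ArrayList<>();

    Deployment step(String name, BooleanSupplier action) {
        steps.add(new Step(name, action));
        return this;
    }

    /** Runs all steps in order and returns their names; throws at the first failing step. */
    List<String> run() {
        List<String> completed = new ArrayList<>();
        for (Step step : steps) {
            if (!step.action.getAsBoolean()) {
                throw new IllegalStateException(
                        "Deployment failed at step '" + step.name + "' after " + completed);
            }
            completed.add(step.name);
        }
        return completed;
    }
}
```

Because later steps never run after a failure, a failed smoke test or a failed copy stops the deployment at a known point, and the error message names the step to investigate.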


Systems

There's a growing consensus among progressive teams that versioning the infrastructure, configuration, data, and application code is desirable. One additional component you can version is your internal systems: for example, the configuration that defines your CI environments. The question to ask is: What happens to your software system if the environments used to create parts of your software-delivery system stop working? After you fully script the environments for your software system, you can use infrastructure-automation tools to fully script the environments that make up your software-delivery system. This infrastructure code, along with changes to your CI configuration (such as CI job configuration), is versioned. An example of versioning Jenkins server and job configurations is shown in Listing 9:

Listing 9. Simple bash script for versioning Jenkins server configuration changes
#!/bin/bash -v

# Change into your Jenkins home; stop if the directory is missing.
cd /usr/share/tomcat6/.jenkins || exit 1

# Ignore things we don't care about. Write this file before running
# git add so that the add below can pick it up. The "!.gitignore"
# line keeps ".*" from ignoring the .gitignore file itself.
cat > .gitignore <<EOF
log
*.log
*.tmp
*.old
*.bak
*.jar
.*
!.gitignore
updates/
jobs/*/builds
jobs/*/last*
jobs/*/next*
jobs/*/*.csv
jobs/*/*.txt
jobs/*/*.log
jobs/*/workspace
EOF

# Add top-level configuration files, job configurations, and plugins.
git add *.xml jobs/*/config.xml plugins/*.hpi .gitignore

# Remove anything from git that no longer exists in Jenkins.
git status --porcelain | grep '^ D ' | awk '{print $2;}' | xargs -r git rm

# And finally, commit and push.
git commit -m 'Automated commit of jenkins configuration' -a
git push

Just as you can with any other part of your software system, you can version this script in your version-control repository.
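The include patterns in Listing 9 decide which files in the Jenkins home count as versioned configuration. As a hypothetical illustration (the JenkinsConfigFilter class is not part of Jenkins), the same patterns can be expressed with the JDK's glob PathMatcher, which makes the selection rules easy to check in isolation. As with the shell globs the script relies on, a * in a JDK glob does not cross directory boundaries.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical filter mirroring the git add patterns in Listing 9 (paths relative to the Jenkins home). */
final class JenkinsConfigFilter {
    private static final List<PathMatcher> INCLUDES = List.of(
            FileSystems.getDefault().getPathMatcher("glob:*.xml"),
            FileSystems.getDefault().getPathMatcher("glob:jobs/*/config.xml"),
            FileSystems.getDefault().getPathMatcher("glob:plugins/*.hpi"));

    /** True if the relative path matches any include pattern. */
    static boolean isVersioned(String relativePath) {
        Path path = Paths.get(relativePath);
        return INCLUDES.stream().anyMatch(m -> m.matches(path));
    }

    /** Keeps only the paths that would be added to version control. */
    static List<String> select(List<String> relativePaths) {
        return relativePaths.stream()
                .filter(JenkinsConfigFilter::isVersioned)
                .collect(Collectors.toList());
    }
}
```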


Version this


In this article, you learned that everything can, and should, be versioned when developers and operations teams collaborate to create a continuous-delivery platform capable of releasing software at any point in time. You saw that when all of the infrastructure, data, and application resources are versioned, you can also version the components of the systems that deliver your software. After every resource is scripted, both for the software systems you develop for users and for the internal systems you use to get that software to users, everything can be versioned.

In the next article, you'll learn about dynamic configuration management: an approach to eliminating the use of static environment-specific properties for configuration.

