Agile DevOps

Infrastructure automation

Treat infrastructure as code with Chef or Puppet



Infrastructure automation is the process of scripting environments — from installing an operating system, to installing and configuring servers on instances, to configuring how the instances and software communicate with one another, and much more. By scripting environments, you can apply the same configuration to a single node or to thousands.

Infrastructure automation also goes by other names: configuration management, IT management, provisioning, scripted infrastructures, system configuration management, and many other overlapping terms. The point is the same: you are describing your infrastructure and its configuration as a script or set of scripts so that environments can be replicated in a much less error-prone manner. Infrastructure automation brings agility to both development and operations because any authorized team member can modify the scripts while applying good development practices — such as automated testing and versioning — to your infrastructure.

In the past decade, several open source and commercial tools have emerged to support infrastructure automation. The open source tools include Bcfg2, CFEngine, Chef, and Puppet. They can be used in the cloud and in virtual and physical environments. In this article, I'll focus on the most popular open source infrastructure automation tools: Chef and Puppet. Although you won't learn the intricacies of either tool, you'll get an understanding of the similarities and differences between them, along with some representative examples. For a more detailed example of setting up and using an infrastructure automation tool, this article provides a companion video that shows how to run Puppet in a cloud environment.

Chef and Puppet both use a domain-specific language (DSL) built on Ruby for scripting environments. Chef is expressed as an internal Ruby DSL, whereas Puppet users primarily work in Puppet's own external DSL, whose implementation is also written in Ruby. These tools tend to be used more often in Linux® system automation, but they support Windows as well. Puppet has a larger user base than Chef, and it offers more support for older operating systems. With Puppet, you can declare explicit dependencies between resources. Both tools are idempotent, meaning that the same configuration produces the same result no matter how many times you run it.
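
Idempotence is easiest to see in a toy model. The plain-Ruby sketch below is not Chef or Puppet code: it assumes a hypothetical `ensure_package` helper and uses an in-memory set of package names in place of a real machine. The helper changes state only when the current state differs from the desired state, so a second run over the same configuration reports no changes.

```ruby
require "set"

# Hypothetical helper (not the Chef or Puppet API): make sure `name` is
# present in the set of installed packages, acting only if it is missing.
def ensure_package(installed, name)
  return :unchanged if installed.include?(name)

  installed << name
  :installed
end

# Apply a whole desired configuration and report what each step did.
def converge(installed, desired)
  desired.map { |pkg| [pkg, ensure_package(installed, pkg)] }.to_h
end

state   = Set.new(["make"])          # stand-in for the machine's current state
desired = %w[make gcc ruby-devel]    # the configuration we want to apply

first_run  = converge(state, desired)
second_run = converge(state, desired)

p first_run   # gcc and ruby-devel are installed; make was already present
p second_run  # every entry is :unchanged
```

Running the configuration a third or fourth time would also report only `:unchanged`, which is the property that lets these tools apply the same scripts safely to one node or to thousands.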

Chef

Chef has been around since 2009. It was influenced by Puppet and CFEngine. Chef supports multiple platforms, including Ubuntu, Debian, RHEL/CentOS, Fedora, Mac OS X, Windows 7, and Windows Server. It is often described as easier to use, particularly for Ruby developers, because everything in Chef is defined as a Ruby script and follows a model that developers are used to working in. Chef has a passionate user base, and its rapidly growing community develops cookbooks that others can reuse.

How it works

In Chef, three core components interact with one another: the Chef server, nodes, and the Chef workstation. Chef runs cookbooks, which consist of recipes. Recipes declare resources, and each resource performs automated steps, called actions, on a node, such as installing and configuring software or adding files. Examples of resources include file, package, cron, and execute. The Chef server contains configuration data for managing multiple nodes; nodes pull down the configuration files and resources stored on the server when they request them.

Users interact with the Chef server using Chef's command-line interface, called Knife. Nodes can have one or more roles. A role defines attributes (node-specific settings) and recipes, and assigning the same role to several nodes applies those settings to all of them. Recipes can run other recipes. The ordered list of roles and recipes assigned to a node is called its run list, and its recipes are executed in the order they are listed. A Chef workstation is an instance with a local Chef repository and Knife installed on it.
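
A rough plain-Ruby sketch of run list expansion follows. The `ROLES` table and every role and recipe name in it are hypothetical, and the `expand` method is an illustration rather than Chef's implementation. It shows three ideas from the paragraph above: a role expands in place into its own run list (which may contain other roles), the result keeps the listed order, and a recipe that appears more than once is kept only at its first position.

```ruby
# Hypothetical roles: each role name maps to its own run list.
ROLES = {
  "base"       => ["recipe[ntp]", "recipe[users]"],
  "web_server" => ["role[base]", "recipe[java]", "recipe[tomcat]"]
}.freeze

# Expand a run list depth-first, keeping the listed order and skipping
# any recipe that has already been added.
def expand(run_list, recipes = [])
  run_list.each do |item|
    case item
    when /\Arole\[(.+)\]\z/
      expand(ROLES.fetch(Regexp.last_match(1)), recipes)
    when /\Arecipe\[(.+)\]\z/
      name = Regexp.last_match(1)
      recipes << name unless recipes.include?(name)
    else
      raise ArgumentError, "unrecognized run list item: #{item}"
    end
  end
  recipes
end

node_run_list = ["role[web_server]", "recipe[ntp]", "recipe[monitoring]"]
puts expand(node_run_list).join(", ")
# prints: ntp, users, java, tomcat, monitoring
```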

Table 1 describes the core components of Chef:

Table 1. Chef components
Component | Description
Attributes | Describe node data, such as the IP address and hostname.
Chef client | Runs on a node and does the work on behalf of that node: it retrieves the node's run list and cookbooks from the Chef server and applies them.
Chef Solo | Allows you to run Chef cookbooks in the absence of a Chef server.
Cookbooks | Contain all the resources you need to automate your infrastructure and can be shared with other Chef users. Cookbooks typically consist of multiple recipes.
Data bags | Contain globally available data used by nodes and roles.
Knife | Used by system administrators to upload configuration changes to the Chef server. Knife can also run commands on nodes over SSH (knife ssh).
Management console | Chef server's web interface for managing nodes, roles, cookbooks, data bags, and API clients.
Node | A host that runs the Chef client. The primary features of a node, from Chef's point of view, are its attributes and its run list. Nodes are the component to which recipes and roles are applied.
Ohai | Detects data about your operating system. It can be used stand-alone, but its primary purpose is to provide node data to Chef.
Recipe | The fundamental configuration unit in Chef. Recipes encapsulate collections of resources that are executed in the order defined to configure the nodes.
Repository (Chef repository) | The place where cookbooks, roles, configuration files, and other artifacts for managing systems with Chef are hosted.
Resource | A cross-platform abstraction of something you're configuring on a node. For example, users and packages can be configured differently on different OS platforms; Chef hides that complexity from the user.
Role | A mechanism for grouping similar features of similar nodes.
Server (Chef server) | Centralized repository of your servers' configuration.


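Listing 1 demonstrates the use of the service resource within a recipe that's part of a Tomcat cookbook. The case statement on node["platform"] sets different supports options for Red Hat-family and Debian-family platforms, which shows how a tool like Chef handles platform-specific configuration while managing server configuration.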

Listing 1. Chef recipes
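service "tomcat" do
  service_name "tomcat6"
  # Declare which service commands are available on each platform family
  case node["platform"]
  when "centos", "redhat", "fedora"
    supports :restart => true, :status => true
  when "debian", "ubuntu"
    supports :restart => true, :reload => true, :status => true
  end
  # Enable the service at boot and start it now
  action [:enable, :start]
end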
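
Because a recipe is Ruby, its branching logic can be pulled into a plain method and unit-tested without a Chef run, which is one way to apply automated testing to infrastructure code. This sketch assumes a hypothetical `tomcat_supports` helper that mirrors the `case` in Listing 1; it is not part of the Tomcat cookbook. It also makes one consequence visible: on a platform the `case` does not list, no `supports` options are set at all.

```ruby
# Hypothetical helper mirroring the case statement in Listing 1.
# Returns the options that would be passed to `supports`, or nil when the
# platform is not listed (Listing 1 sets nothing in that case).
def tomcat_supports(platform)
  case platform
  when "centos", "redhat", "fedora"
    { restart: true, status: true }
  when "debian", "ubuntu"
    { restart: true, reload: true, status: true }
  end
end

%w[centos ubuntu freebsd].each do |platform|
  puts "#{platform}: #{tomcat_supports(platform).inspect}"
end
```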

Listing 2 defines the attributes for the Tomcat cookbook. In this example, I'm defining default values for the ports on which the Tomcat server listens: the HTTP port, the SSL port, and the AJP port. Other types of attributes you might see include values for directories, options, users, and other configurations.

Listing 2. Chef attributes
default["tomcat"]["port"] = 8080
default["tomcat"]["ssl_port"] = 8443
default["tomcat"]["ajp_port"] = 8009
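
Listing 2 sets these values at the `default` level, the lowest attribute precedence in Chef, so roles, environments, or the node itself can override them. The plain-Ruby sketch below models that idea as a deep merge in which each higher-precedence level wins. It is a simplification: Chef defines more precedence levels than the three shown (default, normal, override), and the overriding values here are hypothetical.

```ruby
# Recursively merge two nested hashes; values from `higher` win.
def deep_merge(lower, higher)
  lower.merge(higher) do |_key, low, high|
    low.is_a?(Hash) && high.is_a?(Hash) ? deep_merge(low, high) : high
  end
end

# Attribute levels, listed from lowest to highest precedence.
levels = {
  default:  { "tomcat" => { "port" => 8080, "ssl_port" => 8443, "ajp_port" => 8009 } },
  normal:   { "tomcat" => { "port" => 8081 } },      # hypothetical node-level value
  override: { "tomcat" => { "ssl_port" => 9443 } }   # hypothetical role override
}

# Fold the levels together in order, so later levels override earlier ones.
merged = levels.values.reduce({}) { |acc, attrs| deep_merge(acc, attrs) }
p merged["tomcat"]  # port 8081, ssl_port 9443, ajp_port still 8009
```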

Chef extends the Ruby language itself, rather than defining an external DSL, to provide a model for applying configuration to many nodes at once. Chef uses an imperative model without explicit dependency management: resources are applied in the order in which they appear in a recipe. As a result, people with more of a development background tend to gravitate toward Chef when they script environments.
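
What "internal DSL" means in practice: a recipe is ordinary Ruby, and `package "x" do ... end` is just a method call that takes a block. The toy sketch below uses a hypothetical `ToyRecipe` class, not Chef's implementation. It records resources as those methods are called and then lists them in exactly the order they were written, which is the imperative ordering described above. The `:nothing` default action is a placeholder chosen for the sketch.

```ruby
# Toy internal DSL (hypothetical, not Chef's implementation): resource
# methods record what to do, in the order they are called.
class ToyRecipe
  Resource = Struct.new(:type, :name, :action)

  attr_reader :resources

  def initialize(&body)
    @resources = []
    instance_eval(&body) # run the recipe body with self set to this recipe
  end

  def package(name, &block)
    declare(:package, name, &block)
  end

  def service(name, &block)
    declare(:service, name, &block)
  end

  # Called inside a resource block to choose that resource's action.
  def action(value)
    @current.action = value
  end

  # Describe each resource strictly in declaration order; a real tool
  # would apply them to the node in this same order.
  def run
    @resources.map { |r| "#{r.action} #{r.type}[#{r.name}]" }
  end

  private

  def declare(type, name, &block)
    @current = Resource.new(type, name, :nothing) # placeholder default action
    instance_eval(&block) if block
    @resources << @current
  end
end

recipe = ToyRecipe.new do
  package "tomcat6" do
    action :install
  end

  service "tomcat6" do
    action :start
  end
end

puts recipe.run
# prints:
# install package[tomcat6]
# start service[tomcat6]
```

Swapping the two resource declarations would swap the output order, because nothing but source position decides the sequence.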

Puppet

Puppet has been in use since 2005. Many organizations, including Google, Twitter, Oracle, and Rackspace, use it to manage their infrastructure. Puppet, which tends to have a steeper learning curve than Chef, supports a variety of Windows and *nix environments. Puppet has a large and active user community. It has been used in thousands of organizations, with installations running tens of thousands of instances.

How it works

Puppet uses the concept of a master server, called the Puppet master, which centralizes the configuration among nodes and groups them together based on type. For example, if you had a set of web servers that were all running Tomcat with a Jenkins WAR, you'd group them together on the Puppet master. The Puppet agent runs as a daemon on each managed node and periodically retrieves its configuration from the master, which enables you to deploy infrastructure changes to multiple nodes simultaneously. In this respect, Puppet functions like a deployment manager, but instead of deploying applications, it deploys infrastructure changes.

Puppet includes a tool called Facter, which collects metadata (facts) about each system; you can use these facts to filter among servers. For example, you can use Facter to determine a node's hostname. MCollective is a deployment tool that integrates with Puppet; you can use it to deploy infrastructure changes across nodes.
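
Facts are key/value data, so filtering among servers amounts to matching on those values. The plain-Ruby sketch below uses a hypothetical in-memory inventory and a hypothetical `filter_nodes` helper; the fact names are modeled on Facter's legacy names (`hostname`, `osfamily`, `processorcount`), but this is an illustration of the idea, not the Facter or MCollective API.

```ruby
# Hypothetical inventory: the facts each node reported.
NODES = [
  { "hostname" => "web01", "osfamily" => "RedHat", "processorcount" => 4 },
  { "hostname" => "web02", "osfamily" => "Debian", "processorcount" => 2 },
  { "hostname" => "db01",  "osfamily" => "RedHat", "processorcount" => 8 }
].freeze

# Select nodes whose facts satisfy every criterion. A criterion value can be
# anything that responds to === (a String, a Regexp, a Range, ...).
def filter_nodes(nodes, criteria)
  nodes.select do |facts|
    criteria.all? { |fact, expected| expected === facts[fact] }
  end
end

matches = filter_nodes(NODES, { "osfamily" => "RedHat", "hostname" => /\Aweb/ })
p matches.map { |facts| facts["hostname"] }
# prints: ["web01"]
```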

Table 2 lists the key components of Puppet:

Table 2. Key Puppet components
Component | Description
Agent | A daemon process running on a node that collects information about the node, sends it to the Puppet master, and applies the catalog it receives.
Catalog | A compiled description, built by the Puppet master from the manifests and the node's facts, that specifies how to configure the node.
Facts | Data about a node, sent by the node to the Puppet master.
Manifest | Describes resources and the dependencies among them.
Module | Groups related manifests (in a directory). For example, a module might define how a database like MySQL gets installed, configured, and run.
Node | A host that is managed by the Puppet master. Nodes are defined like classes but contain the host name or fully qualified domain name.
Puppet master | The server that manages all the Puppet nodes.
Resource | A unit of configuration, such as a package, file, or service.
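
The agent, facts, and catalog fit together in a pull loop: the agent sends its facts, the master classifies the node and compiles a catalog, and the agent applies that catalog. The sketch below models the flow in plain Ruby. `ToyMaster`, `agent_run`, the node-name patterns, and the class-to-resource tables are all hypothetical, and real catalog compilation is far richer.

```ruby
# Hypothetical classification: node-name pattern => classes to apply.
CLASSIFICATION = {
  /\Aweb\d+/ => %w[system httpd],
  /\Adb\d+/  => %w[system mysql]
}.freeze

# Hypothetical class bodies: class name => resources it declares.
CLASSES = {
  "system" => ["Package[make]", "Package[gcc]"],
  "httpd"  => ["Package[httpd-devel]", "Service[httpd]"],
  "mysql"  => ["Package[mysql-server]", "Service[mysqld]"]
}.freeze

class ToyMaster
  # Compile a catalog: classify the node, then collect the resources
  # declared by each of its classes. This sketch classifies by the
  # hostname fact only; a real master can use other facts too.
  def compile(facts)
    matching = CLASSIFICATION.select { |pattern, _classes| pattern.match?(facts["hostname"]) }
    classes = matching.values.flatten.uniq
    { "node" => facts["hostname"], "resources" => classes.flat_map { |c| CLASSES.fetch(c) } }
  end
end

# One agent run: send facts, receive a catalog, "apply" each resource.
def agent_run(master, facts)
  catalog = master.compile(facts)
  catalog["resources"].map { |resource| "applied #{resource}" }
end

puts agent_run(ToyMaster.new, { "hostname" => "web01", "osfamily" => "RedHat" })
# prints:
# applied Package[make]
# applied Package[gcc]
# applied Package[httpd-devel]
# applied Service[httpd]
```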


In the example in Listing 3, a Puppet manifest describes the packages to install on a node. The manifest states only the desired result; Puppet determines how to install each package on the node's platform and the order in which to apply the resources.

Listing 3. Puppet manifest for package installation
class system {
  # Ruby packaging support
  package { "rubygems": ensure => "installed" }

  # Compilers, build tools, and development headers (typically needed to build native Ruby gems)
  package { "make": ensure => "installed" }
  package { "gcc": ensure => "installed" }
  package { "gcc-c++": ensure => "installed" }
  package { "ruby-devel": ensure => "installed" }
  package { "libcurl-devel": ensure => "installed" }
  package { "zlib-devel": ensure => "installed" }
  package { "openssl-devel": ensure => "installed" }
  package { "libxml2-devel": ensure => "installed" }
  package { "libxslt-devel": ensure => "installed" }
}
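
Listing 3 names packages but never says how to install them; that choice is left to Puppet, which selects a package provider suited to the platform. A minimal sketch of the idea, assuming a hypothetical `install_commands` helper and only two platform families: it chooses `yum` or `apt-get` from the `osfamily` fact and emits commands only for packages that are not already installed. Puppet's real provider selection considers many more providers and conditions.

```ruby
require "set"

# Hypothetical mapping from the osfamily fact to an install command prefix.
PROVIDERS = {
  "RedHat" => "yum install -y",
  "Debian" => "apt-get install -y"
}.freeze

# Return the shell commands needed to bring `wanted` packages to "installed",
# skipping packages already present (so a repeated run yields no commands).
def install_commands(facts, wanted, installed)
  prefix = PROVIDERS.fetch(facts["osfamily"]) do |family|
    raise ArgumentError, "no provider for #{family}"
  end
  wanted.reject { |pkg| installed.include?(pkg) }.map { |pkg| "#{prefix} #{pkg}" }
end

wanted = %w[make gcc gcc-c++ zlib-devel]
puts install_commands({ "osfamily" => "RedHat" }, wanted, Set["make"])
# prints:
# yum install -y gcc
# yum install -y gcc-c++
# yum install -y zlib-devel
```

Note that names such as `zlib-devel` are Red Hat style; Debian-family systems name the equivalent package differently (for example, `zlib1g-dev`), so a manifest meant for both families would also have to vary the package names by platform.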

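The Puppet manifest snippet in Listing 4 shows two different resource types, package and service, that can be used in scripting an infrastructure. The subscribe attribute makes the httpd service depend on the httpd-devel package, so Puppet manages the package first and restarts the service whenever the package changes: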

Listing 4. Puppet manifest for httpd
class httpd {
  package { 'httpd-devel':
    ensure => installed,
  }

  # Keep httpd running and enabled at boot; restart it when the package changes
  service { 'httpd':
    ensure    => running,
    enable    => true,
    subscribe => Package['httpd-devel'],
  }
}

Puppet employs a declarative model with explicit dependency management. Because of this, it tends to be one of the first tools considered by engineers who have more of a systems administration background and are looking to script their environments.
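
To see what explicit dependency management buys, here is a sketch that orders resources by their declared dependencies, using Ruby's standard `tsort` library, rather than by their position in the file. The resource names echo Listing 4 plus a hypothetical `File[httpd.conf]`, and `DependencyGraph` is a hypothetical helper; this illustrates the principle, not Puppet's actual graph code.

```ruby
require "tsort"

# A resource graph: each resource maps to the resources it depends on
# (the edges that require/subscribe would declare).
class DependencyGraph
  include TSort

  def initialize(deps)
    @deps = deps
  end

  def tsort_each_node(&block)
    @deps.each_key(&block)
  end

  def tsort_each_child(node, &block)
    @deps.fetch(node, []).each(&block)
  end
end

# Declared out of order on purpose: the service is listed first.
deps = {
  "Service[httpd]"       => ["Package[httpd-devel]", "File[httpd.conf]"],
  "File[httpd.conf]"     => ["Package[httpd-devel]"],
  "Package[httpd-devel]" => []
}

# Dependencies come out before the resources that need them.
puts DependencyGraph.new(deps).tsort
# prints:
# Package[httpd-devel]
# File[httpd.conf]
# Service[httpd]
```

A cycle (for example, two resources that each require the other) makes `tsort` raise `TSort::Cyclic`, which parallels Puppet refusing to apply a catalog that contains a dependency cycle.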

Infrastructure as code

In this article, you learned through examples that your infrastructure no longer needs to be a manual effort uniquely applied to individual nodes. By automating your infrastructure, you can scale it up and down with little additional effort. Because your infrastructure is modeled in scripts, you can version and test them just like application code.

In the next article, you'll learn patterns and techniques for creating ephemeral (or transient) environments: environments that are created and destroyed within 24 hours and that embrace the abundance mindset (that is, the absence of scarcity) that comes with Agile DevOps.
