Infrastructure automation is the process of scripting environments — from installing an operating system, to installing and configuring servers on instances, to configuring how the instances and software communicate with one another, and much more. By scripting environments, you can apply the same configuration to a single node or to thousands.
Infrastructure automation also goes by other names: configuration management, IT management, provisioning, scripted infrastructures, system configuration management, and many other overlapping terms. The point is the same: you are describing your infrastructure and its configuration as a script or set of scripts so that environments can be replicated in a much less error-prone manner. Infrastructure automation brings agility to both development and operations because any authorized team member can modify the scripts while applying good development practices — such as automated testing and versioning — to your infrastructure.
In the past decade, several open source and commercial tools have emerged to support infrastructure automation. The open source tools include Bcfg2, CFEngine, Chef, and Puppet. They can be used in the cloud and in virtual and physical environments. In this article, I'll focus on the most popular open source infrastructure automation tools: Chef and Puppet. Although you won't learn the intricacies of either tool, you'll get an understanding of the similarities and differences between them, along with some representative examples. For a more detailed example of setting up and using an infrastructure automation tool, this article provides a companion video that shows how to run Puppet in a cloud environment.
Chef and Puppet both use a Ruby-based domain-specific language (DSL) for scripting environments. Chef is expressed as an internal Ruby DSL, whereas Puppet users primarily work in its external DSL, which is itself implemented in Ruby. Both tools are used most often for Linux® system automation, but they support Windows as well. Puppet has a larger user base than Chef, and it offers more support for older operating systems. With Puppet, you can explicitly declare that one task depends on another. Both tools are idempotent, meaning that applying the same configuration produces the same result no matter how many times you run it.
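To make idempotence concrete, here is a minimal plain-Ruby sketch. It is not Chef or Puppet code; `ensure_line` and the sample settings are hypothetical names chosen only for illustration. The helper describes a desired end state (the line is present) rather than an action (append the line), so applying it a second time changes nothing.

```ruby
# Toy illustration of idempotence; not part of Chef or Puppet.
# ensure_line is a hypothetical helper: it adds the wanted line only
# when it is missing, and returns the input unchanged otherwise.
def ensure_line(lines, wanted)
  lines.include?(wanted) ? lines : lines + [wanted]
end

config = ["port=8080"]
once   = ensure_line(config, "ssl_port=8443")
twice  = ensure_line(once, "ssl_port=8443")

puts once.inspect
puts twice == once # the second run found nothing to change
```

A non-idempotent version would append unconditionally, and the file would grow on every run.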
Chef has been around since 2009. It was influenced by Puppet and CFEngine. Chef supports multiple platforms including Ubuntu, Debian, RHEL/CentOS, Fedora, Mac OS X, Windows 7, and Windows Server. It is often described as easier to use — particularly for Ruby developers, because everything in Chef is defined as a Ruby script and follows a model that developers are used to working in. Chef has a passionate user base, and the Chef community is rapidly growing while developing cookbooks for others to use.
In Chef, three core components interact with one another — Chef server, nodes, and Chef workstation. Chef runs cookbooks, which consist of recipes that perform automated steps — called actions — on nodes, such as installing and configuring software or adding files. The Chef server contains configuration data for managing multiple nodes. The configuration files and resources stored on the Chef server are pulled down by nodes when requested. Examples of resources include file, package, cron, and execute.
Users interact with the Chef server using Chef's command-line interface, called Knife. Nodes can have one or more roles. A role defines attributes (node-specific settings) and recipes, and it can apply them across multiple nodes. Recipes can run other recipes. The recipes assigned to a node form its run list, and they are executed in the order in which they are listed. A Chef workstation is an instance with a local Chef repository and Knife installed on it.
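The relationship between roles, recipes, and the run list can be sketched in plain Ruby. This is a toy model under stated assumptions, not Chef's internal API: the `ROLES` table, the `expand_run_list` function, and the `java` and `ntp` recipe names are hypothetical. Only the `recipe[...]` and `role[...]` item syntax is borrowed from Chef run lists.

```ruby
# Toy model of run-list expansion; the data layout and names are
# hypothetical and do not reflect Chef's internal API.
ROLES = {
  "webserver" => ["recipe[java]", "recipe[tomcat]"]
}.freeze

# Expand a run list into a flat, ordered list of recipe names.
# A role is replaced, in place, by the expansion of its own run list.
# A recipe that appears more than once keeps only its first position.
def expand_run_list(run_list, roles = ROLES)
  run_list.flat_map do |item|
    case item
    when /\Arole\[(.+)\]\z/   then expand_run_list(roles.fetch($1), roles)
    when /\Arecipe\[(.+)\]\z/ then [$1]
    else raise ArgumentError, "unrecognized run-list item: #{item}"
    end
  end.uniq
end

puts expand_run_list(["recipe[ntp]", "role[webserver]", "recipe[tomcat]"]).inspect
```

The point of the sketch is the ordering: recipes run in the order the expanded list gives, so anything a recipe relies on has to appear earlier in the list.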
Table 1 describes the core components of Chef:
Table 1. Chef components
| Component | Description |
|---|---|
| Attributes | Describe node data, such as the IP address and hostname. |
| Chef client | Does work on behalf of a node. A single Chef client can run recipes for multiple nodes. |
| Chef Solo | Allows you to run Chef cookbooks in the absence of a Chef server. |
| Cookbooks | Contain all the resources you need to automate your infrastructure and can be shared with other Chef users. Cookbooks typically consist of multiple recipes. |
| Data bags | Contain globally available data used by nodes and roles. |
| Knife | Used by system administrators to upload configuration changes to the Chef Server. Knife is used for communication between nodes via SSH. |
| Management console | Chef server's web interface for managing nodes, roles, cookbooks, data bags, and API clients. |
| Node | A host that runs the Chef client. The primary features of a node, from Chef's point of view, are its attributes and its run list. Recipes and roles are applied to nodes. |
| Ohai | Detects data about your operating system. It can be used stand-alone, but its primary purpose is to provide node data to Chef. |
| Recipe | The fundamental configuration in Chef. Recipes encapsulate collections of resources that are executed in the order defined to configure the nodes. |
| Repository (Chef repository) | The place where cookbooks, roles, configuration files, and other artifacts for managing systems with Chef are hosted. |
| Resource | A cross-platform abstraction of something you're configuring on a node. For example, users and packages can be configured differently on different OS platforms; Chef abstracts the complexity in doing this away from the user. |
| Role | A mechanism for grouping similar features of similar nodes. |
| Server (Chef server) | Centralized repository of your server's configuration. |
Listing 1 demonstrates the use of the service resource within a recipe that's part of a Tomcat cookbook. It shows how a tool like Chef can handle platform-specific settings while managing server configuration.
Listing 1. Chef recipes

    service "tomcat" do
      service_name "tomcat6"
      # Declare which init-script actions each platform family supports.
      case node["platform"]
      when "centos","redhat","fedora"
        supports :restart => true, :status => true
      when "debian","ubuntu"
        supports :restart => true, :reload => true, :status => true
      end
      # Enable the service at boot and start it now.
      action [:enable, :start]
    end
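The platform branching in Listing 1 hints at what a resource does on your behalf. The following plain-Ruby sketch is a toy model of a cross-platform package resource, not Chef's actual provider code. The `INSTALL_COMMANDS` table and `install_command` function are hypothetical, and the platform names are the ones used in Listing 1.

```ruby
# Toy model of a cross-platform "package" resource. The mapping and the
# function name are hypothetical; Chef's real providers are more involved.
INSTALL_COMMANDS = {
  "debian" => "apt-get install -y %s",
  "ubuntu" => "apt-get install -y %s",
  "centos" => "yum install -y %s",
  "redhat" => "yum install -y %s",
  "fedora" => "yum install -y %s"
}.freeze

# Return the shell command that would install the package on the platform.
def install_command(platform, package)
  template = INSTALL_COMMANDS.fetch(platform) do
    raise ArgumentError, "unsupported platform: #{platform}"
  end
  format(template, package)
end

puts install_command("ubuntu", "tomcat6")
puts install_command("centos", "tomcat6")
```

The caller names only the package. The per-platform detail lives in one place, which is the idea behind the resource abstraction.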
Listing 2 defines the attributes for the Tomcat cookbook. In this example, I'm defining some external ports for the Tomcat server to make available. Other types of attributes you might see include values for directories, options, users, and other configurations.
Listing 2. Chef attributes

    default["tomcat"]["port"] = 8080
    default["tomcat"]["ssl_port"] = 8443
    default["tomcat"]["ajp_port"] = 8009
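Because roles can also define attributes, a cookbook default such as the ports in Listing 2 is a starting value that a node may not end up using. Here is a simplified two-level sketch in plain Ruby, assuming that a role's value wins over a cookbook default. Chef's real precedence rules have more levels, and `resolve_attributes` and the `9090` override are hypothetical.

```ruby
# Simplified two-level model of attribute resolution. Chef's real
# precedence rules have more levels; this only shows the basic idea.
COOKBOOK_DEFAULTS = {
  "tomcat" => { "port" => 8080, "ssl_port" => 8443, "ajp_port" => 8009 }
}.freeze

# Merge overrides on top of defaults, one nested level deep.
# Assumes that values present on both sides are themselves hashes.
def resolve_attributes(defaults, overrides)
  defaults.merge(overrides) do |_key, default_value, override_value|
    default_value.merge(override_value)
  end
end

# Hypothetical attributes set by a role.
role_overrides = { "tomcat" => { "port" => 9090 } }
node = resolve_attributes(COOKBOOK_DEFAULTS, role_overrides)

puts node["tomcat"]["port"]     # value supplied by the role
puts node["tomcat"]["ssl_port"] # cookbook default still applies
```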
Chef extends the Ruby language, rather than defining an external DSL, to provide a model for applying configuration to many nodes at once. Chef uses an imperative model without explicit dependency management, so people with more of a development background tend to gravitate toward Chef when they are scripting environments.
Puppet has been in use since 2005. Many organizations, including Google, Twitter, Oracle, and Rackspace, use it to manage their infrastructure. Puppet tends to have a steeper learning curve than Chef, and it supports a variety of Windows and *nix environments. Puppet has a large and active user community. It has been used in thousands of organizations, with installations running tens of thousands of instances.
Puppet uses the concept of a master server, called the Puppet master, which centralizes the configuration among nodes and groups them together based on type. For example, if you had a set of web servers that were all running Tomcat with a Jenkins WAR, you'd group them together on the Puppet master. The Puppet agent runs as a daemon on each managed system, which enables you to deploy infrastructure changes to multiple nodes simultaneously. Puppet functions the same way as a deployment manager, but instead of deploying applications, it deploys infrastructure changes.
Puppet includes a tool called Facter. Facter collects metadata about the system, and that metadata can be used to filter among servers. For example, you can use Facter to determine a node's hostname. MCollective is a deployment tool that integrates with Puppet. You can use MCollective to deploy infrastructure changes across nodes.
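Filtering servers by their metadata can be pictured with a small plain-Ruby sketch. This is a toy model, not Facter's or MCollective's API: the `NODE_FACTS` table, its host names and values, and the `nodes_matching` function are made up for illustration.

```ruby
# Toy model of filtering nodes by fact values. The host names and values
# are made up for illustration; Facter gathers real facts on each system.
NODE_FACTS = {
  "web01" => { "hostname" => "web01", "operatingsystem" => "CentOS" },
  "web02" => { "hostname" => "web02", "operatingsystem" => "Ubuntu" },
  "db01"  => { "hostname" => "db01",  "operatingsystem" => "CentOS" }
}.freeze

# Return the names of nodes whose facts match every key/value in criteria.
def nodes_matching(node_facts, criteria)
  node_facts.select do |_name, facts|
    criteria.all? { |fact, value| facts[fact] == value }
  end.keys
end

puts nodes_matching(NODE_FACTS, { "operatingsystem" => "CentOS" }).inspect
```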
Table 2 lists the key components of Puppet:
Table 2. Key Puppet components
| Component | Description |
|---|---|
| Agent | A daemon process running on a node that collects information about the node and sends it to the Puppet master. |
| Catalog | The compiled description, built from the manifests and the node's facts, of how to configure the node. |
| Facts | Data about a node, sent by the node to the Puppet master. |
| Manifest | Describes resources and the dependencies among them. |
| Module | Groups related manifests (in a directory). For example, a module might define how a database like MySQL gets installed, configured, and run. |
| Node | A host that is managed by the Puppet master. Nodes are defined like classes but contain the host name or fully qualified domain name. |
| Puppet master | The server that manages all the Puppet nodes. |
| Resource | A unit of configuration that Puppet manages on a node, for example, a package, file, or service. |
In the example in Listing 3, a Puppet manifest describes the packages to install on a node. Puppet determines the best approach and order of execution for installing these packages.
Listing 3. Puppet manifest for package installation

    class system {
      package { "rubygems": ensure => "installed" }
      package { "make": ensure => "installed" }
      package { "gcc": ensure => "installed" }
      package { "gcc-c++": ensure => "installed" }
      package { "ruby-devel": ensure => "installed" }
      package { "libcurl-devel": ensure => "installed" }
      package { "zlib-devel": ensure => "installed" }
      package { "openssl-devel": ensure => "installed" }
      package { "libxml2-devel": ensure => "installed" }
      package { "libxslt-devel": ensure => "installed" }
    }
The Puppet manifest snippet in Listing 4 shows examples of different resource types — package and service — that can be used in scripting an infrastructure:
Listing 4. Puppet manifest for httpd

    class httpd {
      package { 'httpd-devel':
        ensure => installed,
      }
      service { 'httpd':
        ensure    => running,
        enable    => true,
        subscribe => Package['httpd-devel'],
      }
    }
Puppet employs a declarative model with explicit dependency management. Because of this, it tends to be one of the first tools considered by engineers who have more of a systems administration background and are looking to script their environments.
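One way to picture explicit dependency management is as a graph that is sorted before anything is applied. The sketch below uses the `TSort` module from Ruby's standard library. It is a toy model, not Puppet's own implementation, and `DEPENDENCIES` and `apply_order` are hypothetical names. The single edge mirrors Listing 4, where the httpd service subscribes to the httpd-devel package.

```ruby
require "tsort"

# Toy model of declarative ordering; Puppet's own graph code is not shown.
# Each resource maps to the list of resources it depends on.
DEPENDENCIES = {
  "Service[httpd]"       => ["Package[httpd-devel]"],
  "Package[httpd-devel]" => []
}.freeze

# Return the resources in an order that applies dependencies first.
def apply_order(dependencies)
  each_node  = ->(&block) { dependencies.each_key(&block) }
  each_child = ->(node, &block) { dependencies.fetch(node).each(&block) }
  TSort.tsort(each_node, each_child)
end

puts apply_order(DEPENDENCIES).inspect
```

The service is listed first in the hash, yet the package comes first in the result. In a declarative model, the order in which you write resources does not have to match the order in which they are applied.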
In this article, you learned through examples that managing your infrastructure no longer needs to be a manual effort applied uniquely to each node. By automating your infrastructure, you can scale it up and down with little additional effort. Because your infrastructure is modeled in scripts, you can version and test those scripts just as you do application code.
In the next article, you'll learn patterns and techniques for creating ephemeral (or transient) environments — environments that are created and destroyed in 24 hours and embrace the abundance mindset (that is, lack of scarcity) that comes with Agile DevOps.

Paul Duvall is the CTO of Stelligent. A featured speaker at many leading software conferences, he has worked in virtually every role on software projects: developer, project manager, architect, and tester. He is the principal author of Continuous Integration: Improving Software Quality and Reducing Risk (Addison-Wesley, 2007) and a 2008 Jolt Award Winner. He is also the author of Startup@Cloud and DevOps in the Cloud LiveLessons (Pearson Education, June 2012). He's contributed to several other books as well. Paul authored the 20-article Automation for the people series on developerWorks. He is passionate about getting high-quality software to users quicker and more often through continuous delivery and the cloud. Read his blog at Stelligent.com.




