You migrated the SmallPayroll.ca application to the Amazon cloud in Part 1 of this series and made it more robust in Part 2. The application can even add and remove servers on its own, depending on load, as you saw in Part 3. As a result, you cannot predict the number or the IP addresses of the active servers at any given time, which makes connecting to them a challenge. In this respect, the cloud environment is different from a traditional data center.
The dynamic nature of the cloud also makes application deployment difficult. Your list of servers will be different between deployments, so how do you update the application? For that matter, how do you monitor your servers for faults?
This isn't your normal data center
In a "normal" data center, you can name your computers whatever you want, give them IP addresses that suit you, and—if you want—go and look at your servers to make sure they're still there. Maybe you keep a spreadsheet to track your servers, maybe you have software, and maybe you just keep information in your head or in a text file. Do you have configuration management to make sure your configuration is consistent?
The cloud environment is much different from a traditional data center, because you're ceding control of many functions. You can't predict IP addresses or even ensure that two servers will be on the same subnet. If you progress to automatic scaling of resources, all the hard work you did in manual configurations might be lost when a new node is launched. Scripts that rely on knowing you have 20 Web servers with predictable names won't work in the cloud.
Fortunately, a bit of discipline can work around these problems and even improve your uptime in the physical data center!
People tend to spend a great deal of time worrying about what to name their servers and how to come up with a sensible IP addressing scheme. Amazon Elastic Compute Cloud (Amazon EC2) instances come up with a fairly random IP address and a name based on this address. You could certainly rename your server, but that often requires knowledge of the rest of the environment. For example, to call a server webprd42, you have to know that the last server you launched was webprd41.
The better solution is not to rely on names or IP addresses and to build your software such that these names don't matter.
In a physical environment, you can usually get away with making manual configuration changes to servers. When servers are launched automatically, manual changes won't be applied. You can re-bundle your Amazon Machine Image (AMI) after each change, but doing so doesn't solve the problem of how to push updates to the other servers that are already running. Fortunately, excellent software packages, such as Puppet and Cfengine, can automate these changes for you (see Resources).
Deploying application changes is another aspect of configuration management that deserves a separate look. Generic configuration management tools can do the job, but using them to reproduce the specific steps in deploying an application and managing migrations and configuration rollbacks is difficult. The Rails community has come out with other tools, such as Capistrano, to handle the task of application deployment (see Resources).
It is helpful to look at configuration management as two separate problems. The first is how to manage the server—from the installation of software packages to the configuration of various daemons. The second is how to deploy new versions of software in a controlled manner.
It's important to know what your servers are doing. CPU, disk resources, memory, and the network are vital components you need to monitor. Daemons running on your system, including the application itself, may have other metrics to watch. For example, watching application response time and the number of connections to the Web server and application server can warn you of problems before they happen.
Many tools are available to monitor servers and graph the results. The challenge is how to monitor new servers as they come online and how to stop monitoring them as they are taken offline.
Patterns as applied to cloud architecture
Three general patterns emerge when you look at how to manage a dynamic environment such as Amazon EC2:
- Client poll. Each cloud server (the client) queries a central server for resources. You don't need to know the addresses of all your servers using this pattern, but the clients operate on their own schedule, so you can't control the timing of the polling.
- Server push. A central server first queries the cloud provider's application programming interface (API) to find the current list of servers, then contacts each server to do the work. This pattern is slower and requires that the management tool understand the dynamic nature of the environment, but it has the benefit of allowing you to synchronize updates.
- Client registration. As each server comes online, it registers itself with a central server. Before the server is terminated, it de-registers itself. This method is more complex but lets you use non-cloud-aware tools in a cloud environment.
Client polling for configuration management
This pattern is easy to implement: A client simply polls a well-known server for instructions on a predetermined schedule. If the server doesn't have anything for the client to do, it tells the client so. The downside is that instructions can be issued only when the client polls the server; even an urgent change must wait for the next poll.
An excellent use for polling is configuration management of the server. The Puppet package from Reductive Labs is a popular configuration management tool. A process, called the Puppetmaster, runs on a central server. Clients run the Puppet daemon, which polls the Puppetmaster for the appropriate configuration manifest. These configuration manifests specify the desired end state of a particular component, such as "make sure that the NTP daemon is installed and running." Puppet reads these manifests and corrects any problems.
Your distribution may come with Puppet, or you can quickly install it with gem install puppet facter. Puppet implements a security system that complicates matters, however. Clients must have a signed key to talk to the Puppetmaster. You can tell the Puppetmaster to automatically sign keys for clients that connect, but doing so would allow anyone to download your configuration files. An alternative solution is to ignore the Puppetmaster, distribute your manifests yourself, and run the Puppet tools locally.
The sequence of events to have the client run the Puppet manifests is as follows:
1. Download an updated copy of the manifests and any associated files from the server.
2. Run Puppet against the manifest.
For step 1, the tool of choice is rsync, which only downloads changed files. For step 2, the puppet command (part of the puppet installation) executes the manifest. Note that there are two caveats to this approach:
- The server must accept the client's Secure Shell (SSH) public key. This key can be distributed in the AMI.
- Any configuration files you specify in the manifest must be copied with the manifest. The built-in Puppet file server also requires certificates, so you can't use this file transfer method.
The sample manifest ensures that the client has the correct network time protocol configuration. This involves making sure that the software is installed, the configuration file is modified, and the daemon is running. Listing 1 shows the top-level manifest.
Listing 1. The top-level manifest
import "classes/*"
node default {
  include ntpclient
}
Listing 1 first imports all the files in the classes directory; each
file contains information about a single component. All nodes then include the
ntpclient class, which is defined in Listing 2.
Listing 2. The ntpclient class
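class ntpclient {
  # Install the NTP package with the platform's package manager
  package {
    ntp:
      ensure => installed
  }
  # Keep the daemon running and enabled at boot; restart it when the
  # package or the configuration file changes
  service {
    ntpd:
      ensure    => true,
      enable    => true,
      subscribe => [ Package["ntp"], File["ntp.conf"] ],
  }
  # Manage /etc/ntp.conf from the staged copy
  file {
    "ntp.conf":
      mode   => 644,
      owner  => root,
      group  => root,
      path   => "/etc/ntp.conf",
      source => "/var/puppetstage/files/etc/ntp.conf",
      before => Service["ntpd"]
  }
}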
A detailed look at the Puppet language is outside the scope of this article, but at a high level, Listing 2 defines a class called ntpclient that is composed of a package called ntp, a service called ntpd, and a file in /etc called ntp.conf. If the ntp package is not installed, Puppet uses the appropriate tool, such as yum or apt-get, to install it. If the service is not running or is not enabled in the startup scripts, Puppet corrects that. If the ntp.conf file differs from the copy in /var/puppetstage/files/etc, the file is updated. The before line ensures that the file is in place before the service is managed, and the subscribe line makes sure that the daemon gets restarted if the package or the configuration changes.
The server stores the manifests and files in /var/puppetdist, and clients copy that tree to /var/puppetstage. The outline of the directory tree is shown in Listing 3.
Listing 3. Contents of /var/puppetdist
/var/puppetdist/
|-- files
|   `-- etc
|       `-- ntp.conf
`-- manifests
    |-- classes
    |   `-- ntp.conf
    `-- site.pp
Finally, Listing 4 synchronizes the files and runs the manifest on the client.
Listing 4. Client code to synchronize and run the manifest
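#!/bin/bash
# Pull the latest manifests and files; --delete removes local files that no longer exist on the server
/usr/bin/rsync -avz puppetserver:/var/puppetdist/ /var/puppetstage/ --delete
# Apply the top-level manifest locally
/usr/bin/puppet /var/puppetstage/manifests/site.pp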
This code, when run from cron periodically, picks
up any changes in the manifests and applies them to the cloud server. If
the server's configuration somehow gets changed, Puppet takes steps to put
the server back into compliance.
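One weakness of running the two commands back to back is that Puppet still runs when the rsync fails, possibly against a partially copied tree. The following is a minimal Ruby sketch of the same cron job, assuming the host name and paths from Listing 4, that applies the manifest only after a successful sync. The method name sync_and_apply and its injectable runner argument are illustrative choices, not part of Puppet or rsync.

```ruby
# Commands taken from Listing 4; adjust the host name and paths to your setup.
SYNC_CMD  = %w[/usr/bin/rsync -avz --delete puppetserver:/var/puppetdist/ /var/puppetstage/].freeze
APPLY_CMD = %w[/usr/bin/puppet /var/puppetstage/manifests/site.pp].freeze

# Runs the sync, then the manifest, stopping at the first failure.
# The runner is injectable (a hypothetical design choice) so the logic can be
# exercised without rsync or Puppet installed. By default it shells out with
# Kernel#system, which returns true only when the command exits with status 0.
# Returns :ok, :sync_failed, or :apply_failed.
def sync_and_apply(runner = ->(cmd) { system(*cmd) })
  return :sync_failed unless runner.call(SYNC_CMD)
  return :apply_failed unless runner.call(APPLY_CMD)
  :ok
end

# From a cron-driven script you might end with:
#   exit(sync_and_apply == :ok ? 0 : 1)
```

Returning a symbol rather than a bare true or false makes it easy for the calling script to log which of the two steps failed.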
Configuration updates on servers rarely require synchronization between servers. If a package needs to be upgraded, a half-hour window is usually enough. For application updates, however, you want to roll out your changes at once, and you want control over the timing. A popular tool for accomplishing this is Capistrano. You write a script that uses Capistrano's domain-specific language and run various tasks. Listing 5 shows a minimal Capistrano script to push an application to a known set of servers.
Listing 5. A simple Capistrano script
set :application, "payroll"
set :repository, "https://svn.smallpayroll.ca/svn/app/trunk/"
set :user, 'payroll'
set :home, '/home/payroll'
set :deploy_to, "#{home}"
set :rails_env, "production"
role :db, "174.129.174.213", :primary => true
role :web, "174.129.174.213", "184.73.3.169"
Most of the lines in Listing 5 set variables that alter the default behavior of Capistrano, which is to use SSH to access all the servers and use a source code management tool to check out a copy of the application. The last two lines define the servers in use—in particular, the database and Web servers. These roles are known to Capistrano (and can be extended for your own purposes).
The problem with Listing 5 is that the servers must be predefined. It is possible to have Capistrano determine the list of servers at run time using the Amazon Web Services (AWS) APIs, however. First, run:
gem install amazon-ec2
to install a library that implements the API. Then, modify your Capistrano recipe (deploy.rb) as shown in Listing 6.
Listing 6. Modifying Capistrano to dynamically load the list of servers at run time
# Put this at the beginning of your deploy.rb
require 'AWS'
# Change your role :web definition to this
role(:web) { my_instances }
# This goes at the bottom of the recipe
def my_instances
  @ec2 = AWS::EC2::Base.new(:access_key_id     => ENV['AWS_ACCESS_KEY_ID'],
                            :secret_access_key => ENV['AWS_SECRET_ACCESS_KEY'])
  # Outer loop: one item per reservation; inner loop: the instances in it
  servers = @ec2.describe_instances.reservationSet.item.collect do |itemgroup|
    itemgroup.instancesSet.item.collect { |item| item.ipAddress }
  end
  # Flatten the array of arrays and drop instances that have no public IP address
  servers.flatten.compact
end
Listing 6 changes the Web role from a static definition to a dynamic list of servers returned from the my_instances function. The function uses the Amazon EC2 API DescribeInstances call to return a list of servers. The API returns data in a format that groups instances that were launched together under the same reservation identifier. The outer collect loop iterates over these reservation groups, and the inner collect loop iterates over the servers contained within each reservation group. The result is an array of arrays, which is flattened to a single-dimensional array of server IP addresses and passed back to the caller.
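To see what the nested collect calls and the final flatten do, here is a small self-contained sketch that runs the same traversal over plain Ruby hashes. The sample data and the helper name public_ips are made up for illustration; the reservationSet and instancesSet nesting is assumed from Listing 6, and the real amazon-ec2 response object may differ in detail.

```ruby
# Hand-written stand-in for a DescribeInstances response (hypothetical data).
response = {
  "reservationSet" => {
    "item" => [
      { "instancesSet" => { "item" => [
        { "instanceId" => "i-aaaa", "ipAddress" => "174.129.174.213" },
        { "instanceId" => "i-bbbb", "ipAddress" => "184.73.3.169" }
      ] } },
      { "instancesSet" => { "item" => [
        { "instanceId" => "i-cccc", "ipAddress" => nil } # no public address yet
      ] } }
    ]
  }
}

# Returns a flat list of public IP addresses across all reservations.
def public_ips(response)
  # An account with no instances has no reservation items at all.
  reservations = response.dig("reservationSet", "item") || []
  nested = reservations.collect do |reservation|
    reservation["instancesSet"]["item"].collect { |instance| instance["ipAddress"] }
  end
  # nested is an array of arrays, one inner array per reservation.
  # flatten merges them; compact drops instances without an address.
  nested.flatten.compact
end

ips = public_ips(response)
puts ips.inspect
```

The compact call and the empty-list fallback are defensive additions: without them, an instance that has no public address would put a nil into the server list, and an account with no instances would raise an error.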
It is fortunate that Capistrano has provided a way to operate on a dynamic list of servers. If it did not provide such hooks, then you would need to take another approach.
Registering with a management server
For applications that don't easily allow you to use a dynamic list of servers, you can work around the problem by having the cloud server register itself with other applications. This process generally takes one of two forms:
- The cloud server connects to another server and runs a script, which updates the management application directly.
- The cloud server drops a file with some metadata in a common place, such as Amazon Simple Storage Service (Amazon S3), where other scripts look to rebuild their configuration files.
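The second form can be sketched in a few lines of Ruby. This example uses a local directory as a stand-in for the shared location (an Amazon S3 bucket in the description above); the file naming scheme, the JSON fields, and the method names register, deregister, and registered_ips are all assumptions made for illustration.

```ruby
require "json"
require "fileutils"

# Each server writes one small metadata file named after its instance ID.
def register(dir, instance_id, ip, role)
  FileUtils.mkdir_p(dir)
  File.write(File.join(dir, "#{instance_id}.json"),
             JSON.generate("ip" => ip, "role" => role))
end

# Called before the server is terminated; removing a missing file is not an error.
def deregister(dir, instance_id)
  FileUtils.rm_f(File.join(dir, "#{instance_id}.json"))
end

# A management script rebuilds its host list from whatever files exist,
# returning the sorted IP addresses of all servers in the given role.
def registered_ips(dir, role)
  Dir.glob(File.join(dir, "*.json"))
     .map { |path| JSON.parse(File.read(path)) }
     .select { |meta| meta["role"] == role }
     .map { |meta| meta["ip"] }
     .sort
end
```

A script on the management server could call registered_ips periodically and regenerate its configuration file whenever the returned list changes. Because each server owns exactly one file, servers never need to coordinate with each other when they register.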
Cacti is a popular performance management tool that can graph various metrics through Simple Network Management Protocol (SNMP) or scripts and combine these graphs into dashboards or meta-graphs (see Resources). The limitation with Cacti is that you have to configure the server for management within the Cacti Web interface or through command-line scripts. In this example, the cloud server connects back to the Cacti server and configures itself.
Cacti is based on a system of templates, which makes mass changes to graphs much easier. All the command-line tools operate on the template identifier, though, so you must first figure out which identifiers to use. Listing 7 shows how to find the host template, which pre-populates some data elements for you.
Listing 7. Listing the host templates
$ php -q /var/lib/cacti/cli/add_device.php --list-host-templates
Valid Host Templates: (id, name)
0 None
1 Generic SNMP-enabled Host
3 ucd/net SNMP Host
4 Karlnet Wireless Bridge
5 Cisco Router
6 Netware 4/5 Server
7 Windows 2000/XP Host
8 Local Linux Machine
Template number 3 is for a host running the Net-SNMP daemon, which is available with most Linux® distributions. Using this specific template rather than the more generic one allows you to monitor some Linux-specific counters easily.
Now that you know you are using host template 3, you can list the graphs available for it, as shown in Listing 8.
Listing 8. Listing the graph templates
$ php -q /var/lib/cacti/cli/add_graphs.php --list-graph-templates --host-template-id=3
Known Graph Templates:(id, name)
4 ucd/net - CPU Usage
11 ucd/net - Load Average
13 ucd/net - Memory Usage
The three graphs in Listing 8 are what you get with the default Cacti distribution for this host template. Many more are available: Leave off the --host-template-id option to see every graph template, or import additional graphs from sources on the Internet.
Listing 9 shows how to add a new device, and then a CPU graph.
Listing 9. Adding a new device with a graph
$ php -q /var/lib/cacti/cli/add_device.php --description="EC2-1.2.3.4" \
  --ip=1.2.3.4 --template=3
Adding EC2-1.2.3.4 (1.2.3.4) as "ucd/net SNMP Host" using SNMP v1 with community "public"
Success - new device-id: (5)
$ php -q /var/lib/cacti/cli/add_graphs.php --host-id=5 --graph-type=cg \
  --graph-template-id=4
Graph Added - graph-id: (6) - data-source-ids: (11, 12, 13)
Listing 9 first adds a host with the IP address 1.2.3.4. The
device ID returned is 5, which is then used to
add a graph for CPU usage (graph type of cg and
template 4). The results are the ID of the graph and the IDs of the various
data sources that are now being monitored.
It is now fairly easy to script the procedure in Listing 9. Listing 10 shows such a script.
Listing 10. add_to_cacti.sh
#!/bin/bash
IP=$1
# Add a new device and parse the output to only return the id
DEVICEID=`php -q /var/lib/cacti/cli/add_device.php --description="EC2-$IP" \
  --ip=$IP --template=3 | grep device-id | sed 's/[^0-9]//g'`
# CPU graph
php -q /var/lib/cacti/cli/add_graphs.php --host-id=$DEVICEID --graph-type=cg \
  --graph-template-id=4
The first parameter to the script is saved to a variable called $IP.
The add_device.php script is run with this IP address, with the output filtered
to only the line containing the ID using the grep
command. The output of this is fed into a sed
script that only prints numbers. This value is saved in a variable called
$DEVICEID.
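The grep and sed pipeline works because the success line contains no digits other than the ID. A slightly stricter alternative, sketched here in Ruby, matches the device-id: (N) pattern explicitly and fails loudly when it is absent, for example when add_device.php reports an error. The sample output is copied from Listing 9; the helper name parse_device_id is hypothetical.

```ruby
# Extracts the numeric ID from add_device.php output.
# Raises ArgumentError if no "device-id: (N)" marker is present.
def parse_device_id(output)
  match = output.match(/device-id: \((\d+)\)/)
  raise ArgumentError, "no device-id found in add_device.php output" unless match
  Integer(match[1])
end

# Sample output copied from Listing 9.
sample = <<~OUTPUT
  Adding EC2-1.2.3.4 (1.2.3.4) as "ucd/net SNMP Host" using SNMP v1 with community "public"
  Success - new device-id: (5)
OUTPUT

device_id = parse_device_id(sample)
puts device_id
```

Raising on a missing ID keeps the script from going on to call add_graphs.php with an empty --host-id value.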
With the device ID stored, adding a graph is as simple as calling the add_graphs.php script. Note that the CPU graph is the simplest case and that some other types of graphs require more parameters.
With the add_to_cacti.sh script on the Cacti server, all it takes is for the cloud server to run it. Listing 11 shows how to call the script.
Listing 11. Calling the cacti script from the cloud server
#!/bin/bash
MYIP=`/usr/bin/curl -s http://169.254.169.254/2007-01-19/meta-data/public-ipv4`
ssh cacti@cacti.example.com "/usr/local/bin/add_to_cacti.sh $MYIP"
Listing 11 first calls the Amazon EC2 meta-data server to return the public IP address, and then runs the command remotely on the Cacti server.
This series has followed the migration of an application from a single server to the AWS cloud. Improvements were made incrementally to take advantage of the Amazon EC2 offerings, from launching new servers to load balancers. This final article looked at managing a dynamic cloud environment and offered some patterns for you to use.
Given the low cost of entry to using cloud resources, you should have a look and try to conduct a practice migration. Even if you decide not to run the application in production using the cloud, you will learn a lot about what can be done in the cloud and perhaps improve your systems management skills.
Learn
- LPI exam 301 prep, Topic 306: Capacity planning (developerWorks, April 2008) explains in detail how to monitor systems and measure results.
- The S3 Cookbook by Scott Patten is a PDF from Leanpub that explains how to use Amazon S3 with Ruby. The book goes through about 60 problems and explains how to solve each with code.
- In the Cloud Computing area on developerWorks, get the resources you need to develop and deploy applications in the cloud and keep on top of recent cloud developments.
- In the developerWorks Linux zone, find hundreds of how-to articles and tutorials, as well as downloads, discussion forums, and a wealth of other resources for Linux developers and administrators.
- Stay current with developerWorks technical events and webcasts focused on a variety of IBM products and IT industry topics.
- Attend a free developerWorks Live! briefing to get up to speed quickly on IBM products and tools, as well as IT industry trends.
- Watch developerWorks on-demand demos ranging from product installation and setup demos for beginners, to advanced functionality for experienced developers.
- Follow developerWorks on Twitter, or subscribe to a feed of Linux tweets on developerWorks.
Get products and technologies
- Now that you've got multiple AMIs inside Amazon S3, you might want to prune some old ones. Amazon S3 File Manager is a Web-based file manager that rivals the features of many stand-alone applications or browser plug-ins. If you delete an AMI, don't forget to ec2-deregister it.
- Capistrano is a popular deployment package that acts in a similar manner to Rake.
- Cfengine is the most popular configuration management tool for UNIX®. It is lightweight and can operate on a large number of machines.
- Cacti is a network graphing tool built around RRDTool. You can graph almost anything imaginable. If it's in your data center, there's a good chance that someone has already written a plug-in to graph it.
- Puppet is a configuration management tool written in Ruby and built to overcome some limitations in Cfengine. If you're looking for a good way to start, Pulling Strings with Puppet by James Turnbull (Apress, 2008) is a book that the author enjoyed.
- Evaluate IBM products in the way that suits you best: Download a product trial, try a product online, use a product in a cloud environment, or spend a few hours in the SOA Sandbox learning how to implement Service Oriented Architecture efficiently.
Discuss
- Get involved in the My developerWorks community. Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.

Sean Walberg has been working with Linux and UNIX systems since 1994 in academic, corporate, and Internet Service Provider environments. He has written extensively about systems administration over the past several years. You can contact him at sean@ertw.com.




