Agile DevOps: Transient environments

Managing the illusion of infinite capacity to reduce environment scarcity

Often, after a shared environment is provisioned, it's never decommissioned and might run for weeks or months, with engineers applying manual configuration changes throughout its lifetime. This risky approach regularly causes deployment problems and other strange "environment" errors to occur during development, test, and production cycles. This Agile DevOps installment explains how to create ephemeral environments that are terminated on a frequent basis. Once all environments are scripted and versioned, these test environments are only used long enough to run through a suite of tests as the software moves through a delivery pipeline on its way to production.

Paul Duvall, CTO, Stelligent

Paul Duvall is the CTO of Stelligent. A featured speaker at many leading software conferences, he has worked in virtually every role on software projects: developer, project manager, architect, and tester. He is the principal author of Continuous Integration: Improving Software Quality and Reducing Risk (Addison-Wesley, 2007) and a 2008 Jolt Award Winner. He is also the author of Startup@Cloud and DevOps in the Cloud LiveLessons (Pearson Education, June 2012). He's contributed to several other books as well. Paul authored the 20-article Automation for the people series on developerWorks. He is passionate about getting high-quality software to users quicker and more often through continuous delivery and the cloud. Read his blog at Stelligent.com.



09 October 2012

In economics, scarcity is the fundamental problem of "having humans [with]... unlimited wants and needs in a world of limited resources" (see Resources). When resources are scarce, people compete for access to them. Competition for resources is evident when it comes to people getting access to environments on traditional software projects.

The beauty is that thanks to hardware commoditization, virtualization, and cloud computing, this competition can be greatly diminished when the appropriate patterns and practices — such as transient environments — are used on a project. Transient environments are short-lived environments that are terminated on a frequent basis. To be clear, the scarcity never vanishes, but you experience the illusion of infinite capacity. When applying the transient environment pattern, you'll start forgetting that it's even an illusion.

About this series

Developers can learn a lot from operations, and operations can learn a lot from developers. This series of articles is dedicated to exploring the practical uses of applying an operations mindset to development, and vice versa — and of considering software products as holistic entities that can be delivered with more agility and frequency than ever before.

Sometimes, you'll hear these types of environments referred to by other names, including ephemeral, temporal, temporary, and disposable. These all mean essentially the same thing — that nonproduction environments are as short-lived as possible. Lately, my company has been recommending that they last no more than 72 hours — and that's on the high end.

Motivations

One of the more challenging problems in software development occurs when teams have fixed instances that no one else can alter. Often, this happens because the environment took days, weeks, or months to configure. This is an antipattern that occurs because no one took the time to script the creation of the environment. Thus, environments are scarce resources, and the competition for them is fierce. When environment lease policies do exist, they are often ignored, or the lease deadlines are extended multiple times.

What is an environment?

An environment isn't just another name for a (physical or virtual) machine. An environment is a collection of system resources that are treated as a logical unit, including, but not limited to, instances (physical, virtual, or machine abstractions), network configuration, software servers and configuration (for each of the instances), load balancers, and other resources. You might instantiate environments based on templates or other configuration.

Most projects I've seen don't have environment lease policies, or the policies are very loosely defined and often violated. Even on projects that do have lease policies, environments require the manual installation of tools, data, and configuration after the environment has been created. This makes every environment unique and therefore more difficult to manage, especially because hundreds of environments might get provisioned on larger enterprise projects. In that case, there's no simple way to get an environment back to a baseline, and no team member knows how to do it. As a result, team members become reluctant to terminate, or even modify, these environments. This antipattern makes it prohibitively expensive to create and terminate environments.


Features

With transient environments, all environments are ephemeral except for production (although there are effective ways to make production environments ephemeral too). Although this might vary by project, the heuristic is that these environments exist only long enough to run through a suite of automated and exploratory tests. The key prerequisites for transient environments are that they be scripted, tested, and versioned. Ideally, you should be using an infrastructure automation tool such as those I discuss in "Agile DevOps: Infrastructure automation."

The key features that make up transient environments are:

  • Scripted environments: They are fully scripted, versioned, and tested.
  • Self-service environments: Any authorized person on the team can launch a new environment.
  • Automatic termination: Environments are automatically terminated based on the team policy. Team members have no option to override the policy.

Once you have a fully scripted environment, you can enable authorized team members to obtain it in a self-service manner. With the freedom to simply launch and terminate environments on demand comes responsibility. This responsibility is reinforced by defining termination policies and enforcing those policies through automated processes that terminate the environments on a regular basis. (I will cover test-driven infrastructures and versioning in future articles in this series.)
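
As a minimal sketch of how self-service launches and automated enforcement can fit together, the following shell functions record each launch in a catalog file and compute when its lease runs out. The catalog file, its one-line-per-environment format, and the function names register_environment and lease_deadline are assumptions made for illustration; they are not part of any particular tool.

```shell
#!/bin/sh
# Sketch: record each self-service launch so automation can enforce the lease later.
# The catalog file and its "stack_name launch_epoch" line format are assumptions.

# Append one environment to the catalog.
# Arguments: catalog path, stack name, optional launch time (Unix epoch seconds).
register_environment() {
    catalog=$1
    stack_name=$2
    launched_at=${3:-$(date +%s)}   # defaults to the current time
    printf '%s %s\n' "$stack_name" "$launched_at" >> "$catalog"
}

# Print the time (epoch seconds) at which a stack's lease runs out.
# Arguments: catalog path, stack name, lease length in hours.
lease_deadline() {
    catalog=$1
    wanted=$2
    max_hours=$3
    while read -r stack_name launched_at; do
        if [ "$stack_name" = "$wanted" ]; then
            echo $((launched_at + max_hours * 3600))
        fi
    done < "$catalog"
}

# Example (hypothetical stack name and catalog path):
#   register_environment /var/lib/envs/catalog.txt test-env-42
#   lease_deadline /var/lib/envs/catalog.txt test-env-42 72
```

Recording the launch time at creation means the termination job never has to guess how old an environment is, and team members can see the deadline up front.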


Benefits

By defining transient-environment policies and automating the implementation of those policies on your projects, you can reduce the proliferation of unique environments, support self-service deployments, increase automation of environment instantiation, move toward a culture of environments as commodities, allow for test isolation, and significantly reduce the amount of troubleshooting in environment-specific problems. Some of the key benefits are:

  • Reduce environment dependency: Reduce the dependency that your team has on any one particular environment by providing the capability to launch and terminate them at will.
  • Better resource utilization: By terminating environments that are no longer being used, you free up capacity for others.
  • Knowledge transfer: When team members know that their environments will be terminated at specific times, automation becomes the only way to preserve the institutional knowledge of how an environment gets configured.

How it works

The nice thing about transient environments is that it's a rather simple pattern to implement once your environments are fully scripted, versioned, and tested. At that point, you have three primary tasks to perform:

  • Create a team policy: In collaboration with your team members, determine your team policy based on your project requirements. I recommend starting with an aggressive limit and regularly reducing the number of hours these environments live, with about 72 hours as the upper bound.
  • Automate environment termination: Write a script that terminates all environments that exceed the team lease policies.
  • Schedule environment termination: Schedule a process to run on a regular basis that executes the environment-termination script.

Base your team policy on the time it takes to run through all of the required testing.
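
The lease check at the heart of the termination script can be sketched in a few lines of shell. This is only an illustration: the function name lease_expired, the MAX_LEASE_HOURS variable, and the use of Unix epoch seconds for launch times are assumptions, and a real script would read launch times from wherever your team records them.

```shell
#!/bin/sh
# Hypothetical helper: decide whether an environment has outlived its lease.
# Arguments: launch time and current time (both Unix epoch seconds),
# and the lease length in hours. All names here are illustrative.
lease_expired() {
    launched_at=$1
    now=$2
    max_hours=$3
    age_seconds=$((now - launched_at))
    # Exit status 0 (true) means the lease has been exceeded.
    [ "$age_seconds" -gt $((max_hours * 3600)) ]
}

MAX_LEASE_HOURS=72   # team policy; an assumed value matching the 72-hour upper bound

# Usage: an environment launched 80 hours ago is past a 72-hour lease.
now=$(date +%s)
if lease_expired $((now - 80 * 3600)) "$now" "$MAX_LEASE_HOURS"; then
    echo "lease exceeded: terminate"
fi
```

Passing the current time in as an argument, rather than calling date inside the function, keeps the check easy to test with fixed values.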

To schedule environment termination, you can start by using a scheduler such as cron or — if you're using Java — Quartz (see Resources). You can also use the scheduler provided by your Continuous Integration server to run a job at a regular time every day. This example shows a simple cron expression that runs a script once a day at 2:15 a.m.

# minute hour day-of-month month day-of-week command
15 2 * * * /usr/bin/delete_envs.sh

The next example uses the command-line interface provided by Amazon Web Services (AWS) CloudFormation to terminate an environment as defined by a CloudFormation stack:

/opt/aws/apitools/cfn/bin/cfn-delete-stack --access-key-id $AWS_ACCESS_KEY \
--secret-key $AWS_SECRET_ACCESS_KEY --stack-name $current_stack_name --force

A script like this can be expanded to loop through an environment catalog and terminate all associated resources.
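
One way such a loop might look is sketched below. It assumes a hypothetical catalog file with one "stack_name launch_epoch" pair per line and a DRY_RUN switch that defaults to printing instead of deleting; neither is part of the CloudFormation tools. The delete command itself is the cfn-delete-stack invocation shown earlier.

```shell
#!/bin/sh
# Sketch: walk a hypothetical environment catalog and delete expired stacks.
# Catalog format (an assumption): one "stack_name launch_epoch" pair per line.
# DRY_RUN=1 (the default here) only prints what would be deleted.

MAX_LEASE_HOURS=${MAX_LEASE_HOURS:-72}
DRY_RUN=${DRY_RUN:-1}

delete_stack() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would delete $1"
    else
        # stdin is redirected so the command cannot consume the catalog
        # that the calling loop is reading.
        /opt/aws/apitools/cfn/bin/cfn-delete-stack --access-key-id "$AWS_ACCESS_KEY" \
            --secret-key "$AWS_SECRET_ACCESS_KEY" --stack-name "$1" --force < /dev/null
    fi
}

# Arguments: catalog path, current time in Unix epoch seconds.
terminate_expired() {
    catalog=$1
    now=$2
    while read -r stack_name launched_at; do
        # Skip blank lines, comments, and entries without a launch time.
        case "$stack_name" in ''|'#'*) continue ;; esac
        [ -n "$launched_at" ] || continue
        if [ $((now - launched_at)) -gt $((MAX_LEASE_HOURS * 3600)) ]; then
            delete_stack "$stack_name"
        fi
    done < "$catalog"
}

# Example (hypothetical catalog path):
#   terminate_expired /var/lib/envs/catalog.txt "$(date +%s)"
```

Running with DRY_RUN=1 first lets the team review which environments the policy would remove before the scheduled job starts deleting them for real.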

By defining an aggressive team policy, scheduling a process, and automating the termination of environments, your team can proactively manage resources and reduce the chance that environments the project relies upon exist for weeks or months.

Troubleshooting

How does environment troubleshooting usually work on most projects? In my experience, it's a painful slog of determining what got changed, who changed it, and why. Often, several people investigate the problem to determine the proper remedy. The same kind of problem then recurs in other environments, because each environment is unique: one-off modifications accumulate as it runs for weeks or months.

Alternatively, with a transient-environment policy based upon scripted, versioned, and tested environments, you get the environment into a known state. To do this, you launch a new environment and apply changes to determine their effect. Then you write automated tests and scripts, and version the changes. Because effective change management is in place, you can always get back to a known state to make changes, rather than wasting hours or days determining what got changed in a dynamic environment modified by myriad users. This is the essence of having a canonical environment.


A transitory stay

In this article, you learned that agile DevOps environments are as short-lived as possible — as little as a few hours and as much as a few days. By defining a policy and scheduling automated termination of environments, you reduce the dependency on a limited number of unique environments, better utilize resources, and encourage automation so that environments can be launched and terminated on demand.

In the next Agile DevOps installment, you'll learn about creating an environment that fails constantly — paradoxically, for the purpose of preventing failure. In it, I'll cover Chaos Monkey, a tool developed by the Netflix tech team that intentionally and randomly, but regularly, terminates instances in the Netflix production infrastructure to ensure that the systems continue to operate in the event of failure.

Resources

Learn

Get products and technologies

  • Quartz: Quartz is an open source job-scheduling service.
  • IBM Tivoli® Provisioning Manager: Tivoli Provisioning Manager enables a dynamic infrastructure by automating the management of physical servers, virtual servers, software, storage, and networks.
  • IBM Tivoli System Automation for Multiplatforms: Tivoli System Automation for Multiplatforms provides high availability and automation for enterprise-wide applications and IT services.
  • Evaluate IBM products in the way that suits you best: Download a product trial, try a product online, use a product in a cloud environment, or spend a few hours in the SOA Sandbox learning how to implement Service Oriented Architecture efficiently.

Discuss

  • Get involved in the developerWorks community. Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.
  • The developerWorks Agile transformation community provides news, discussions, and training to help you and your organization build a foundation on agile development principles.
