Why you need lease policies
Our team within the IBM Software Group is responsible for managing a large internal cloud based on IBM Workload Deployer and IBM PureApplication System. As interest in cloud computing has grown, we have seen an enormous increase in the number of deployments onto our cloud computing environments. With that growth comes the risk that some of these deployments will, over time, be forgotten and abandoned. This situation is typical of any large cloud environment, no matter what technology is used to provision the cloud. The solution presented here can therefore be adapted for nearly any cloud environment, and most easily for those built with IBM Workload Deployer and IBM PureApplication System.
There are many scenarios that lend themselves to the occurrence of this problem. Among them:
- Users deploy virtual machine images to try out new products and abandon the images when their experimentation is complete.
- Developers or testers deploy patterns to verify specific builds of their products. In a rapid build process with many builds, the number of images to maintain can become excessive, and some are ignored.
- The owner of an image leaves the project or the company, and their virtual resources are abandoned.
The key factor contributing to these scenarios is that it’s very easy for users to acquire new resources. As a result, users don’t have to worry about existing VMs as long as they are able to create new ones when they want. Let’s refer to these unwanted or forgotten instances as cloud zombies because they are no longer used but will continue to take up space and resources for a long time. As long as the number of cloud zombies is small, this isn't a big problem. However, it can cause enormous harm to both users and cloud providers as the number grows large. If users are billed for their cloud utilization, then cloud zombie instances will eat up their budgets without bringing them any benefit. To make things worse, if users are not aware of these cases, the situation will continue indefinitely. Furthermore, cloud zombies consume cloud resources, which can keep cloud providers from providing more valuable services to their clients.
An earlier article described a best practice that says that you need to automate your asset lifecycle states. Pattern and VM instances are important assets that need to be managed, and so the suggested solution is to develop some kind of automation to improve their governance. One potential quick and dirty solution to the problem is to set up a cron task that cleans up instances that have been up for a particularly long time, perhaps a month. While this approach will work, it has many limitations. Because it judges instances by age alone and does not identify actual cloud zombies, it will kill some innocent instances that could be vital to users’ productivity. Therefore, you need a more sophisticated deployment lease management strategy.
Pattern deployment lease policy
The pattern deployment lease policy approach we have developed applies to both virtual system pattern and virtual application pattern instances. It also applies to all environments derived from IBM Workload Deployer, including but not limited to IBM PureApplication System, IBM SmartCloud Orchestrator, IBM SmartCloud Provisioning, and IBM Workload Deployer itself.
The policies consist of two major concepts:
- Deployment lease days are the number of days before an instance is stopped or stored.
- Deletion grace days are the number of days before a stopped instance is deleted.
Users receive an email notification before either day arrives. At that point, users can use a self-service tool to extend the tenancy or ask an administrator to take the action. Otherwise, the instance will be stopped, stored, or deleted on the due date.
The operation applied to an instance differs based on its type. That is because different asset types have different lifecycles. The store operation is applied to virtual system patterns, while the stop operation is applied to virtual application patterns. The store operation can release reserved CPU and memory, which provides better cloud resource reuse when compared to the stop operation. Unfortunately, store is not supported on virtual application patterns.
Once you have implemented a simple policy to implement leases and grace periods, you can customize the default lease and grace settings for different purposes. For instance, for some allowed exceptions, you can mark certain virtual instances as protected. These protected instances will not have lease or grace policies enforced.
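The lease, grace, and protection settings described above can be captured in a small policy record. The following is a minimal sketch, not our production code: the `LeasePolicy` class, the `policy_for` helper, the purpose labels, and all day counts are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeasePolicy:
    """Lease settings for one class of deployments (illustrative only)."""
    lease_days: int          # days before an instance is stopped or stored
    grace_days: int          # days before a stopped or stored instance is deleted
    protected: bool = False  # protected instances are exempt from enforcement


# Hypothetical defaults; pick values that suit your own environment.
DEFAULT_POLICY = LeasePolicy(lease_days=30, grace_days=7)

# Hypothetical per-purpose overrides of the default policy.
POLICY_OVERRIDES = {
    "demo": LeasePolicy(lease_days=7, grace_days=3),
    "shared-service": LeasePolicy(lease_days=0, grace_days=0, protected=True),
}


def policy_for(purpose: str) -> LeasePolicy:
    """Return the override for a purpose, or the default policy if none exists."""
    return POLICY_OVERRIDES.get(purpose, DEFAULT_POLICY)
```

Keeping the protected flag in the same record as the day counts means a single lookup tells the enforcement code whether an instance is subject to the policy at all.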
In order to achieve your management goals, you need the ability to:
- Capture and track the lease and grace day information of every pattern instance deployed in your cloud.
- Update the remaining lease and grace days of an instance through any of three channels: a user's self-service tool, an administrator's manual action, or an automated daily decrease action.
The management infrastructure we designed is shown in Figure 1. Be aware that our architecture was developed not only to apply to the lease issue, but also to deal with security, service provisioning automation, and other issues of asset governance and lifecycle management. In the context of lease policy alone, some components can be simplified or combined together.
Figure 1. Architecture diagram
The following components are depicted in Figure 1:
- The database stores all the lease and grace information for each IBM Workload Deployer pattern instance.
- The IBM Workload Deployer management web application provides access to the database records through REST APIs. It also implements two timer tasks. One task runs daily and decreases the remaining lease days and grace days for each instance in the database. A second task runs hourly and retrieves the latest instance status from the IBM Workload Deployer APIs to update the database.
- The IBM Workload Deployer management cron enforcement jobs query the list of running, stopped, and stored instances through the IBM Workload Deployer API. They then obtain the remaining lease and grace days of each instance from the management application. Based on this information, they either send out an email alert or stop, store, or delete the instance through the IBM Workload Deployer command line interface. The logic in the cron jobs could also have been implemented in the web application, but we found several benefits in implementing it separately. We discuss these design considerations below.
- The lease and grace adjusting tools allow users and administrators to adjust lease and grace days through an exposed REST API with the browser.
When we considered where to implement the enforcement logic, we found that separating it into its own cron job provided these benefits:
- A cron management job enables us to leverage some commands that are available only in the IBM Workload Deployer command line interface, and not in the IBM Workload Deployer REST API. Likewise, separating this logic removes an IBM Workload Deployer command line interface dependency from the web application.
- We can develop, debug, and upgrade the lease and grace enforcement logic without repeatedly updating the web application. In this way, starting, stopping, or rescheduling the enforcement code does not create downtime in the web application.
We have also organized the management-related tasks into the following parts:
In order for our system to function, it has to be aware of all running instances. There are two different ways in which we have implemented this:
- Passive discovery: As described earlier, the web application syncs every hour to all running IBM Workload Deployer instances that it manages through their REST API to find the status of existing pattern instances. It then updates that status in the database.
- Active registration: We also developed a script package that we attach to all patterns in our managed Workload Deployers. This script package runs at system creation time, and sends instance related information (such as operating system) back to the web application through our REST API, to register this instance in the database.
From the perspective of lease management alone, we could actually eliminate active registration. The additional information provided by active registration is not needed by the lease policy implementation. Passive discovery alone would fulfill the requirement, but if you plan on extending your implementation for other purposes, as we have with ours, you may want to include it as well.
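Passive discovery amounts to merging the instance list reported by the IBM Workload Deployer API into the lease database. The sketch below illustrates that merge under several assumptions: the database is modeled as a plain dictionary, the discovered instances are passed in as an ID-to-status mapping rather than fetched from a real endpoint, and the default day counts and the decision to drop records for vanished instances are ours for illustration only.

```python
def sync_instances(db, discovered, default_lease=30, default_grace=7):
    """Merge discovered instance statuses into the lease database.

    db:         dict of instance ID -> record dict (stand-in for the database)
    discovered: dict of instance ID -> status string, as reported by the API
    """
    for instance_id, status in discovered.items():
        record = db.get(instance_id)
        if record is None:
            # New instance: start tracking it with the default lease and grace.
            db[instance_id] = {
                "status": status,
                "lease_days_left": default_lease,
                "grace_days_left": default_grace,
            }
        else:
            # Known instance: refresh its status, keep its remaining days.
            record["status"] = status

    # Assumption: instances that no longer exist are dropped from tracking.
    for instance_id in list(db):
        if instance_id not in discovered:
            del db[instance_id]
    return db
```

Because the merge never resets the remaining days of a known instance, running it hourly does not interfere with the daily decrease.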
The daily timer task in the web application decreases the remaining lease or grace days for each pattern based on its current status.
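A minimal sketch of this daily decrease follows. The `LeaseRecord` type and its field names are hypothetical, and the mapping of statuses to counters (running instances consume lease days, stopped or stored instances consume grace days) is our reading of the policy rather than a copy of the actual implementation.

```python
from dataclasses import dataclass


@dataclass
class LeaseRecord:
    instance_id: str
    status: str            # "running", "stopped", or "stored"
    lease_days_left: int
    grace_days_left: int


def daily_decrement(record: LeaseRecord) -> LeaseRecord:
    """Decrease the counter that matches the instance's current status."""
    if record.status == "running":
        # Running instances use up their deployment lease; never go below zero.
        record.lease_days_left = max(0, record.lease_days_left - 1)
    elif record.status in ("stopped", "stored"):
        # Stopped or stored instances use up their deletion grace period.
        record.grace_days_left = max(0, record.grace_days_left - 1)
    return record
```

Clamping at zero keeps the stored values meaningful even if the enforcement job is delayed for a day or more.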
As stated before, the enforcement cron jobs query all managed IBM Workload Deployer instances and iterate over each pattern instance they find. For each instance, they decide whether to send an email alert or to stop, store, or delete the instance through the command line interface.
This simplified logic is shown in Figure 2. You might notice that protected systems are skipped by the lease enforcement logic.
Figure 2. Lease enforcement logic
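The per-instance decision can be sketched as a single pure function. This is an illustration under stated assumptions, not the code behind Figure 2: the `InstanceState` type, the action names, and the three-day `ALERT_WINDOW_DAYS` notification threshold are all hypothetical. Only the decision is shown; sending the email and calling the command line interface are left out.

```python
from dataclasses import dataclass

ALERT_WINDOW_DAYS = 3  # hypothetical: how many days ahead users are warned


@dataclass
class InstanceState:
    pattern_type: str      # "virtual_system" or "virtual_application"
    status: str            # "running", "stopped", or "stored"
    lease_days_left: int
    grace_days_left: int
    protected: bool = False


def decide_action(state: InstanceState) -> str:
    """Return one of: skip, none, alert, store, stop, delete."""
    if state.protected:
        return "skip"  # protected instances are never enforced

    if state.status == "running":
        if state.lease_days_left <= 0:
            # Store releases reserved CPU and memory, but is only
            # available for virtual system patterns.
            if state.pattern_type == "virtual_system":
                return "store"
            return "stop"
        if state.lease_days_left <= ALERT_WINDOW_DAYS:
            return "alert"
        return "none"

    # Stopped or stored instances are in their deletion grace period.
    if state.grace_days_left <= 0:
        return "delete"
    if state.grace_days_left <= ALERT_WINDOW_DAYS:
        return "alert"
    return "none"
```

Keeping the decision free of side effects makes it straightforward to test the policy without touching a live IBM Workload Deployer.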
There are two ways to extend the lease: either by a user's self-service tool or through manual intervention by an administrator (Table 1).
Table 1. Remedy approaches
| Approach | How to use | Extension length |
| --- | --- | --- |
| Self-service tool | The user receives a user-facing IBM Workload Deployer usage report and can then click an embedded button that is shown next to a pattern instance to extend the lease. | Default maximum length |
| Administrator's manual intervention | A user raises a user request to extend a lease out of band. An administrator then sets the lease using a REST client; normally a Firefox REST client. | Customized length |
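The difference between the two remedy approaches comes down to who chooses the new lease length. The following sketch assumes a record stored as a dictionary and a hypothetical `DEFAULT_MAX_LEASE_DAYS` constant; neither the function name nor the value of 30 days comes from our actual tooling.

```python
DEFAULT_MAX_LEASE_DAYS = 30  # hypothetical default maximum lease length


def extend_lease(record, requested_days=None, is_admin=False):
    """Reset the remaining lease days on a record.

    Self-service callers always get the default maximum length.
    Administrators may set a customized length instead.
    """
    if is_admin and requested_days is not None:
        if requested_days < 0:
            raise ValueError("lease length must not be negative")
        record["lease_days_left"] = requested_days
    else:
        # Any length requested by a non-administrator is ignored.
        record["lease_days_left"] = DEFAULT_MAX_LEASE_DAYS
    return record
```

For example, `extend_lease(record)` models the button in the usage report, while `extend_lease(record, requested_days=90, is_admin=True)` models an administrator's manual update.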
There are several possible improvements in our design that we have considered for future iterations. These include:
- Leveraging the BPM engine within IBM SmartCloud Orchestrator to define and implement our deployment lease policy. Here we could use some out-of-the-box features of IBM SmartCloud Orchestrator to manage leases and reduce dependencies on our own internally developed tooling.
- Extending the lease policy to other artifact types, such as patterns, script packages, images, and the like. Once we have set up a simple code structure to query asset information from the IBM Workload Deployer REST API and then make decisions based on the time an asset spends in each state, it becomes easy to add additional asset types and lifecycles.
In this article we have described the problem of pattern instance and VM management with respect to “cloud zombies,” described a potential solution with deployment leases, and showed an implementation of this solution.
- IBM PureApplication System product information
- IBM Workload Deployer product information
- IBM PureApplication System Information Center
- Best practices for pattern adoption in IBM PureApplication System
- Kill cloud zombies before it’s too late, Ethann Castell, Thoughts on Cloud, 2013