Where and why: Demystifying virtual machine placement in IBM PureApplication System
You've heard that your IBM® PureApplication® System contains an intelligent placement engine that helps to optimize the use of licenses, CPU, and memory. You've also heard that this engine helps to ensure the high availability of your deployments if a compute node is lost. But perhaps you've always wondered just what lies behind the placement decisions: why did the placement engine choose to locate this virtual machine here, and that virtual machine there? Why did a virtual machine move from one compute node to another—or why didn't it?
This article examines the PureApplication System placement engine, and briefly compares and contrasts how this placement engine works in IBM PureApplication Software and IBM PureApplication Service. You learn about the two different tiers of placement and the role each plays in balancing and rebalancing your workload. You also learn how placement decisions are made when you first deploy your pattern, when your pattern scales, and when other changes occur in your system. These changes can include the loss of a compute node or the coming and going of other deployments.
The two-tier placement engine
PureApplication System has a two-tier placement engine that is used to place and rebalance virtual machines for virtual system and virtual application pattern instances. Figure 1 illustrates the role that the two tiers of placement serve in the platform as a service (PaaS) and infrastructure as a service (IaaS) layers of PureApplication System. It illustrates the possibility of deploying across two systems by using multisystem deployment, and how the tiers of placement come into play on each system.
Figure 1. Role of placement tiers in deploying a pattern across multiple locations
The first tier of placement occurs when you initially deploy your pattern. This tier is used to select which locations (PureApplication Systems), cloud groups, and IP groups your virtual machines are deployed to. After those selections are made for the individual virtual machines, the virtual machines cannot move except within their original cloud group (see Second-tier placement).
Virtual machines of externally managed deployments can span different cloud groups
If you deploy to an internally managed environment profile, then all virtual machines must be deployed to the same cloud group. You select this cloud group on the initial deployment screen, as shown in Figure 2.
Figure 2. Cloud group selection for deploying to an internally managed environment profile
However, if you deploy to an externally managed environment profile, then you do not choose a single cloud group for all virtual machines (Figure 3).
Figure 3. No cloud group selection for deploying to an externally managed environment profile
Instead, your virtual machines can be distributed across cloud groups and even across multiple PureApplication System locations, as shown in Figure 4.
Figure 4. Distribution of virtual machines across cloud groups and locations
The first tier of the placement engine comes into play at this stage. It calculates a possible distribution of your virtual machines across the locations and cloud groups. If you are comfortable with the placement decision, click Quick Deploy to accept the placement recommendation. However, if you'd like a chance to review and possibly modify the placement recommendation as shown in Figure 4, click Prepare to Deploy to review the recommendations. Figure 5 shows these deployment choices.
Figure 5. Deployment buttons
After you deploy the chosen placement of machines, the first tier of the placement engine revalidates the placement (for example, ensuring that there are enough IP addresses or compute resources to satisfy the request). The deployment is rejected at this stage if there is insufficient capacity.
If you are deploying a pattern across multiple PureApplication System locations, the first tier of placement also verifies that all of the necessary artifacts (images, script packages, add-ons, and system plug-ins) are present and at the matching version on each system. First-tier placement rejects attempts to deploy components to another system if that system is missing a component, as shown in Figure 6.
Figure 6. Artifact checking across multiple locations
The first tier of placement builds its recommendation independently for each virtual machine component in the pattern that is being deployed. Within each component, it tries to distribute the virtual machines as evenly as possible across all cloud groups in the environment profile (regardless of location), subject to available capacity. As a secondary consideration, when the choice is between cloud groups with an equal number of virtual machines, this tier of placement prefers the cloud group in the location with fewer virtual machines. For example, a WebSphere® Application Server custom node that contains four instances based on its Base Scaling Policy is distributed as evenly as possible among the locations and cloud groups. However, the WebSphere deployment manager (dmgr) node is placed independently of the custom nodes, without any consideration given to their placement. Therefore, if your pattern contains several singleton nodes (such as a dmgr node and a DB2® node), they might be placed in the same location or in different locations.
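The even-spreading behavior can be illustrated with a small greedy sketch. This is a hypothetical model for intuition only, not the actual engine: the `place_component` function, the group names, and the exact tie-break rule are illustrative assumptions.

```python
# Illustrative sketch of first-tier placement (hypothetical model, not IBM code).
# Each instance of a component is placed in the cloud group with the fewest VMs;
# ties are broken in favor of the location (system) with fewer VMs overall.

def place_component(instances, cloud_groups):
    """cloud_groups: list of dicts with 'name', 'location', 'vm_count'."""
    placements = []
    for _ in range(instances):
        # Primary key: fewest VMs in the cloud group.
        # Secondary key: fewest VMs at that cloud group's location.
        target = min(
            cloud_groups,
            key=lambda cg: (
                cg["vm_count"],
                sum(g["vm_count"] for g in cloud_groups
                    if g["location"] == cg["location"]),
            ),
        )
        target["vm_count"] += 1
        placements.append(target["name"])
    return placements

groups = [
    {"name": "cgA", "location": "rack1", "vm_count": 0},
    {"name": "cgB", "location": "rack1", "vm_count": 0},
    {"name": "cgC", "location": "rack2", "vm_count": 0},
]
print(place_component(4, groups))  # ['cgA', 'cgC', 'cgB', 'cgC']
```

Note how the fourth instance lands in cgC even though all three cloud groups are tied: rack2 hosts fewer virtual machines overall, which mirrors the secondary location preference described above.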
When the placement is validated and the deployment is submitted to each individual location, then the second tier of placement is used to determine the individual compute nodes within each cloud group that are used to place each virtual machine. This tier of placement enforces high availability goals for a cloud group (see VM high availability). Therefore, a deployment can be rejected if that deployment would prevent a cloud group from achieving high availability. Within each cloud group, the second tier of placement attempts to equally spread (anti–collocate) all virtual machines from the same deployment that have the same CPU and memory settings across the compute nodes in that cloud group. Additionally, if the system is configured to enforce licensing, this tier of placement strictly ensures that license entitlements are not exceeded, even if that means forcing virtual machines onto the same compute node or even failing a deployment. Because the placement engine is simultaneously optimizing to meet various goals, it is not unusual for the distribution of virtual machines, or the allocation of memory and CPU, to appear uneven among the compute nodes in a cloud group.
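A comparable sketch for the second tier, again a hypothetical model whose node structures and capacity units are made up, shows the anti-collocation of peer virtual machines and the outright rejection of a deployment when no compute node has capacity:

```python
# Illustrative sketch of second-tier placement (hypothetical model, not IBM code).
# Peer VMs (same deployment, same CPU/memory settings) are anti-collocated:
# each one goes to the compute node hosting the fewest peers so far, and the
# deployment is rejected if no node has free capacity left.

def spread_vms(vm_count, nodes):
    """nodes: list of dicts with 'name', 'free_slots', and 'peers'
    (the number of VMs from this deployment already on the node)."""
    placements = []
    for _ in range(vm_count):
        candidates = [n for n in nodes if n["free_slots"] > 0]
        if not candidates:
            raise RuntimeError("deployment rejected: insufficient capacity")
        target = min(candidates, key=lambda n: n["peers"])
        target["peers"] += 1
        target["free_slots"] -= 1
        placements.append(target["name"])
    return placements

nodes = [
    {"name": "node1", "free_slots": 2, "peers": 0},
    {"name": "node2", "free_slots": 2, "peers": 0},
]
print(spread_vms(3, nodes))  # ['node1', 'node2', 'node1']
```

The real engine weighs additional goals (license entitlements, HA headroom) at this step, which is why actual distributions can look less even than this sketch suggests.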
Rebalancing virtual machines across compute nodes
Every 5 minutes a rebalancing job is scheduled on your PureApplication System. This job runs in the second tier of placement, and compares the current allocations and usage of virtual machines across the compute nodes in each cloud group. If the rebalancing job is able to achieve a better balance of virtual machines across the compute nodes by moving them, then some virtual machines are scheduled for live migration from their current compute node to another compute node. Figure 7 shows the internal rebalance job on the job queue.
Figure 7. System job queue that shows internal rebalance job
The rebalancing job adjusts the allocation of virtual machines across compute nodes as follows:
- If any virtual machines are deleted or stored, freeing up space on compute nodes, the rebalancing job might migrate virtual machines if 1) the migration better satisfies anti–collocation goals or 2) the migration results in recovery from a situation where license usage is exceeding entitlement.
- If the cloud group is configured to allow overallocation of CPU (that is, the cloud group's Type is set to Average), the rebalancing job might migrate virtual machines if it is able to make better use of the CPU across the cloud group. If one compute node has virtual machines that are not receiving their full allocation and are spending time waiting for CPU, while another compute node has spare CPU capacity, then the rebalancing job migrates virtual machines from the busy node if the rebalancing does not introduce contention on the destination compute node. The rebalancing job takes into account the virtual machine activity over the last several minutes.
If several virtual machines are candidates for migration, virtual machines with lower deployment priority are migrated first. Figure 8 shows the deployment priority indicator in the virtual system or virtual application instance view. You can set the priority of a deployment at the time it is deployed, and you can change it after deployment by clicking Change.
Figure 8. Deployment priority
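The preference for migrating lower-priority deployments first reduces to a sort. In this hypothetical sketch, the field names and the numeric scale (larger number means a more important deployment) are assumptions made for illustration:

```python
# Hypothetical sketch: choose rebalancing migration candidates, lowest
# deployment priority first. A larger 'priority' number means a more
# important deployment (an assumption of this sketch).

def choose_migration_candidates(vms, needed):
    ordered = sorted(vms, key=lambda vm: vm["priority"])  # lowest priority first
    return [vm["name"] for vm in ordered[:needed]]

vms = [
    {"name": "web1", "priority": 3},
    {"name": "batch1", "priority": 1},
    {"name": "db1", "priority": 5},
]
print(choose_migration_candidates(vms, 2))  # ['batch1', 'web1']
```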
PureApplication Service and PureApplication Software
PureApplication Service and PureApplication Software offerings differ from PureApplication System in that they do not currently support multicloud or multilocation deployment. As a result, the first tier of placement is not applicable to them because there is never a choice to be made between multiple locations or cloud groups. However, the second tier of placement applies fully to PureApplication Service and Software.
PureApplication offers several ways that you can customize your system. You can optionally configure your system for high availability, choose between automatic horizontal scaling or manual scaling, or create a placement policy that influences the actions of first-tier placement. Before you implement any of these advanced configurations, you need to understand the behavior of the placement engine when these configurations are implemented. Also, important restrictions exist when you deploy GPFS clusters.
VM high availability
You can optionally configure your cloud groups to reserve resources for high availability if a compute node fails. You have several options for doing so, as shown in Figure 9.
Figure 9. Cloud group options for high availability
If you choose to reserve resources at the system level, then you must designate one or more compute nodes as spares for the entire system. If a compute node in your cloud group fails, the spare compute node is automatically added to your cloud group, and the failed virtual machines are restarted on the recovery compute node.
If you choose to reserve resources at the cloud group level, then one compute node's worth of capacity is reserved within your cloud group and is not available for deployments. If a compute node in your cloud group fails, the failed virtual machines are migrated into the reserved space on the remaining compute nodes in the cloud group and restarted.
Note that the second tier of placement calculates a specific amount of headroom to reserve on each individual compute node based on how many compute nodes are present in the cloud group. For example, if there are four compute nodes, one quarter of the CPU and memory on each compute node is set aside for high availability. The placement engine ensures that new deployments preserve this headroom, and the rebalancing job can also migrate virtual machines from one compute node to another as follows:
- To preserve this headroom on every compute node (for example, if you have increased the CPU or memory allotment for a virtual machine)
- To restore the headroom (for example, after recovering a failed compute node or restarting a compute node that was previously placed in maintenance mode)
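The headroom arithmetic for cloud-group-level reservation is easy to state concretely. The sketch below is illustrative only, and its capacity figures are made up; it simply encodes the one-node's-worth-spread-as-1/N rule described above:

```python
# Illustrative headroom arithmetic for cloud-group-level HA reservation
# (hypothetical model): one compute node's worth of capacity is reserved,
# spread evenly as 1/N of each of the N compute nodes in the cloud group.

def headroom_fraction(node_count):
    if node_count < 2:
        raise ValueError("HA reservation needs at least two compute nodes")
    return 1.0 / node_count

def usable_capacity(node_count, cpu_per_node, mem_gb_per_node):
    reserve = headroom_fraction(node_count)
    usable_cpu = node_count * cpu_per_node * (1 - reserve)
    usable_mem = node_count * mem_gb_per_node * (1 - reserve)
    return usable_cpu, usable_mem

# Four nodes: one quarter of each node is set aside, leaving exactly
# three nodes' worth of capacity usable for deployments.
print(usable_capacity(4, 32, 256))  # (96.0, 768.0)
```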
If you do not reserve resources, you might still have spare capacity in your cloud group if a compute node fails. The system attempts to migrate as many virtual machines as possible into the remaining compute nodes in the cloud group, in order of their priority. To change priorities, use the deployment priority indicator shown in Figure 8.
You can see at a glance the high availability status of all cloud groups in your system by accessing the Cloud | High Availability page (Figure 10).
Figure 10. System high availability summary
Horizontal scaling

Whether you are using automatic horizontal scaling or manual scaling, you need to consider the actions of the placement engine when your instance scales its virtual machines.
When a new virtual machine is scaled out, the first tier of pattern placement is invoked to select the cloud group and location into which the new virtual machine is scaled. In PureApplication System V2.1, new virtual machines can be scaled out only in cloud groups where there is already an instance of the same virtual machine. Beginning with PureApplication System V2.2, new virtual machines can be scaled to any location or cloud group that satisfies placement checks for artifacts and capacity, even if they contain no existing virtual machine instances. In either case, the first tier of placement tries to select the cloud group or system with the fewest virtual machines and with available capacity, or else it selects a cloud group at random.
Then, when the virtual machine is started, the second tier of placement is invoked to select the compute node on which the virtual machine is run. As much as possible, the second tier of placement attempts to distribute this new virtual machine equally among the compute nodes with all other virtual machines in the same deployment that have the same CPU and memory settings.
When a virtual machine is removed due to either automatic or manual scaling, the placement engine is not invoked to select the virtual machine. Instead, the scaling agent selects a virtual machine by using the following sequence of steps:
- Search for a virtual machine in non–running state.
- Search for a virtual machine that has a role in non–running state.
- Search for the virtual machine that has the lowest CPU utilization.
- If multiple virtual machines share the lowest CPU utilization, select one of them at random.
However, the agent skips a virtual machine if that virtual machine is the master virtual machine for the deployment. The master virtual machine is the virtual machine that hosts the Instance Console of the pattern instance.
To determine which virtual machine is the master, click Manage to access the management UI for the deployment. Note the IP address in your browser's address bar; this is the address of the virtual machine that is the current master for the deployment.
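The selection sequence above can be sketched as follows. The dictionary fields and state names are assumptions made for illustration, not the scaling agent's real data model:

```python
# Hypothetical sketch of the scale-in selection order:
# 1) a VM that is not running, 2) a VM with a role that is not running,
# 3) the VM with the lowest CPU utilization (random among ties),
# never selecting the deployment's master VM.

import random

def select_vm_to_remove(vms, master_name):
    candidates = [vm for vm in vms if vm["name"] != master_name]
    stopped = [vm for vm in candidates if vm["state"] != "running"]
    if stopped:
        return stopped[0]["name"]
    role_down = [vm for vm in candidates if vm["role_state"] != "running"]
    if role_down:
        return role_down[0]["name"]
    lowest = min(vm["cpu_util"] for vm in candidates)
    idle = [vm for vm in candidates if vm["cpu_util"] == lowest]
    return random.choice(idle)["name"]
```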
Placement policies

You can influence the decision process of the first tier of placement by adding a Placement Policy to your virtual machine in the pattern editor (Figure 11).
Figure 11. Placement policy component
The placement policy influences how the first tier of placement attempts to place a scaled set of virtual machines. Recall that the default behavior is for the first tier of placement to anti–collocate the instances of a single virtual machine component across cloud groups and locations. By using the placement policy, you can specify a hard or soft collocation or anti–collocation of these virtual machines across locations (systems) or cloud groups. If your constraint cannot be satisfied, a hard constraint prevents the pattern from being deployed, while a soft constraint allows the pattern to continue to deploy even though the constraint is not met.
Note that the placement policy cannot be used to specify placement behavior between separate virtual machine components in the pattern editor. It can be used only to specify behavior among the scaled instances of a single virtual machine. This policy influences the placement of the virtual machines both at initial deployment and also at the time of any horizontal scaling.
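The hard-versus-soft semantics reduce to a simple rule, sketched here hypothetically:

```python
# Hypothetical sketch of hard vs. soft placement-constraint semantics:
# an unsatisfied hard constraint rejects the deployment, while an
# unsatisfied soft constraint lets the deployment continue anyway.

def constraint_allows_deploy(satisfied, kind):
    if satisfied:
        return True
    return kind == "soft"

print(constraint_allows_deploy(False, "hard"))  # False: deployment is rejected
print(constraint_allows_deploy(False, "soft"))  # True: deployment continues
```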
General Parallel File System (GPFS) and shared block storage
On PureApplication Intel® environments (including PureApplication System, PureApplication Software, and PureApplication Service), you can use the IBM Pattern for GPFS™ to deploy a multi-node GPFS cluster. The cluster can contain multiple GPFS server virtual machines that share a single block storage volume for improved performance and availability. On IBM POWER® environments, PureApplication System and PureApplication Software (V2.2 and later) support multi-node GPFS clusters.
The GPFS pattern encodes special placement instructions in its topology to ensure that the GPFS virtual machines are (1) forcibly distributed across different compute nodes, and (2) pinned to those compute nodes so that they are not migrated even if a compute node fails. These instructions are essential to the high availability of the GPFS cluster: in a three-node cluster, if any two of the virtual machines are placed on the same compute node, the cluster is no longer available if that compute node fails.
These special placement instructions cannot be used in the virtual system pattern editor. If you want to use shared block storage in your own patterns, for example for DB2® pureScale®, Oracle®, Real Application Clusters (RAC), or Microsoft® Windows® clustering, contact IBM for assistance in coding these special placement instructions into your pattern topology.
Disabling automatic rebalancing and manually migrating virtual machines
Customers occasionally express interest in disabling the automatic rebalancing capabilities of the second tier of PureApplication placement. You might want to disable automatic rebalancing if, for example, you have a high availability clustering solution and want to manage the placement of your virtual machines manually. PureApplication System provides REST APIs that can be used for this purpose. The Resources section contains information on disabling automatic rebalancing and manually migrating your virtual machines.
IBM recommends that you keep rebalancing enabled and allow the PureApplication placement engine to manage the placement of your virtual machines.
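For orientation only, a client call might be assembled as below. Everything specific in this sketch, including the URL path, the payload field, and the authorization header, is a placeholder assumption and not the documented PureApplication REST API; consult the Clusters and Instance migration REST API documentation in the Resources section for the real endpoints and request bodies.

```python
# Hypothetical sketch only: the path '/resources/clusters/...', the payload
# field 'rebalance_enabled', and the Authorization scheme are placeholder
# assumptions, not the documented PureApplication REST API.

import json
import urllib.request

def build_rebalance_request(host, cluster_id, enabled, token):
    url = f"https://{host}/resources/clusters/{cluster_id}"  # placeholder path
    body = json.dumps({"rebalance_enabled": enabled}).encode("utf-8")
    req = urllib.request.Request(url, data=body, method="PUT")
    req.add_header("Content-Type", "application/json")
    req.add_header("Authorization", f"Bearer {token}")  # placeholder scheme
    return req  # the request is built but intentionally not sent in this sketch

req = build_rebalance_request("pureapp.example.com", "42", False, "TOKEN")
print(req.get_method(), req.full_url)
```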
Conclusion

In this article, you learned how the placement engine works in PureApplication System, and how its role differs in PureApplication Software and PureApplication Service. You learned how the two tiers of placement decide where to locate virtual machines during the initial deployment of your pattern, and also when your pattern later scales to add or remove virtual machines. The article described the ongoing decisions that the lower tier of placement makes to rebalance virtual machine usage. It also described the various options for planning for high availability, and how the lower tier of placement achieves this high availability if a compute node fails.
Resources

- The Clusters REST API documents how you can disable automatic placement rebalancing.
- The Instance migration REST API documents how you can cause a virtual machine to be migrated from one compute node to another.
- More PureApplication articles in the developerWorks Technical Library