IBM SmartCloud Provisioning is designed to minimize the use of a centralized “command and control” approach, in favor of scale-out management, where endpoints can participate in management activities and do not depend on a single configuration management database.
This approach allows IBM SmartCloud Provisioning to handle multiple provisioning tasks in parallel, across an unlimited number of servers.
Cloud users can request deployments of virtual machines (VMs) and access the provisioned systems within a few seconds, thanks to parallel and distributed processing that happens transparently under the covers.
Let’s drill down into the details about this distributed management approach.
IBM SmartCloud Provisioning internally uses a peer-to-peer (P2P) messaging infrastructure to pass provisioning and management messages between agents, which contribute to the decentralized control.
Agents are installed on the compute nodes (that is, the hypervisors) and on the storage nodes, where images and volumes reside.
The P2P connections between agents let the agents monitor one another's health, which keeps the management infrastructure low-touch. The same connections let the agents orchestrate their communications, so that load is distributed effectively and the requests made by cloud users are managed in a decentralized way.
The P2P communication overlay is backed by a distributed lock service, which is based on ZooKeeper.
ZooKeeper is a distributed, open-source coordination service for distributed applications; it exposes a simple set of primitives that distributed applications can build upon to implement higher level services for synchronization, configuration maintenance, and groups and naming. It is designed to be easy to program, and uses a data model styled after the familiar directory tree structure of file systems.
Like the distributed processes it coordinates, ZooKeeper itself is intended to be replicated over a set of servers that must all know about each other. They maintain an in-memory image of state, along with transaction logs and snapshots in a persistent store.
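To make the file-system-like data model a little more concrete, here is a minimal in-memory sketch of a hierarchical namespace. It is a toy stand-in and not the ZooKeeper API: the class name `ZNodeTree`, its methods, and the `/racks/...` paths are all hypothetical names chosen for illustration, and the sketch leaves out replication, watches, and persistence entirely.

```python
class ZNodeTree:
    """Toy in-memory namespace styled after a directory tree (not the real API)."""

    def __init__(self):
        # Maps an absolute path such as "/racks/rack1" to its data payload.
        self._nodes = {"/": b""}

    def create(self, path, data=b""):
        # As in a file system, a node can only be created under an existing parent.
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self._nodes:
            raise KeyError(f"parent {parent!r} does not exist")
        if path in self._nodes:
            raise KeyError(f"{path!r} already exists")
        self._nodes[path] = data

    def get(self, path):
        return self._nodes[path]

    def children(self, path):
        # Direct children only: names with no further "/" after the prefix.
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):]
            for p in self._nodes
            if p != "/" and p.startswith(prefix) and "/" not in p[len(prefix):]
        )


tree = ZNodeTree()
tree.create("/racks")
tree.create("/racks/rack1", b"leader=agent-a")  # hypothetical payload
tree.create("/racks/rack2")
print(tree.children("/racks"))
```

Higher-level services such as naming or group membership can then be expressed as conventions over paths like these, which is the idea behind the "simple set of primitives" described above.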
Each IBM SmartCloud Provisioning agent connects to a single ZooKeeper server. The agent maintains a TCP connection with that server, through which it sends requests, gets responses, gets watch events, and sends heartbeats. If the TCP connection to the server breaks, the agent connects to a different server.
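The failover behavior described above can be sketched in a few lines. This is only an illustration under stated assumptions: `AgentSession`, the injected `connect` callable, and the server names are hypothetical, and the real agents use their ZooKeeper client rather than anything like this class.

```python
import random


class AgentSession:
    """Hypothetical agent-side session that holds one server connection at a time."""

    def __init__(self, servers, connect, rng=None):
        self._servers = list(servers)
        self._connect = connect  # assumed helper: callable(server) -> bool
        self._rng = rng or random.Random()
        self.current = None

    def reconnect(self):
        """Connect to a reachable server, trying servers other than the last one first."""
        candidates = [s for s in self._servers if s != self.current]
        self._rng.shuffle(candidates)
        if self.current is not None:
            candidates.append(self.current)  # retry the old server only as a last resort
        for server in candidates:
            if self._connect(server):
                self.current = server
                return server
        self.current = None
        raise ConnectionError("no ZooKeeper server reachable")


down = {"zk1"}  # simulated outage
session = AgentSession(["zk1", "zk2", "zk3"], connect=lambda s: s not in down)
print(session.reconnect())  # one of the servers that is still up
```

Shuffling the candidates is a design choice of the sketch: it keeps a crowd of agents from all reconnecting to the same surviving server at once.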
When a deployment request is received by IBM SmartCloud Provisioning, the request is processed by the web services layer, passed to the management infrastructure, and managed by the agents and the ZooKeeper services.
The following steps describe, in more detail, the internal communications, as depicted in the figure. This processing happens in a transparent way for the user, who sees only the deployment request being served in a few seconds.
- The web services layer takes charge of a deployment request (for example, deploy 50 “Large” instances of image “LOB123-RHEL 6.0”) and triggers a first interaction with the ZooKeeper server to ask which agent in the compute nodes layer can handle the request.
- The ZooKeeper server selects one of the available leaders in the compute nodes layer and returns this information to the web services layer. The role of the selected leader is to initiate an internal hand-shaking among the compute node agents to process the incoming request.
- The web services layer receives the information about which agent to contact, and opens a connection to that agent, passing the deployment request details.
- The selected agent takes care of the request and starts a “discussion” phase with all the other leaders (one for each rack) to distribute the load of the incoming request among all the agents that can provide resources to fulfill it. This step happens over the P2P connections between agents.
- Inside each rack, the leader triggers a parallel P2P interaction with all the agents on all the compute nodes included in that rack, to understand which agent can serve a portion of the incoming request. Each agent having enough free resources to serve “Large” instances answers the request coming from its leader, so that at the end of the hand-shaking process each leader knows which portion of the incoming request can be processed by which agent.
- At this point, each involved agent knows which part of the incoming request it is supposed to process. To start the actual deployment step, the agent asks the ZooKeeper server where to find the “LOB123-RHEL 6.0” image named in the request. The ZooKeeper server again answers by providing one of the available agent leaders in the storage nodes layer.
- When an agent receives back the information about which storage node to connect to, it opens a P2P connection with the related agent and asks for the image it needs to fulfill the deployment request.
- The storage node agent leader, in turn, starts a P2P communication with the other leaders, asking for the selected image. Each leader triggers further P2P connections inside its managed rack to ask each managed agent whether it has a copy of the requested image.
- The storage leader that initiated the request collects the details about the agents that have a copy of the requested image and selects at least two of them (the default redundancy required by IBM SmartCloud Provisioning), returning the information to the calling compute-node agent. At this point, the compute-node agent can access the image and start deploying VMs, according to its capacity and to the amount of work it offered to serve.
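The two negotiation rounds in the steps above (splitting the request across compute agents, then picking storage agents that hold the image) can be sketched as follows. This is a simplified, single-process illustration and not the product's algorithm: the function names, the rack and agent names, and the greedy fill order are all assumptions made for brevity, since the actual load-distribution policy is not described here.

```python
def distribute(count, racks):
    """Split `count` instances across agents, rack by rack.

    `racks` maps rack name -> {agent name: free slots for the requested size}.
    Returns {(rack, agent): instances assigned}; raises if capacity is short.
    """
    # Each rack leader polls its own agents; only agents with free slots answer.
    offers = {
        rack: {agent: free for agent, free in agents.items() if free > 0}
        for rack, agents in racks.items()
    }
    plan = {}
    remaining = count
    for rack in sorted(offers):  # greedy fill, an assumption of this sketch
        for agent in sorted(offers[rack]):
            if remaining == 0:
                return plan
            take = min(offers[rack][agent], remaining)
            plan[(rack, agent)] = take
            remaining -= take
    if remaining:
        raise RuntimeError(f"capacity short by {remaining} instances")
    return plan


def pick_replicas(holders, redundancy=2):
    """Choose `redundancy` storage agents from those reporting a copy of the image."""
    found = sorted(holders)
    if len(found) < redundancy:
        raise LookupError(f"only {len(found)} copies found, need {redundancy}")
    return found[:redundancy]


# Hypothetical capacity: free "Large" slots per compute agent.
racks = {"rack1": {"agent-a": 20, "agent-b": 0}, "rack2": {"agent-c": 40}}
print(distribute(50, racks))
print(pick_replicas({"store-3", "store-1", "store-2"}))
```

In the real system the polling inside each rack happens in parallel over P2P connections; the nested loops here only stand in for the outcome of that exchange.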
As I said, this processing happens transparently and very quickly, and the user does not have to worry about any of the steps listed.
It achieves high levels of parallelism and decentralized management, and it scales out simply by adding servers.
If you’re interested in trying the IBM SmartCloud Provisioning distributed management capabilities, you can download a trial version from the following link: