SmartCloud Provisioning is designed to minimize the use of a centralized “command and control” approach, in favor of scale out management, where endpoints can participate in management activities and do not depend on a single configuration management database.
This allows SmartCloud Provisioning to handle multiple provisioning tasks in parallel, across an unlimited number of servers.
Cloud users can request deployments of virtual machines and access the provisioned systems within a few seconds, thanks to parallel and distributed processing that happens transparently, under the covers.
Let’s drill down into the details of this distributed management approach.
SmartCloud Provisioning internally uses a peer to peer (P2P) messaging infrastructure to pass provisioning and management messages between agents, which contribute to the decentralized control.
Agents are installed on the compute nodes (i.e. the hypervisors) as well as on the storage nodes, where images and volumes reside.
The P2P connections between agents not only allow self-monitoring of their health in order to implement a low-touch management infrastructure, but also allow orchestrating the communications to achieve an effective load distribution and decentralized management of the requests performed by cloud users.
The P2P communication overlay is backed by a distributed lock service, which is based on ZooKeeper.
ZooKeeper is a distributed, open-source coordination service for distributed applications, which exposes a simple set of primitives that distributed applications can build upon to implement higher level services for synchronization, configuration maintenance, and groups and naming. It is designed to be easy to program, and uses a data model styled after the familiar directory tree structure of file systems.
Like the distributed processes it coordinates, ZooKeeper itself is intended to be replicated over a set of servers that must all know about each other. They maintain an in-memory image of state, along with transaction logs and snapshots in a persistent store.
SmartCloud Provisioning agents connect to a single ZooKeeper server. Each agent maintains a TCP connection with the ZooKeeper server, through which it sends requests, gets responses, gets watch events, and sends heartbeats. If the TCP connection to the server breaks, the agent will connect to a different server.
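To picture the failover behaviour just described, here is a minimal sketch. It does not use the real ZooKeeper client API: `FakeServer` and `AgentSession` are hypothetical names introduced only for illustration, and the "try the next server in the list" policy is an assumption.

```python
class FakeServer:
    """Stand-in for one ZooKeeper server (hypothetical, not the real API)."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up

    def handle(self, request):
        if not self.up:
            raise ConnectionError(f"{self.name} is unreachable")
        return f"{self.name} handled {request}"


class AgentSession:
    """An agent talks to one server at a time and fails over when it breaks."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.current = 0  # index of the server the agent is connected to

    def send(self, request):
        # Try each server at most once, starting from the current one.
        for _ in range(len(self.servers)):
            server = self.servers[self.current]
            try:
                return server.handle(request)
            except ConnectionError:
                # The connection broke: move on to a different server.
                self.current = (self.current + 1) % len(self.servers)
        raise ConnectionError("no ZooKeeper server reachable")


servers = [FakeServer("zk1"), FakeServer("zk2"), FakeServer("zk3")]
session = AgentSession(servers)
first = session.send("heartbeat")
servers[0].up = False  # simulate a broken TCP connection to zk1
second = session.send("heartbeat")
```

After the simulated failure the agent keeps working against the second server without any manual intervention.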
When a deployment request is received by SmartCloud Provisioning, the request is processed by the Web Services layer, passed to the management infrastructure, and managed by the agents and the ZooKeeper services.
The following steps describe the internal communications in more detail, as depicted in figure 1 below.
This processing happens in a transparent way for the end user, who just sees the deployment request being served in a few seconds.
- The Web Services layer takes charge of a deployment request (e.g. deploy 50 “Large” instances of image “LOB123-RHEL 6.0”), and triggers a first interaction with the ZooKeeper server to ask which agent in the compute nodes layer can handle this request.
- The ZooKeeper server selects one of the available leaders in the compute nodes layer and returns this information to the Web Services layer. The role of the selected leader will be to initiate an internal hand-shaking among the compute node agents to process the incoming request.
- The Web Services layer receives the information about which agent to contact, and opens a connection to that agent, passing the deployment request details.
- The selected agent takes care of the request and starts a “discussion” phase with all the other leaders (one for each rack) in order to distribute the load of the incoming request among all the agents that could provide resources to fulfill it. This happens leveraging the P2P connection between agents.
- Inside each rack, the leader triggers a parallel P2P interaction with all the agents on all the compute nodes included in that rack, to understand which agent can serve a portion of the incoming request. Each agent having enough free resources to serve “Large” instances answers the request coming from its leader, so that, at the end of the hand-shaking process each leader knows which portion of the incoming request can be processed by which agent.
- At this point, each involved agent knows which part of the incoming request it is supposed to process. To start the actual deployment step, the agent asks the ZooKeeper server where to find the “LOB123-RHEL 6.0” image to be deployed, according to the incoming request. The ZooKeeper server again answers by providing one of the available agent leaders on the storage nodes layer.
- When an agent receives back the information about which storage node to connect to, it opens a P2P connection with the related agent and asks for the image it needs to fulfill the deployment request.
- The storage node agent leader starts in turn a P2P communication with the other leaders asking for the selected image. Each leader inside its managed rack triggers other P2P connections to ask each managed agent if it has a copy of the requested image.
- The storage leader initiating the request collects back all the details about agents having a copy of the requested image and selects at least two of them (default redundancy required by SmartCloud Provisioning), returning the information to the calling compute-node agent. The compute-node agent at this point can access the image and starts the deployment of VMs, according to its capacity and to the amount of work it offered to serve.
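The steps above can be sketched as a small simulation. This is only an illustration under stated assumptions: the function names, the rack and node names and the greedy "fill agents in order" policy are all hypothetical, and the real hand-shaking happens in parallel over the P2P network rather than in a sequential loop.

```python
def collect_offers(rack):
    """A rack leader asks every agent in its rack how many instances it can host."""
    return {agent: free for agent, free in rack.items() if free > 0}


def distribute(request_size, racks):
    """Split a request across agents; returns {agent: instances_to_deploy}."""
    plan = {}
    remaining = request_size
    for rack_name in sorted(racks):
        offers = collect_offers(racks[rack_name])
        for agent in sorted(offers):
            if remaining == 0:
                return plan
            share = min(offers[agent], remaining)
            plan[agent] = share
            remaining -= share
    if remaining:
        raise RuntimeError(f"not enough capacity: {remaining} instances unplaced")
    return plan


def pick_replicas(image, storage_agents, copies=2):
    """Return `copies` storage agents that hold the requested image."""
    holders = sorted(a for a, images in storage_agents.items() if image in images)
    if len(holders) < copies:
        raise LookupError(f"fewer than {copies} copies of {image}")
    return holders[:copies]


# Free "Large" slots per compute node, grouped by rack (made-up numbers).
racks = {
    "rack1": {"node-a": 20, "node-b": 0, "node-c": 15},
    "rack2": {"node-d": 30},
}
plan = distribute(50, racks)

# Which storage agents hold which images (made-up data).
storage = {
    "store-1": {"LOB123-RHEL 6.0"},
    "store-2": {"another-image"},
    "store-3": {"LOB123-RHEL 6.0"},
}
replicas = pick_replicas("LOB123-RHEL 6.0", storage)
```

Each compute node ends up with the share it offered to serve, and the image is located on two storage nodes, matching the default redundancy mentioned in the last step.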
As I said, this processing happens under the covers very quickly, and the user does not have to worry about any of the steps above.
This approach achieves high levels of parallelism and decentralized management, and scales out simply by increasing the number of servers.
If you're interested in trying the SmartCloud Provisioning distributed management capabilities, you can download a trial version from the following link:
SmartCloud Provisioning is an infrastructure-as-a-service cloud able
to work with different types of hypervisors. You can easily install
and configure new compute nodes to run your virtual images on KVM,
VMWare and Xen.
This is a very interesting sentence, and it seems to be very useful. The first
time I read it, I thought: “Do I need to have 3 different images?
Can I have the same image running on any hypervisor?” The answer is yes
to both questions. Depending on how you want to run your image, you could
need different images for different hypervisors, or just use a
single image regardless of the underlying hypervisor.
Before going deeper into how IBM SmartCloud Provisioning deploys virtual
images, I would like to discuss the different hypervisors. Each of them has its
own peculiarities, allowing you to leverage different features,
implemented in different ways. This leads us to deal with different
hypervisor limitations. The following are the most common:
- VMWare and Xen are able to manage SCSI devices, but KVM is not
- KVM and Xen can use virtio drivers, but VMWare cannot
- VMWare uses a proprietary agent inside the guest OS (VMWare tools), which
does not work with Xen or KVM
- VMWare uses the vmdk file format, which is a proprietary format
Any of these differences can prevent an image from working on every
hypervisor. It is clear that if you do not pay attention to how you
create your base images, you might need different images for the
different hypervisors. So the next step is understanding how we should
create a “magic image” able to run everywhere.
The first point is to figure out the list of similarities between the different
hypervisors:
- Image format: every hypervisor type supports the raw format.
- Device type: every hypervisor type supports ide devices.
- OS configuration: hypervisors do not require specific configurations,
but the manager could.
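The differences and similarities listed so far can be written down as a small compatibility check. This is just a sketch: the table encodes only what is stated above, the dictionary layout and the `runs_on` helper are hypothetical, and treating vmdk as usable on VMWare only is a simplifying assumption.

```python
# Capabilities as stated in the text; anything not listed is treated as unsupported.
HYPERVISORS = {
    "KVM":    {"devices": {"ide", "virtio"},         "formats": {"raw"}},
    "VMWare": {"devices": {"ide", "scsi"},           "formats": {"raw", "vmdk"}},
    "Xen":    {"devices": {"ide", "scsi", "virtio"}, "formats": {"raw"}},
}


def runs_on(image):
    """Return the sorted list of hypervisors able to run the given image."""
    return sorted(
        name
        for name, caps in HYPERVISORS.items()
        if image["device"] in caps["devices"] and image["format"] in caps["formats"]
    )


portable = runs_on({"format": "raw", "device": "ide"})
scsi_image = runs_on({"format": "raw", "device": "scsi"})
virtio_image = runs_on({"format": "raw", "device": "virtio"})
```

Only the raw image on an ide device is accepted by all three hypervisors, which is exactly the combination the rest of this post recommends.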
Working with IBM SmartCloud Provisioning you will not have any issue with any
of the previous points. In fact, before creating a base image you
should just follow a few rules to ensure portability.
IBM SmartCloud Provisioning requires a specific OS configuration regardless of the underlying
hypervisor. You can find all the needed information on how to build your
image at the info center site:
It is important to use the raw format for the initial image. Here we have
an interesting problem: how to create a VMWare image in raw format.
The answer is very simple: we are creating a fully portable image, so
you can use KVM to build such a master image and then run it on any hypervisor.
At this point we have our raw image, fulfilling all requirements from
the hypervisor manager. What is the next step? You need to register it
in IBM SmartCloud Provisioning. To do that you can use either the
administrative UI or the CLI. Regardless of the user interface you are using,
just remember to use the following settings during registration:
- Do not enable virtio
You finally have a fully portable image. IBM SmartCloud Provisioning will
decide by itself which is the most appropriate compute node to
run your “magic image”.
Even though the described process is very easy, there could be some cases
where you cannot follow it, for example when you already have
your images in a proprietary format and you need to use them. In this
case the Virtual Image Library can help you. It is a very useful
IBM SmartCloud Provisioning component able to manage images by
federating different hypervisors. It can check images
into its own repository so that you can then check them out to a
different federated virtualization environment. During this
process it will convert the image format for you.
With it you will be able, for example, to check in a VMWare image and then
check the same image out to IBM SmartCloud Provisioning, resulting
in a raw format image. The next interesting question is whether it will run or
not. The answer strongly depends on the compute node type and the image
configuration. Based on what was previously discussed, you should care about:
- OS configuration: as I said, IBM SmartCloud Provisioning requires images
to have some OS configuration. To have a working final image you must
ensure that the initial VMWare image has all the required configuration
before starting to import it into Virtual Image Library. Otherwise
it will not be able to start (for example, if the image does not have DHCP
configured, it will never get a valid IP).
- Device type: if you only have KVM compute nodes within your IBM SmartCloud
Provisioning, an image using a SCSI device will not be able to run at
all. To have it running you must have at least one VMWare compute
node. If the initial image is using an ide device, then you will not
have any trouble.
In addition to image format conversion, Virtual Image Library is also
able to modify Windows device drivers. In the process of moving an
image from VMWare to Virtual Image Library and then to IBM SmartCloud
Provisioning, the application changes the Windows configuration, allowing
it to run on any hypervisor.
More information about the previous topics can be found at the IBM info center
I really liked the post Rapid deployments with IBM Smart Cloud Provisioning
that explains how simple and fast it is to deploy instances using SmartCloud Provisioning.
But once the instances are deployed the next questions are:
- How can I "easily" manage them from a patch management point of view?
- How can I "ensure" that they satisfy my corporate and security standards?
The solution is to integrate SmartCloud Provisioning with Tivoli Endpoint Manager (TEM) so that all the running instances are connected to the TEM Server and managed according to the configured security and corporate standards.
This can be achieved by exploiting the current integration between SmartCloud Provisioning and the Image Construction and Composition Tool (ICCT), available in SmartCloud Provisioning version 1.2, performing the following steps:
- Using ICCT create a new bundle, the "TEM Agent bundle", that contains:
  - TEM Agent installation package
  - TEM masthead file: this is the digitally signed file that contains the information of where the TEM server is located
  - a script that installs the TEM Agent and copies the TEM masthead file in the proper directory (e.g. for Linux it is /etc/opt/BESClient)
- Extend an OS base image available in SmartCloud Provisioning adding the "TEM Agent bundle".
In this way a new image will be available in SmartCloud Provisioning with the TEM agent installed and configured to connect to the TEM Server.
After that, when the extended image is launched, the TEM agent will automatically start and connect to the TEM Server without requiring any user action.
Then, from the TEM console, you will be able to see and manage it by performing actions and/or downloading fixlets.
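To give an idea of what the script carried by the bundle has to do, here is a minimal sketch of the masthead copy step only. It is not the script shipped with any product: `install_masthead` is a hypothetical helper, the target file name `actionsite.afxm` is an assumption to verify against the TEM documentation, and the installation of the agent package itself is platform specific and left out.

```python
import shutil
import tempfile
from pathlib import Path


def install_masthead(masthead, client_dir, target_name="actionsite.afxm"):
    """Copy the TEM masthead file into the agent's configuration directory.

    target_name is an assumption; check the name your TEM agent expects.
    """
    client_dir = Path(client_dir)
    client_dir.mkdir(parents=True, exist_ok=True)
    destination = client_dir / target_name
    shutil.copyfile(masthead, destination)
    return destination


# Demo against a temporary directory instead of the real /etc/opt/BESClient.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "masthead.afxm"
    source.write_text("signed masthead content")
    copied = install_masthead(source, Path(tmp) / "etc" / "opt" / "BESClient")
    copied_name = copied.name
    copied_text = copied.read_text()
```

On a real Linux image the second argument would be /etc/opt/BESClient, and the script would run once, when the bundle is added to the base image.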
This is just the basic integration. More advanced scenarios can be implemented, for example exploiting the OVF parameters (as described in the topic Customizing virtual images with IBM SmartCloud Provisioning) for configuring and grouping the TEM Agents, but they will be described in my next blog posts!
For further information on IBM SmartCloud Provisioning and the Image Construction and Composition Tool, see the IBM SmartCloud Provisioning Infocenter.
If you would like to try out the IBM SmartCloud Provisioning 1.2 core functionalities but you are worried that you do not have the time to install it, or that you do not have enough hardware, you can download a special demo package from the Integrated Service Management Library.
It gets installed on a single physical box.
The system must use x86_64 processors that support virtualization.
In addition, you need at least 3 GB of memory and 30 GB of disk space.
The required operating system for this installation is Linux CentOS 6.0 64-bit.
In addition to that the following packages are required:
The installer then configures the physical box as compute node, storage node, PXE server and DHCP server, and creates a virtual image (the hypervisor is KVM) that acts as second storage node, webconsole, web-adminconsole, webservice, REST server, HBase and ZooKeeper.
Further installation details are available in the readme downloadable with the package.
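Before downloading the package, the hardware requirements listed above can be checked with a few lines of code. This is a rough sketch, not part of the demo package: `unmet_requirements` is a hypothetical helper, and it relies on the fact that on Linux the `vmx` (Intel) or `svm` (AMD) CPU flag indicates hardware virtualization support.

```python
def unmet_requirements(machine, cpu_flags, memory_gb, disk_gb):
    """Return a list of human-readable problems; an empty list means all is fine."""
    problems = []
    if machine != "x86_64":
        problems.append("processor is not x86_64")
    # vmx (Intel VT-x) or svm (AMD-V) among the CPU flags means virtualization support.
    if not {"vmx", "svm"} & set(cpu_flags):
        problems.append("no hardware virtualization support")
    if memory_gb < 3:
        problems.append("less than 3 GB of memory")
    if disk_gb < 30:
        problems.append("less than 30 GB of disk space")
    return problems


good_box = unmet_requirements("x86_64", ["fpu", "vmx"], memory_gb=4, disk_gb=100)
small_box = unmet_requirements("x86_64", ["fpu"], memory_gb=2, disk_gb=100)
```

On a real system the inputs could come from `platform.machine()`, the `flags` line of /proc/cpuinfo and `shutil.disk_usage()`; the operating system and package prerequisites are not covered by this sketch.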
I've been impressed by the speed of
provisioning a set of virtual machines in just a few tens of seconds
using IBM Smart Cloud Provisioning. In most cases you can get a
running virtual machine in less than one minute.
The Smart Cloud Provisioning technology
has been devised and particularly optimized for managing the
following cloud infrastructure scenarios:
Infrastructure composed of
High level of standardization with
a relatively small set of master images used to provision many
instances from the same image
Typical life cycle of the
provisioned resources with short average time of life of provisioned
Many other workloads can be deployed
and easily automated on top of Smart Cloud Provisioning. For example,
traditional stateful applications can be easily deployed for simple
HA solutions. However, you get the maximum performance from Smart
Cloud Provisioning when operating in the context of the above scenarios.
To achieve such high performance, Smart
Cloud Provisioning has been designed around an
optimized virtualization infrastructure based on OS streaming: there is no
need to copy large image files over the network when provisioning.
Image copying is the single biggest
bottleneck in VM provisioning today in terms of CPU, memory, I/O
and bandwidth usage. In traditional Cloud provisioning approaches, all
of this is system resource consumed as pure overhead
(nobody builds a Cloud to provision systems - provisioning is an
overhead that is required to have systems on which business workload
is deployed, and any overhead is in conflict with the business
The key elements of this infrastructure
are the so-called ephemeral instances, which are virtual machines
with no persistent state. Once they are terminated, all the data
associated with them is deleted as well. They are clones of a master
image, and these clones have a primary virtual disk which is
ephemeral: when the instance goes, so does its ephemeral storage
(mechanisms exist in Smart Cloud Provisioning to provide persistence,
if needed by some scenarios).
When creating a new instance, since
master images are read-only resources and are replicated across the
storage cluster, Smart Cloud Provisioning uses the Copy-on-Write
(CoW) technology and the iSCSI protocol to stream them, avoiding
expensive copying. Each iSCSI session results in a valid block device
being created in the host OS. Of course, each guest OS (corresponding
to a given instance) requires a writable block device representing
the main disk of the system. All supported hypervisors have a storage
virtualization layer which includes the Copy-on-Write technology. For
example, KVM's qcow2 files can be configured to implement CoW
by referencing a backing storage device. VMWare has something called
redo files which effectively do the same thing as well. In each case,
the hypervisor can natively use the CoW file referencing the iSCSI
block device to expose a virtual block device to the virtual machine. Depending on the hypervisor and guest
OS this device will show up as something like /dev/sda or c:\. The CoW files are stored locally on the
hypervisor's file system. When the instance is terminated, the
Smart Cloud Provisioning agent will simply discard the CoW file and
check if any other instances are using the same iSCSI device. If the
device is no longer in use, the agent will also tear down the iSCSI session.
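The mechanism can be illustrated with a toy model. This is not how any hypervisor implements it: `CowDisk` and `IscsiSessions` are hypothetical classes that only mimic the behaviour described above, with a tuple of strings standing in for the blocks of the master image.

```python
class CowDisk:
    """Writable view over a read-only master image (toy block-level model)."""

    def __init__(self, backing_blocks):
        self.backing = backing_blocks  # shared master image, never modified
        self.overlay = {}              # block number -> locally written data

    def read(self, block):
        # Locally written blocks win; everything else is read from the master.
        return self.overlay.get(block, self.backing[block])

    def write(self, block, data):
        # Writes never touch the master image, only the local CoW file.
        self.overlay[block] = data


class IscsiSessions:
    """Counts instances per iSCSI device so that the last one tears it down."""

    def __init__(self):
        self.users = {}

    def attach(self, device):
        self.users[device] = self.users.get(device, 0) + 1

    def detach(self, device):
        """Return True when the device is no longer in use and is torn down."""
        self.users[device] -= 1
        if self.users[device] == 0:
            del self.users[device]
            return True
        return False


master = ("bootloader", "kernel", "data")  # read-only master image
vm1, vm2 = CowDisk(master), CowDisk(master)
vm1.write(2, "vm1 data")

sessions = IscsiSessions()
sessions.attach("master-image-device")  # instance 1
sessions.attach("master-image-device")  # instance 2
first_teardown = sessions.detach("master-image-device")
second_teardown = sessions.detach("master-image-device")
```

Both instances share the same master image, only the first one sees its own write, and the session is torn down only when the second instance terminates.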
Thanks to the above infrastructure,
provisioning a new virtual machine is a very fast
and reliable process that can create individual systems in tens
of seconds and handle peak requests of 1000s of systems per hour.
I'm a big fan of standardization. I'm a big fan of using non-persistent images as well. They just make my life so much easier.
The only issue I see with them is that you still need to give the end user some configuration and customization possibilities.
It could be something trivial like having your own screen saver, or a special keyboard and language configuration, or it could be
something like connecting the software inside the image to some specific devices, disks or additional, external software.
I do not even want to think about having a master image for each possible situation. This would simply make my image catalog
so uselessly big that it would quickly become unmanageable. Not to mention all the issues I could have when I
need to upgrade a master image: I would need to do it for all the customized master images derived from that one.
I would lose all the advantages of dealing with a cloud of non-persistent images. I would only be shifting the issue from the virtual machine
instances to the master images themselves.
A possible solution would be to have the user reconfigure his VM every time he starts it: unbearable! ...especially if you think about
complex software stacks.
I found the solution included in IBM SmartCloud Provisioning interesting. It allows the end user to specify
a set of configuration parameters at image deployment time so that the image is automatically configured accordingly at boot time.
The idea under the covers is pretty simple: the image builder inserts in the master image a script that is run at system boot.
The script is supposed to parse the information passed by the end user at VM deployment time and take the needed actions, like
reconfiguring the operating system or a specific piece of software.
All the information entered by the end user in the web user interface is saved on the compute node and then injected back into the deployed instance.
If you are worried that end users might be reluctant to type in information in a specific format (a possibility is to let them
deal with free text, but then you'll go mad parsing it) and that the process could be error prone, consider that if you use the Image
Construction and Composition Tool (an optionally installable component of IBM SmartCloud
Provisioning), the web UI is automatically modified to show the end user the parameters you want them to enter.
Of course, if you are a lazy end user and you do not want to type in the information or remember it (especially if you do it frequently),
you can put your input parameters in a file and use the command line to deploy the image, passing the file as one of the deployment parameters.
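To make the boot-time script idea more concrete, here is a minimal sketch of what such a script could do. Everything in it is an assumption made for illustration: the post does not say in which format the parameters reach the instance, so a simple `key=value` text format is used, and the parameter names and actions are hypothetical.

```python
def parse_parameters(text):
    """Parse 'key=value' lines; blank lines and '#' comments are ignored."""
    params = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        if not separator or not key.strip():
            raise ValueError(f"line {line_number}: expected key=value, got {line!r}")
        params[key.strip()] = value.strip()
    return params


def plan_actions(params):
    """Map the user's parameters to configuration actions (hypothetical names)."""
    actions = []
    if "keyboard" in params:
        actions.append(f"set keyboard layout to {params['keyboard']}")
    if "language" in params:
        actions.append(f"set language to {params['language']}")
    return actions


user_input = """
# parameters typed by the end user at deployment time
keyboard=it
language = en_US
"""
params = parse_parameters(user_input)
actions = plan_actions(params)
```

A strict format like this one is what makes the parsing easy; it is also why having the web UI generate the input fields for the user is so convenient.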
Modern Cloud infrastructures are built leveraging thousands of highly distributed servers, used to provide services directly to customers over the Internet. The service provider has two extremely important objectives, which, unfortunately, are to some degree contrasting: a) ensure continuous availability of the Cloud service, and b) contain the cost of the infrastructure and administration (CAPEX and OPEX).
There are several factors that have an impact on the availability of services, mostly related to infrastructure failures. Failures are not only related to unrecoverable hardware outages, but also to recoverable OS or middleware failures.
Not so long ago, the most common approach to high availability was to assume one could deploy infrastructures with the highest Mean Time To Failure (MTTF) possible, which required expensive systems and assumed the possibility to write error-safe software applications. It was also assumed that some degree of down-time was acceptable, with vendors boasting of the number of 9's that they could support (e.g. 99.999% availability). In today's always-on Internet, any downtime of major services becomes headline news. The traditional approach is no longer applicable, and a new approach has to be considered.
Given the requirement to reduce infrastructure costs, service providers are using commodity hardware. Given also the requirement to reduce operational costs, hardware failures are commonly dealt with by directly replacing the failed component rather than by manual debugging and recovery by skilled (and expensive) administrators. Thus, to maintain the objective of continuous availability of the service, the Cloud system must be built to expect failure of the underlying infrastructure, and not only for temporary periods: it must assume that components will disappear forever. This cannot be limited to hardware components, as no matter how well a software element is tested, unexpected edge conditions will appear at some point in time. So, to guarantee continuous availability, a Cloud solution must also expect its own components to fail.
Given that we are forced to expect failure, the high MTTF approach is no longer valid, and instead we have to increase availability by flipping the approach to minimizing Mean Time To Recovery (MTTR). The quicker the system can recover from failure, the higher the availability of the service will be. Given however that even a tiny percentage of downtime is no longer acceptable, we also need a means to maintain service availability during the recovery process. One way of doing this is through providing redundancy of all critical services within the Cloud solution.
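A quick calculation shows why recovery time matters so much. The sketch below uses the standard steady-state formula, availability = MTTF / (MTTF + MTTR), and assumes that redundant copies fail independently; the numbers are made up for illustration.

```python
def availability(mttf_hours, mttr_hours):
    """Steady-state availability: fraction of time the service is up."""
    return mttf_hours / (mttf_hours + mttr_hours)


def with_redundancy(single, copies):
    """Availability of `copies` independent replicas when one is enough."""
    return 1 - (1 - single) ** copies


def downtime_minutes_per_year(avail):
    """Expected downtime per year for a given availability."""
    return (1 - avail) * 365 * 24 * 60


# Same commodity hardware (MTTF of 1000 hours), different recovery times.
slow_recovery = availability(1000, 1.0)   # one hour to recover
fast_recovery = availability(1000, 0.01)  # 36 seconds to recover
```

With the same hardware, cutting the recovery time from one hour to 36 seconds moves availability from about 99.9% to about 99.999%, and `downtime_minutes_per_year(0.99999)` shows that five 9's still means a bit more than 5 minutes of downtime per year, which is why redundancy is needed on top of fast recovery.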
SmartCloud Provisioning is designed according to the Recovery-Oriented Computing (ROC) principles: it is based on a highly distributed, redundant and robust infrastructure, with near zero downtime and automated recovery across heterogeneous platforms, and it does not require expensive systems, but can run on a relatively low-cost commodity infrastructure.
The key factors that allow SmartCloud Provisioning to be a low-touch and robust cloud infrastructure are the following:
- the infrastructure is as stateless as possible: this avoids issues related to single points of failure
- management agents are deployed on the physical nodes of the infrastructure (compute nodes and storage nodes) and are connected in a peer-to-peer network to form a self-monitoring and self-managing infrastructure
- core services are redundant, being deployed in clusters to tolerate individual faults
- master images are replicated in multiple copies across the storage nodes in the storage cluster; this tolerates HW failures of the storage nodes in the cluster as well as network failures when accessing one copy of the image
- hypervisor (compute) nodes are deployed via a stateless boot so that it becomes easier to re-deploy a failing hypervisor by simply rebooting it and getting a fresh new copy of the hypervisor image. This also allows easy deployment of new nodes if needed, to augment the capacity of the infrastructure
Let's consider some typical failure scenarios that can happen in a real environment and see how SmartCloud Provisioning is designed to tolerate them and react appropriately.
The first example is related to the management agents that are used by SmartCloud Provisioning to perform the standard provisioning operations.
Management agents are deployed on both the compute nodes and the storage nodes and are organized in dynamic hierarchies, where a leader (manager) is dynamically elected. The leader is just the entry point for distributing the requests across the infrastructure and a coordinator of any operation, but this role does not imply any special information being associated with the agent itself (stateless infrastructure): any agent can be a leader.
All the agents have a watch-dog mechanism that is used to prevent, detect and correct failures; they also monitor each other in the neighborhood and can start simple actions to fix other agents' issues.
So, if an agent fails, the watch-dog mechanism tries to restart it. If the watch-dog is not able to restart the agent, neighbours try some simple actions to restart the failing agent. If the agent cannot be restarted, the system keeps on working without that node, thanks to the redundant infrastructure.
If the failing agent was a leader, and it cannot be restarted, the managed agents can re-elect their leader dynamically, without losing any information.
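The recovery flow for agents can be summarized in a few lines of code. This is a simplified sketch and not the product's algorithm: the `Agent` class is hypothetical, the watch-dog and neighbour restart attempts are collapsed into a single `restart` call, and electing the live agent with the lowest name is an assumption (the text only says that any agent can be a leader).

```python
class Agent:
    def __init__(self, name, restartable=True):
        self.name = name
        self.alive = True
        self.restartable = restartable

    def restart(self):
        """Watch-dog or neighbour restart attempt; returns True on success."""
        if self.restartable:
            self.alive = True
        return self.alive


def elect_leader(agents):
    """Any live agent can lead; picking the lowest name is an assumption."""
    live = sorted(agent.name for agent in agents if agent.alive)
    if not live:
        raise RuntimeError("no live agents left")
    return live[0]


def handle_failure(agents, failed, leader):
    """Try to restart the failed agent; re-elect only if the leader is lost."""
    failed.alive = False
    if failed.restart():
        return leader
    if failed.name == leader:
        return elect_leader(agents)
    return leader


rack = [Agent("agent-1", restartable=False), Agent("agent-2"), Agent("agent-3")]
leader = elect_leader(rack)
after_leader_loss = handle_failure(rack, rack[0], leader)
after_minor_failure = handle_failure(rack, rack[2], after_leader_loss)
```

Because the leader holds no special state, the election only needs to agree on a name: nothing has to be copied or recovered from the failed agent.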
Another example is related to failures either in a storage node or in a compute node.
If a storage node fails, thanks to the redundant deployment and to the multiple copies of the same image available in the storage cluster, the deployment of VMs can continue without issues, and the leader agent will try to restart the failing node.
If a compute node fails, the leader detects the failure and stops sending requests to that node. Moreover, it tries to restart the node, forcing a fresh copy of the compute node to be re-deployed via PXE boot.
As of December 9th, 2011, IBM SmartCloud Provisioning 1.2 is available for download.
The key features introduced in this release are:
Full product install through an interactive tool:
IBM SmartCloud Provisioning can now be installed using a graphical
wizard. The installer comes in two flavours, minimal and custom. The custom
installation lets you specify the number of instances of HBase and
ZooKeeper to be deployed. Moreover, it can automatically configure
ESXi servers as compute nodes. The creation of the management virtual
image on VMware is automated.
Support for multiple networks:
you can now deploy images with more than one NIC. Different users can deploy images in segregated networks.
Integration of the Image Construction and Composition Tool:
The Image Construction and Composition Tool
helps build and customize master images. It is designed to
facilitate a separation of concerns and tasks, where experts build software bundles for reuse by others. This design approach greatly reduces the complexity of virtual image creation and reduces errors.
Support of Open Virtualization Format (OVF):
OVF images can be created or modified by the Image Construction and Composition Tool
OVF metadata can be displayed and modified in the Self Service UI
Integration of the Virtual Image Library component:
The Virtual Image Library helps manage the life cycle of virtual images:
- Search images for specific software products
- Compare two images and determine the differences in files and products
- Find similar images
- Track image versions and provenance
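To give a feeling for what comparing images and finding similar ones means, here is a small sketch based on product inventories. It is only an illustration and says nothing about how the Virtual Image Library works internally: the helper names are hypothetical, and using the Jaccard index as the similarity measure is an assumption.

```python
def compare_images(products_a, products_b):
    """Report which products are specific to each image and which are shared."""
    a, b = set(products_a), set(products_b)
    return {
        "only_in_first": sorted(a - b),
        "only_in_second": sorted(b - a),
        "common": sorted(a & b),
    }


def similarity(products_a, products_b):
    """Jaccard index: shared products divided by all distinct products."""
    a, b = set(products_a), set(products_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


base_image = ["RHEL 6.0", "OpenSSH"]
web_image = ["RHEL 6.0", "OpenSSH", "IBM HTTP Server"]
report = compare_images(base_image, web_image)
score = similarity(base_image, web_image)
```

Two images with a high score and a short difference report are good candidates for consolidation or reuse.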
The cloud administrator can use a brand new UI to perform tasks like
registering images, registering networks, managing quotas, assigning
roles, managing elastic IPs
I've recorded a couple of demo movies to show the capabilities of the new IBM Virtual Image Library v1.1 that comes with the SmartCloud Provisioning v1.2 product. You can use the links below to go directly to the movies:
We’re getting really good at deploying images. The new SmartCloud Provisioning product makes
image deployment faster and easier than ever.
While the speed and simplicity are cool, left unchecked, image sprawl
issues may catch up with you faster than ever.
Virtual image sprawl is a reasonably new phenomenon derived
from the ease of capturing and creating new virtual images. Virtualization and cloud computing make it
very easy to create new virtual images.
As image catalogs grow, finding and locating the right images gets
harder. Existing images quickly become
out of date. Creating a new image is
often easier than figuring out what existing image might be reusable. This all leads to a sprawl of images, and
corresponding management issues.
To control and proactively prevent image sprawl, we just added
two new capabilities, the Virtual Image Library and the Image Construction and
Composition Tool, to the SmartCloud Provisioning 1.2 beta program. The Virtual Image Library provides a central
view of all your images and instances – across any SmartCloud Provisioning
deployment as well as your existing VMware environments. With Virtual Image Library you can quickly understand
the content of your images, search, and run comparison reports for both
differences and similarities. This will help
you find images to reuse (instead of creating yet another image), and begins to
proactively identify consolidation candidates.
In addition, Virtual Image Library supports a central repository for
your master images, allowing you to perform version control, check-in and check-out
operations across your different environments.
While the Virtual Image Library helps you control and manage your
images, the Image Construction and Composition Tool is a proactive step to prevent
image sprawl. With the tool, you
can construct images to share and reuse
across your cloud. The tool makes it
easy to create an image that is reconfigurable during the deployment
process. You can choose to expose
configuration parameters such as user names and ports, and even different
configuration choices. The SmartCloud Provisioning
1.2 instance creation dialog automatically displays these parameters and passes
them through to run your customization scripts.
For example, we use this technique to have one WebSphere Application
Server image that at deploy time is configured as a stand-alone node, or a
custom node, or a deployment manager node, or even an IBM HTTP Server node -
all from the same image. In addition to
building images for Smart Cloud Provisioning 1.2, the tool builds images on the
SmartCloud Enterprise public cloud, and builds images for combination in virtual
system patterns using IBM Workload Deployer.
I hope you’ll take a look at these new beta capabilities and
provide feedback on the SmartCloud Provisioning Open Beta Forum. Let's tame the image sprawl monster.
As part of the transparent development initiative, IBM SmartCloud Provisioning (formerly known as IBM Service Agility Accelerator for Cloud) launches a series of daily demos, starting from November 7th. Every session will take about one hour.
In this way you can have a look, in almost real time, at what is happening in IBM SmartCloud Provisioning development and learn about new and enhanced capabilities.
If you are interested in joining the sessions, here is the schedule in Central European Time (CET):
- Monday at 4:00 PM
- Tuesday at 11:00 AM
- Wednesday at 4:00 PM
- Thursday at 5:00 PM
- Friday at 11:00 AM
The sessions will be focused on image management.
If you would like to join, using your web browser, connect to
No password is required
Today IBM announced new SmartCloud Foundation capabilities to help organizations realize the potential of cloud computing. Watch the replay of the IBM SmartCloud launch webcast, to learn more about how the new announcements, including IBM SmartCloud Provisioning (delivered by IBM Service Agility Accelerator for Cloud), can help customers move beyond virtualization to more advanced cloud deployments.