We are happy to announce that a Beta program is underway for the Tivoli Virtual Deployment Engine (VDE), also known as High Scale Low Touch (HSLT). It will provide a low-cost, easy-to-implement cloud solution to fulfil demand for rapid deployment of virtual machines and storage in high- or low-volume environments, with massive scalability, high resiliency and low management costs. It has been in development for a number of months and is now ready for wider trials with our customers. Those who have witnessed it first hand are highly impressed with the speed of deployment and the fast time-to-value.
The main features of this exciting new capability are:
- Quick deployment times for VMs and storage, even for thousands of virtual images
- Hours/days to install instead of weeks/months
- Ability to deploy a running VM in less than 30 seconds
- Ability to handle high volumes of VM requests -- over 4,000 per hour
- Updates/upgrades with zero downtime
- Low-overhead approaches to image instantiation, recovery and management
- Self-service feature to give end users easy and fast access to the virtual components they require
- Distributed design avoids single points of failure. Recovery from failures is automatic and immediate, not manual and time-consuming
- Reduced administration and maintenance effort
You can now download and deploy the Beta code in your own test environment, and see for yourself.
Design presentations, demonstrations and documentation are all available on our Discussion Forum, which all participants will be given access to. As well as giving you direct contact with the Development team, the Forum also gives you the opportunity to connect with our other participating customers and business partners.
Beta Programs give you an unrivalled opportunity to influence the direction of products so that they suit the way you do business, and to try new or updated solutions in your own environment free of charge.
If you would like to join the Beta program or simply want more information, please contact the Beta Manager, Valory Batchellor (firstname.lastname@example.org)
Today IBM announced new SmartCloud Foundation capabilities to help organizations realize the potential of cloud computing. Watch the replay of the IBM SmartCloud launch webcast, to learn more about how the new announcements, including IBM SmartCloud Provisioning (delivered by IBM Service Agility Accelerator for Cloud), can help customers move beyond virtualization to more advanced cloud deployments.
IBM SmartCloud Provisioning (previously known as IBM Service Agility Accelerator for Cloud) fully embraces the transparent development philosophy.
Starting from today, you can join our open beta program. This program is intended to raise awareness of IBM SmartCloud Provisioning with the widest possible audience and provide a feedback mechanism to let you tell us what you like about the product, and what we could improve.
The code is downloadable from https://www14.software.ibm.com/iwm/web/cc/earlyprograms/tivoli/P2044/index.shtml
Due to the open nature of this beta program, the code is time-bombed: you can use it until December 31st, 2011.
You can discuss issues related to the code drop in this forum: http://www.ibm.com/developerworks/forums/forum.jspa?forumID=2673
There's still time to sign up for the IBM webcast: Managing the Cloud – Best practices for cloud service management
Organizations today are looking to cloud computing to deliver cost savings and faster service delivery. However, most organizations are still struggling to have the basic IT infrastructure that is necessary to take the leap to a robust cloud. This session will explain how service management can help provide the essentials to maintain service levels in the cloud and best practices based on IBM's work with customers. This information will provide the foundation for building and managing a cloud to meet your business objectives and transform IT.
As part of the transparent development initiative, IBM SmartCloud Provisioning (formerly known as IBM Service Agility Accelerator for Cloud) launches a series of daily demos, starting from November 7th. Every session will take about one hour.
In this way you can see, in almost real time, what is happening in IBM SmartCloud Provisioning development and learn about new and enhanced capabilities.
If you are interested in joining the sessions, here is the schedule in Central European Time (CET):
- Monday at 4:00 PM
- Tuesday at 11:00 AM
- Wednesday at 4:00 PM
- Thursday at 5:00 PM
- Friday at 11:00 AM
The sessions will be focused on image management.
If you would like to join, using your web browser, connect to
No password is required
We’re getting really good at deploying images. The new SmartCloud Provisioning product makes
image deployment faster and easier than ever.
While the speed and simplicity are cool, left unchecked, image sprawl
issues may catch up with you faster than ever.
Virtual image sprawl is a relatively new phenomenon arising
from the ease of capturing and creating new virtual images. Virtualization and cloud computing make it
very easy to create new virtual images.
As image catalogs grow, finding and locating the right images gets
harder. Existing images quickly become
out of date. Creating a new image is
often easier than figuring out what existing image might be reusable. This all leads to a sprawl of images, and
corresponding management issues.
To control and proactively prevent image sprawl, we just added
two new capabilities, the Virtual Image Library and the Image Construction and
Composition Tool, into the SmartCloud Provisioning 1.2 beta program. The Virtual Image Library provides a central
view of all your images and instances – across any SmartCloud Provisioning
deployment as well as your existing VMware environments. With Virtual Image Library you can quickly understand
the content of your images, search, and run comparison reports for both
differences and similarities. This will help
you find images to reuse (instead of creating yet another image), and begins to
proactively identify consolidation candidates.
In addition, Virtual Image Library supports a central repository for
your master images, allowing you to perform version control, check-in and check-out
operations across your different environments.
While the Virtual Image Library helps you control and manage your
images, the Image Construction and Composition Tool is a proactive step to prevent
image sprawl. With the tool, you
can construct images to share and reuse
across your cloud. The tool makes it
easy to create an image that is reconfigurable during the deployment
process. You can choose to expose
configuration parameters such as user names and ports, and even different
configuration choices. The SmartCloud Provisioning
1.2 instance creation dialog automatically displays these parameters and passes
them through to run your customization scripts.
For example, we use this technique to have one WebSphere Application
Server image that at deploy time is configured as a stand-alone node, or a
custom node, or a deployment manager node, or even an IBM HTTP Server node -
all from the same image. In addition to
building images for SmartCloud Provisioning 1.2, the tool builds images on the
SmartCloud Enterprise public cloud, and builds images for combination in virtual
system patterns using IBM Workload Deployer.
I hope you’ll take a look at these new beta capabilities and
provide feedback on the SmartCloud Provisioning Open Beta Forum. Let's tame the image sprawl monster.
With the barrage of cloud news constantly hitting the market, it can be challenging for organizations to differentiate between all of the solutions and capabilities out there.
But with the latest cloud offering from IBM, the value proposition is quite simple—you get a low-cost, low-risk entry to cloud computing with compelling features. This is especially important for organizations who are still trying to leverage the cost savings of virtualization.
Our customers have told us they’re looking to cloud computing to increase agility—the ability of IT to evolve and meet business needs—and they’re looking for ways to control expenses related to IT investments. They also want to reduce IT complexity while at the same time increase utilization, reliability and scalability of IT resources. And they are looking for the ability to expand capabilities gradually, as their needs change and grow.
In designing a solution to meet all of these needs, we developed IBM SmartCloud Provisioning. Using industry best practices for cloud deployment and management, this new solution allows organizations to quickly deploy cloud resources with automated provisioning, parallel scalability and integrated fault tolerance to increase operational efficiency and respond to user needs.
The name doesn’t tell the whole story though. IBM SmartCloud Provisioning is a full-featured solution wrapped up in an easy-to-implement package. That means you get:
- Rapidly scalable deployment designed to meet business growth
- Reliable, non-stop cloud capable of automatically tolerating and recovering from software and hardware failures
- Reduced complexity through ease of use and improved time to value
- Reduced IT labor resources with self-service requesting and highly automated operations
- Control over image sprawl and reduced business risk through rich analytics, image versioning and federated image library features
Using this technology, we’ve seen customers get a cloud up and running in just hours—realizing immediate time to value. It’s fast—administrators have been able to go from bare metal to ready-for-work in under five minutes, or start a single VM and load OS in under 10 seconds, or scale up to 50,000 VMs in an hour (50 nodes).
But ultimately, these IT benefits have translated to business benefits—customers have been able to see how cloud computing can impact their business, and how they can accelerate the delivery of new services to drive revenue.
With the new release of IBM SmartCloud Provisioning this week, you can try and see firsthand the potential of this breakthrough technology to accelerate your journey to cloud. And if you want a preview of what’s in development, you can join our Open Beta program for access to beta-level code.
Modern Cloud infrastructures are built leveraging thousands of highly distributed servers, used to provide services directly to customers over the Internet. The service provider has two extremely important objectives, which, unfortunately, are to some degree contrasting: a) ensure continuous availability of the Cloud service, and b) contain the cost of the infrastructure and administration (CAPEX and OPEX).
There are several factors that have an impact on the availability of services, mostly related to infrastructure failures. Failures are not only related to unrecoverable hardware outages, but also to recoverable OS or middleware failures.
Not so long ago, the most common approach to high availability was to assume one could deploy infrastructures with the highest Mean Time To Failure (MTTF) possible, which required expensive systems and assumed the possibility to write error-safe software applications. It was also assumed that some degree of down-time was acceptable, with vendors boasting of the number of 9's that they could support (e.g. 99.999% availability). In today's always-on Internet, any downtime of major services becomes headline news. The traditional approach is no longer applicable, and a new approach has to be considered.
Given the requirement to reduce infrastructure costs, service providers are using commodity hardware. Given also the requirement to reduce operational costs, hardware failures are commonly dealt with by directly replacing the failed component rather than manual debugging and recovery by skilled (and expensive) administrators. Thus, to maintain the objective of continuous availability of the service, the Cloud system must be built in order to expect failure of the underlying infrastructure, and not only for temporary periods but it must assume that components will disappear forever. This cannot be limited to only hardware components, as no matter how well a software element is tested, unexpected edge conditions will appear at some point-in-time. So, to guarantee continuous availability, a Cloud solution must also expect its own components to fail too.
Given that we are forced to expect failure, the high MTTF approach is no longer valid, and instead we have to increase availability by flipping the approach to minimizing Mean Time To Recovery (MTTR). The quicker the system can recover from failure, the higher the availability of the service will be. Given however that even a tiny percentage of downtime is no longer acceptable, we also need a means to maintain service availability during the recovery process. One way of doing this is through providing redundancy of all critical services within the Cloud solution.
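To make the MTTF versus MTTR argument concrete, here is a minimal sketch using the standard steady-state availability formula, availability = MTTF / (MTTF + MTTR), plus the usual formula for independent redundant replicas. The numbers are hypothetical and chosen only for illustration; they are not measurements of any product.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the service is up."""
    return mttf_hours / (mttf_hours + mttr_hours)


def redundant_availability(single: float, replicas: int) -> float:
    """Availability of N independent replicas when one survivor is enough."""
    return 1.0 - (1.0 - single) ** replicas


# Hypothetical figures: same commodity hardware, different recovery times.
baseline = availability(mttf_hours=1000.0, mttr_hours=10.0)
faster_recovery = availability(mttf_hours=1000.0, mttr_hours=0.1)
with_redundancy = redundant_availability(baseline, replicas=3)

print(f"slow recovery:     {baseline:.6f}")
print(f"fast recovery:     {faster_recovery:.6f}")
print(f"3 slow replicas:   {with_redundancy:.6f}")
```

With the failure rate held constant, shortening recovery raises availability, and redundancy covers the window while recovery is still in progress. The replica formula assumes failures are independent, which real deployments only approximate.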
SmartCloud Provisioning is designed according to Recovery-Oriented Computing (ROC) principles: it is based on a highly distributed, redundant and robust infrastructure with near-zero downtime and automated recovery across heterogeneous platforms, and it does not require expensive systems but can run on relatively low-cost commodity infrastructure.
The key factors that allow SmartCloud Provisioning to be a low-touch and robust cloud infrastructure are the following:
- the infrastructure is as stateless as possible: this avoids issues related to single points of failure
- management agents are deployed on the physical nodes of the infrastructure (compute nodes and storage nodes) and are connected in a peer-to-peer network to form a self-monitoring and self-managing infrastructure
- core services are redundant, being deployed in clusters to tolerate individual faults
- master images are replicated in multiple copies across the storage nodes in the storage cluster; this tolerates HW failures of the storage nodes in the cluster as well as network failures when accessing one copy of the image
- hypervisor (compute) nodes are deployed via a stateless boot so that it becomes easier to re-deploy a failing hypervisor by simply rebooting it and getting a fresh new copy of the hypervisor image. This also allows easy deployment of new nodes if needed, to augment the capacity of the infrastructure
Let's consider some typical failure scenarios that can happen in a real environment and see how SmartCloud Provisioning is designed to tolerate them and react appropriately.
The first example concerns the management agents that SmartCloud Provisioning uses to perform the standard provisioning operations.
Management agents are deployed on both the compute nodes and the storage nodes and are organized in dynamic hierarchies, where a leader (manager) is dynamically elected. The leader is just the entry point for distributing the requests across the infrastructure and a coordinator of any operation, but this role does not imply any special information being associated with the agent itself (stateless infrastructure): any agent can be a leader.
All the agents have a watch-dog mechanism that is used to prevent, detect and correct failures; they also monitor each other in the neighborhood and can start simple actions to fix other agents' issues.
So, if an agent fails, the watch-dog mechanism tries to restart it. If the watch-dog is not able to restart the agent, neighbours try some simple actions to restart the failing agent. If the agent cannot be restarted, the system keeps on working without that node, thanks to the redundant infrastructure.
If the failing agent was a leader, and it cannot be restarted, the managed agents can re-elect their leader dynamically, without losing any information.
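The escalation described above (watch-dog first, then neighbours, then carry on without the node and re-elect a leader if needed) can be sketched as a small simulation. This is only an illustration under stated assumptions: the `Agent` class, its flags and the "lowest name wins" election rule are hypothetical and are not taken from the product.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Agent:
    name: str
    alive: bool = True
    watchdog_can_fix: bool = True    # hypothetical: a local restart would succeed
    neighbours_can_fix: bool = True  # hypothetical: a peer's restart would succeed


def recover(agent: Agent) -> str:
    """Escalate recovery: local watch-dog, then neighbours, then exclusion."""
    if agent.alive:
        return "healthy"
    if agent.watchdog_can_fix:
        agent.alive = True
        return "restarted by watchdog"
    if agent.neighbours_can_fix:
        agent.alive = True
        return "restarted by neighbours"
    return "excluded"  # the rest of the infrastructure keeps working


def elect_leader(agents: List[Agent]) -> Optional[str]:
    """Pick a leader among the live agents.

    Leaders hold no special state, so any live agent qualifies; choosing the
    lowest name is just an arbitrary deterministic rule for this sketch.
    """
    live = sorted(a.name for a in agents if a.alive)
    return live[0] if live else None


rack = [Agent("agent-1"), Agent("agent-2"), Agent("agent-3")]
leader = elect_leader(rack)

# The leader dies and neither its watch-dog nor its neighbours can revive it.
rack[0].alive = False
rack[0].watchdog_can_fix = False
rack[0].neighbours_can_fix = False
outcome = recover(rack[0])
new_leader = elect_leader(rack)
print(leader, outcome, new_leader)
```

Because the leader role carries no state of its own, the election only needs to agree on who leads next; nothing has to be copied from the failed agent.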
Another example is related to failures either in a storage node or in a compute node.
If a storage node fails, thanks to the redundant deployment and to the multiple copies of the same image available in the storage cluster, the deployment of VMs can continue without issues, and the leader agent will try to restart the failing node.
If a compute node fails, the leader detects the failure and stops sending requests to that node. It also tries to restart the node, forcing a fresh copy of the compute node to be re-deployed via PXE boot.
If you're interested in trying the SmartCloud Provisioning product, you can download a trial version from the following link:
If you would like to try out the core functions of IBM SmartCloud Provisioning 1.2 but are worried that you do not have the time to install it or enough hardware, you can download a special demo package from the Integrated Service Management Library.
It gets installed in a single physical box.
The system must use x86_64 processors that support virtualization.
In addition, you need at least 3 GB of memory and 30 GB of disk space.
The required operating system for this installation is CentOS Linux 6.0 (64-bit).
In addition to that the following packages are required:
The installer then configures the physical box as a compute node, storage node, PXE server and DHCP server, and creates a virtual image (the hypervisor is KVM) that acts as a second storage node and runs the webconsole, web-adminconsole, webservice, REST server, HBase and ZooKeeper.
Further installation details are available in the readme downloadable with the package.
SmartCloud Provisioning is designed to minimize the use of a centralized “command and control” approach, in favor of scale out management, where endpoints can participate in management activities and do not depend on a single configuration management database.
This allows SmartCloud Provisioning to handle multiple provisioning tasks in parallel, across an unlimited number of servers.
Cloud users can request deployments of virtual machines and have access to the provisioned systems within a few seconds, thanks to the parallel and distributed processing that happens transparently under the covers.
Let’s drill down into the details about this distributed management approach.
SmartCloud Provisioning internally uses a peer to peer (P2P) messaging infrastructure to pass provisioning and management messages between agents, which contribute to the decentralized control.
Agents are installed on the compute nodes (i.e. the hypervisors) as well as on the storage nodes, where images and volumes reside.
The P2P connections between agents not only allow self-monitoring of their health in order to implement a low-touch management infrastructure, but also allow orchestrating the communications to achieve an effective load distribution and decentralized management of the requests performed by cloud users.
The P2P communication overlay is backed by a distributed lock service, which is based on ZooKeeper.
ZooKeeper is a distributed, open-source coordination service for distributed applications, which exposes a simple set of primitives that distributed applications can build upon to implement higher level services for synchronization, configuration maintenance, and groups and naming. It is designed to be easy to program, and uses a data model styled after the familiar directory tree structure of file systems.
Like the distributed processes it coordinates, ZooKeeper itself is intended to be replicated over a set of servers that must all know about each other. They maintain an in-memory image of state, along with transaction logs and snapshots in a persistent store.
SmartCloud Provisioning agents connect to a single ZooKeeper server. Each agent maintains a TCP connection with the ZooKeeper server, through which it sends requests, gets responses, gets watch events, and sends heartbeats. If the TCP connection to the server breaks, the agent will connect to a different server.
When a deployment request is received by SmartCloud Provisioning, the request is processed by the Web Services layer, passed to the management infrastructure, and managed by the agents and the ZooKeeper services.
The following steps describe the internal communications in more detail, as depicted in figure 1 below.
This processing is transparent to the end user, who just sees the deployment request being served in a few seconds.
- The Web Services layer takes charge of a deployment request (e.g. deploy 50 “Large” instances of image “LOB123-RHEL 6.0”), and triggers a first interaction with the ZooKeeper server to ask which agent in the compute nodes layer can handle this request.
- The ZooKeeper server selects one of the available leaders in the compute nodes layer and returns this information back to the web service layer. The role of the selected leader will be to initiate an internal hand-shaking among the compute nodes agents to process the incoming request.
- The Web Service layer receives the information about which agent to contact, and opens a connection to that agent, passing the deployment request details.
- The selected agent takes care of the request and starts a “discussion” phase with all the other leaders (one for each rack) in order to distribute the load of the incoming request among all the agents that could provide resources to fulfill it. This happens leveraging the P2P connection between agents.
- Inside each rack, the leader triggers a parallel P2P interaction with all the agents on all the compute nodes included in that rack, to understand which agent can serve a portion of the incoming request. Each agent having enough free resources to serve “Large” instances answers the request coming from its leader, so that, at the end of the hand-shaking process each leader knows which portion of the incoming request can be processed by which agent.
- At this point, each involved agent knows which part of the incoming request it is supposed to process. To start the real deployment step, the agent asks the ZooKeeper server where to find the “LOB123-RHEL 6.0” image to be deployed, according to the incoming request. The ZooKeeper server again answers the request by providing one of the available agent leaders on the storage nodes layer.
- When an agent receives back the information about which storage node to connect to, it opens a P2P connection with the related agent and asks for the image it needs to fulfill the deployment request.
- The storage node agent leader starts in turn a P2P communication with the other leaders asking for the selected image. Each leader inside its managed rack triggers other P2P connections to ask each managed agent if it has a copy of the requested image.
- The storage leader initiating the request collects back all the details about agents having a copy of the requested image and selects at least two of them (default redundancy required by SmartCloud Provisioning), returning the information to the calling compute-node agent. The compute-node agent at this point can access the image and starts the deployment of VMs, according to its capacity and to the amount of work it offered to serve.
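The two-level hand-shake in the steps above can be sketched roughly as follows. This is a simplified, single-process illustration: the function names, the greedy placement rule and the rack, agent and storage node names are all assumptions made for the example, and the real agents negotiate over P2P connections rather than through function calls.

```python
from typing import Dict, List


def split_request(requested: int, offers: Dict[str, int]) -> Dict[str, int]:
    """Greedily assign instances to offerers without exceeding any offer."""
    plan: Dict[str, int] = {}
    remaining = requested
    for name, free in offers.items():
        take = min(free, remaining)
        if take > 0:
            plan[name] = take
            remaining -= take
    if remaining:
        raise RuntimeError(f"{remaining} instances cannot be placed")
    return plan


def plan_deployment(requested: int,
                    racks: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, int]]:
    """Split a request across racks first, then across agents in each rack."""
    # Each rack leader reports how many instances its agents could host.
    rack_offers = {rack: sum(agents.values()) for rack, agents in racks.items()}
    rack_plan = split_request(requested, rack_offers)
    # Each rack leader then divides its share among its own agents.
    return {rack: split_request(share, racks[rack])
            for rack, share in rack_plan.items()}


def pick_replicas(holders: List[str], redundancy: int = 2) -> List[str]:
    """Return `redundancy` storage agents that hold a copy of the image."""
    if len(holders) < redundancy:
        raise RuntimeError("not enough copies of the image")
    return holders[:redundancy]


# Hypothetical free capacity for "Large" instances, per agent and per rack.
racks = {
    "rack1": {"c1": 20, "c2": 10},
    "rack2": {"c3": 15, "c4": 25},
}
plan = plan_deployment(50, racks)
replicas = pick_replicas(["s1", "s3", "s4"])
print(plan)
print(replicas)
```

A greedy fill is the simplest rule that respects each agent's capacity; a real scheduler might instead spread the load evenly or weight it by other criteria.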
As I said, this processing happens under the covers in a very fast way and the user does not have to worry about any of the steps above.
This allows reaching high levels of parallelism, decentralized management, as well as scale-out capabilities that can be easily reached by increasing the number of servers.
If you're interested in trying the SmartCloud Provisioning distributed management capabilities, you can download a trial version from the following link:
There is a brand new demo for IBM SmartCloud Provisioning 1.2.
It is launchpad-based, so you can dive into each capability individually with a short, quick overview.
It covers the main IBM SmartCloud Provisioning capabilities:
- fast virtual image provisioning
- easily extending your cloud infrastructure
- fault tolerance
- low touch
- image management:
  - controlling image sprawl
  - drift analysis
  - image search
We're pleased to make Service Health for IBM SmartCloud Provisioning available as a beta. As this is a beta, we welcome any and all feedback.
Service Health (Beta) for IBM® SmartCloud Provisioning provides prebuilt integrations between IBM SmartCloud Provisioning and IBM SmartCloud Monitoring. This solution allows you to easily monitor your IBM SmartCloud Provisioning infrastructure to identify and react to issues in your environment.
This solution is available via the IBM Integrated Service Management Library (ISML). You can find it here -> Service Health for IBM SmartCloud Provisioning. Please use the "Comment or Review" link on that page to post feedback. You may also use the "Contact Provider" link.
SmartCloud Provisioning released its stand-alone, limited-function version here:
If you only have one machine, you can still play with it.
Nice to have it in hand to understand the product and learn how to use it.
The open beta program for the upcoming IBM SmartCloud Provisioning release has started:
- Freely download the code and run it unattended on your premises without the need to sign a non-disclosure agreement
- Discuss what you think about it on a dedicated forum
- Watch demonstrations of IBM SmartCloud Provisioning capabilities in the works and tell us whether you like the newest features just by clicking a button
- Join our community to get early access to and provide feedback on cloud provisioning and orchestration technologies
- Stay tuned to the community to hear the latest news on available code drops and functionalities
- Play with the product on our premises by joining the hosted beta. To access the hosted beta, send an email to email@example.com
One of the messages behind cloud computing is "pay-per-use": the adoption of a virtualized, standardized, self service and automated environment should come with the possibility to be charged only for the used resources.
IBM SmartCloud Provisioning comes with an idea of low complexity, low administration and ease of use.
Keeping these messages in mind, I was thinking about how to extract metering information. I had in mind something easy, doable also by people who definitely do not want to invest in programming, and that does not need any modification to database tables to store historical data.
So I had a look at the available IBM SmartCloud Provisioning interfaces and found a couple of command-line commands that could help me achieve my goal: iaas-describe-resources-inuse-by-access
The first command displays the number of images, cores and the amount of memory and disk space in use by a specific access ID. So this command shows the key measures that in cloud computing are usually taken into consideration for usage and accounting.
The second command shows the relationship between access IDs and user IDs. This mapping helps in building metering information per user and not per access ID. In a simple environment the map is 1-1, but for example you may have the same user accessing more VM regions and so having multiple access IDs associated.
Given these two commands, it is pretty straightforward to set up a couple of cron jobs or scheduled tasks (depending on whether you are on Linux or Windows) that, at a predefined interval (for example once an hour), extract this data and store it in a temporary file.
You can then have another cron job or scheduled task that sums up all this information daily, per user, perhaps adding your specific rate codes. If you choose to store this data in a CSV file, for example, you can easily import it into a reporting engine.
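As a rough sketch of the daily roll-up step, the snippet below sums hourly samples per user and per day. Everything about the data layout is an assumption: the CSV column names, the access-ID-to-user map and the rate are hypothetical, and you would need to adapt the parsing to whatever your hourly job actually writes from the command output.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample file: one row per access ID per hourly run.
SAMPLES = "\n".join([
    "timestamp,access_id,images,cores,memory_mb,disk_gb",
    "2011-12-01T10:00,A1,1,2,4096,20",
    "2011-12-01T11:00,A1,1,2,4096,20",
    "2011-12-01T10:00,A2,2,4,8192,40",
    "2011-12-01T10:00,B1,1,1,1024,10",
])

# Hypothetical mapping; one user may own several access IDs.
ACCESS_TO_USER = {"A1": "alice", "A2": "alice", "B1": "bob"}
RATE_PER_CORE_HOUR = 0.05  # hypothetical rate code


def daily_usage(samples_csv, access_to_user):
    """Sum hourly samples into per-user, per-day totals.

    Each hourly sample is counted as one hour of usage of the sampled
    resources, which is an approximation.
    """
    totals = defaultdict(
        lambda: {"core_hours": 0, "memory_mb_hours": 0, "disk_gb_hours": 0})
    for row in csv.DictReader(io.StringIO(samples_csv)):
        user = access_to_user.get(row["access_id"], "unknown")
        day = row["timestamp"][:10]  # keep only YYYY-MM-DD
        bucket = totals[(user, day)]
        bucket["core_hours"] += int(row["cores"])
        bucket["memory_mb_hours"] += int(row["memory_mb"])
        bucket["disk_gb_hours"] += int(row["disk_gb"])
    return dict(totals)


usage = daily_usage(SAMPLES, ACCESS_TO_USER)
for (user, day), t in sorted(usage.items()):
    charge = t["core_hours"] * RATE_PER_CORE_HOUR
    print(f"{day},{user},{t['core_hours']},{charge:.2f}")
```

The printed lines are themselves CSV, so they can be appended to a daily report file and imported into a reporting engine as described above.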