People always ask about failures: it's great that the cloud software can survive failures, but what about my user workloads? The simplest, yet all too unsatisfying, answer is that your application should be designed to tolerate failures, and since the cloud is resilient you can always get more cloud resources. Unfortunately, most people aren't satisfied with this answer. Many enterprise IT folks are used to running expensive servers with very expensive Fibre Channel-attached SAN storage. But what happens with commodity storage exposed over commodity networks and servers?
SCP 1.2 has three kinds of storage: 1) gold master images, 2) block storage (volumes), and 3) ephemeral storage. Master images are replicated across a cluster of Linux servers. When an instance is created from a master image, the guest OS sees a single disk; however, all writes go to ephemeral storage, which is attached to the hypervisor. Although some people do recover the ephemeral storage after failures, it is designed to be discarded whenever instances are terminated, intentionally or otherwise. The master images are replicated for resiliency and scale-out performance. For resiliency, we generally establish two redundant iSCSI sessions to two separate storage nodes. This can survive network, disk, and storage node failures without affecting the guest workload.
Block storage, on the other hand, is a bit trickier. We purposely chose not to force redundancy, since forced redundancy turned out to be the cause of the "amazonocolypse" last spring. Some early customers told us that they were happy enough to use RAID storage on their storage nodes, so they could recover from a failure even though there would be some downtime. Of course, other users want their storage to be always available. For those users, we've always recommended allocating multiple volumes for each instance. If you create multiple volumes in one call, the cloud will attempt to place each volume on a separate physical storage node. Then, using guest-level software RAID such as mdadm for Linux or the Windows Disk Management tool, you can set up disk mirroring to tolerate the failure of one of the nodes. Of course, you'll need to monitor for failures so you can re-establish redundancy. You can use SmartCloud Monitoring to detect faults and even trigger automated recovery scripts.
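To make the guest-level mirroring idea a little more concrete, here is a minimal Python sketch, not anything shipped with SCP. It assumes a Linux guest whose two cloud volumes show up as /dev/vdb and /dev/vdc (hypothetical device names), builds the mdadm command that would mirror them, and parses /proc/mdstat-style text to spot an array that has lost a member. The helper names and the sample status text are made up for illustration.

```python
import re


def mirror_create_command(md_device, volumes):
    """Build the mdadm command line that mirrors the given volumes (RAID 1)."""
    if len(volumes) < 2:
        raise ValueError("a mirror needs at least two volumes")
    return ["mdadm", "--create", md_device, "--level=1",
            f"--raid-devices={len(volumes)}", *volumes]


def degraded_arrays(mdstat_text):
    """Return the md arrays in /proc/mdstat-style text that are missing members."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The status line carries "[expected/active]", e.g. "[2/1]" when degraded.
        status = re.search(r"\[(\d+)/(\d+)\]", line)
        if current and status:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected:
                degraded.append(current)
            current = None
    return degraded


# Illustrative status text only; on a real guest you would read /proc/mdstat.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 vdc[1] vdb[0]
      1048512 blocks super 1.2 [2/2] [UU]

md1 : active raid1 vde[1]
      1048512 blocks super 1.2 [2/1] [_U]

unused devices: <none>
"""

if __name__ == "__main__":
    # Hypothetical device names; substitute whatever your guest actually sees.
    print(" ".join(mirror_create_command("/dev/md0", ["/dev/vdb", "/dev/vdc"])))
    print(degraded_arrays(SAMPLE_MDSTAT))
```

A check like degraded_arrays could run from a monitoring agent or a cron job inside the guest. Once a replacement volume is attached, re-adding it to the mirror (for example with mdadm --manage and --add) restores redundancy.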
While this is an entirely workable solution that is both scalable and low cost, it is still not enough for some use cases. In particular, this solution will not work for "persistent instances". Of course, you should avoid persistent instances, but sometimes they are just a heck of a lot easier: you don't have to be smart about configuring your Windows or Linux guest OS. For this scenario, we do have some customers combining SCP 1.2 with GPFS, an extremely powerful cluster file system that has been used in some of the world's largest supercomputer HPC clusters. With GPFS as the backing store for the SCP storage nodes, it is quite simple to automatically fail over volumes onto another storage node. In fact, IBM Research has internal prototypes that go even further, avoiding any downtime whatsoever as a result of a failed storage node. But I can't tell you about that ;-).
I hope you've found this helpful. I hope you'll agree that there are some pretty good solutions available, even if we cannot offer perfection yet ;-)
Kimic Tags: smartcloud provisioning delivery devops continuous patterns
This goes out to all the Operations guys and gals. Have you been tasked with getting your IT organization to be more efficient and more effective, to do "more with less"? At the same time, your development teams are expected to deliver new applications at warp speed while you have specific service level agreements to meet governing the stability of your production environments. Speed and stability: they seem diametrically opposed, don't they? If you haven't heard of DevOps yet, it is the methodology of bringing development and operations teams together to collaborate, integrate, and deliver more robust applications to the marketplace more efficiently and more effectively. It's a cool new way of thinking and doing for all teams involved.
IBM has jumped into the deep end of DevOps with the recent announcement of the SmartCloud Continuous Delivery beta. This solution will allow the integration of new and existing tools to automate and enhance the application delivery pipeline end to end. This post will hopefully give you some ideas on how you might use DevOps to bring tangible changes to your IT organization.
First off, is your organization using cloud computing effectively today? Ops teams may already be using some form of virtualization to increase efficiency and effectiveness. Aligned with a DevOps methodology, cloud can automate and reduce routine daily tasks and free up resources to focus on innovation. Take a closer look at how SmartCloud Continuous Delivery, in conjunction with IBM SmartCloud Provisioning, can help mobilize teams to move to DevOps.
Fact or Fiction?
I won't have to provision environments for development teams any more!
Fact - Ops can define the system patterns for developers to self-provision, so developers are no longer dependent on the Ops team. There will likely still be times when Ops teams want to provision environments themselves, but it doesn't have to happen as often.
I will never be able to monitor all the virtual systems to validate that they meet the security requirements of my company.
Fiction - Patterns can be built based on the compliant virtual images that Ops maintains and tracks. Development can then self-provision these pre-defined patterns. Ops can update existing patterns and upgrade deployed VMs as required.
I can define network isolation and resource constraints to ensure the integrity of my cloud for my customers.
Fact - The automated deployment scripts define the access level of authorized users and groups. These stored artifacts preserve the authorization that specific users and groups are given, allowing controlled multi-tenancy in a cloud.
The ability of developers to stand up their own environments is helpful, but the consequence will be tons of stagnant VMs hanging around.
Fiction - Build artifacts can be stored in the asset manager, which tracks the state and age of the provisioned VMs. Policies are used to ensure VMs are maintained only as long as appropriate for a particular deployment (for example, a personal deployment vs. a long test run deployment).
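To give a feel for what an age-based policy on provisioned VMs might look like, here is a small Python sketch. It is purely illustrative and does not use the SmartCloud asset manager or any IBM API: the ProvisionedVM record, the deployment type names, and the retention limits are all assumptions made up for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical retention limits per deployment type (not product defaults).
MAX_AGE = {
    "personal": timedelta(days=7),
    "long_test_run": timedelta(days=30),
}


@dataclass
class ProvisionedVM:
    name: str
    deployment_type: str
    created: datetime


def expired_vms(vms, now, max_age=MAX_AGE):
    """Return the names of VMs that have outlived the limit for their deployment type."""
    return [vm.name for vm in vms
            if now - vm.created > max_age[vm.deployment_type]]


if __name__ == "__main__":
    now = datetime(2012, 6, 1)
    vms = [
        ProvisionedVM("dev-sandbox", "personal", datetime(2012, 5, 20)),
        ProvisionedVM("perf-run", "long_test_run", datetime(2012, 5, 20)),
    ]
    # Both VMs are 12 days old; only the personal one exceeds its 7-day limit.
    print(expired_vms(vms, now))  # ['dev-sandbox']
```

The point of keying the limit on deployment type is that a personal sandbox and a long-running test environment can share one cleanup job while being held to different lifetimes.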
I hope this taste of Fact or Fiction gives you a sense of how DevOps can transform collaboration and effectiveness for both Development and Operations teams. The Enterprise DevOps Blog here will keep you up to date and provide additional information around DevOps. You can also test drive highly-scalable, low-touch cloud with a SmartCloud Provisioning no-charge trial.
A live session will be held on Tuesday, June 12, 2012, providing an overview of the data protection capabilities that IBM Tivoli Storage Manager for Virtual Environments brings to IBM SmartCloud Provisioning.
Come learn what we have to offer, and tell us about your data protection strategies in the cloud, your use cases, and where you see value.
This will be an opportunity to share and provide valuable feedback to the product teams that will shape future capabilities.
Date and Time: Tuesday, June 12, 2012
US: 6am PDT, 9am EDT
Europe: 2pm BST, 3pm CEST
Follow the link below for details and enrollment.
Scroll down to 'Live Sessions'.
Look for Session Title: Feedback session on automated data protection in the cloud
DRussell4881
An interesting insight by Dr. Angel Diaz into the Practical Guide to Service Level Agreements (SLA), published by the Cloud Standards Customer Council (CSCC).
Who is responsible for the management of the services that will operate in a cloud environment? Who is responsible for identifying the elements of the agreement? What type of agreement should be in place? These are all questions that should be asked and understood before moving a service to the cloud.
To read the complete article, go to http://thoughtsoncloud.com/index.php/2012/05/cloud-service-level-agreements-slas-what-you-dont-know-can-hurt-you/ .
A new beta drop for IBM SmartCloud Provisioning is available.
Backing up IBM SmartCloud Provisioning's Persistent Volumes and Infrastructure with Tivoli Storage Manager Client
PQC6_jim_Markham Tags: cloud solutions smartcloud_resilience vmware backup esx smartcloud kvm tsm integration provisioning
Two new white papers are available on the IBM Integrated Service Management Library (ISML) that explain how to use Tivoli Storage Manager to back up different areas within IBM SmartCloud Provisioning.
The first white paper explains how to use the Tivoli Storage Manager Backup-Archive client to back up and restore the boot volume of an IBM SmartCloud Provisioning persistent virtual machine, how to make periodic backups of a normal volume, and how to select and restore a particular backup.
This white paper can be downloaded from the IBM Integrated Service Management Library (ISML) by following this link -> Backing up IBM SmartCloud Provisioning's Persistent Volumes with Tivoli Storage Manager Client
The second white paper explains how to use the Tivoli Storage Manager Backup-Archive client to back up and restore the following components of the IBM SmartCloud Provisioning infrastructure: the Preboot Execution Environment (PXE) server, the web console configuration, and the HBase data store.
This white paper can be downloaded from the IBM Integrated Service Management Library (ISML) by following this link -> Backing up IBM SmartCloud Provisioning's Infrastructure with Tivoli Storage Manager Client
SandraWeiss Tags: kvm vmware virtualization monitoring esxi health smartcloud_health solutions provisioning smartcloud cloud
Service Health for IBM SmartCloud Provisioning has officially reached general availability (GA) and is now available on the IBM Integrated Service Management Library (ISML).
Service Health provides pre-built integrations between IBM SmartCloud Provisioning and IBM SmartCloud Monitoring, using a custom agent, OS agents, and the ITMfVE agents. A product-provided navigator offers a concise overview of the health of the IBM SmartCloud Provisioning infrastructure, so you can quickly identify and react to issues in your environment, such as an unresponsive compute node, high disk usage on storage nodes, or key kernel services not responding, and minimize their impact. It also provides visibility into the KVM and ESXi hypervisors.
This solution can be downloaded from the IBM Integrated Service Management Library (ISML) by following this link -> Service Health for IBM SmartCloud Provisioning