This article introduces some modifications to the source code of Walrus, the storage service component included with the Eucalyptus open source framework for cloud computing that implements an IaaS environment (Infrastructure-as-a-Service). Learn how to modify the Walrus source code, and how to recompile and run it, in order to improve the file-sharing and file-locking mechanisms in the Eucalyptus environment.
Why do this? If you consume cloud services, or design and develop cloud applications and services that rely on file sharing or locking, these changes can improve how your application or service behaves. That in turn can improve performance and might reduce the time, bandwidth, and compute power allocated to your resources, which will likely reduce cost.
We show you, step by step, how to install Eucalyptus in a cluster: in this case, an IBM® blade server; this technique can also be used on a personal or laptop computer.
To get the most from this article, you should have a good understanding of cloud computing concepts, Java™ technology, and UNIX® commands, and a basic understanding of how to work with clusters. To use the sample code, you need a basic understanding of the Eclipse framework. There are links to background information on these technologies in the Resources section.
Installing Eucalyptus in a cluster
For this article, we used Eclipse 3.4.2, with CentOS 5.4 as the operating system.
IBM blade servers support a wide selection of processor technologies and operating systems to allow clients to run all of their diverse workloads inside a single architecture. The blade servers reduce complexity, improve systems management, and increase energy efficiency while driving down total cost of ownership. We used IBM LS20 BladeCenter® Server (Resources).
This article generally refers to a single-cluster installation: all the components except the node controllers are located on one machine, which we refer to as the front end. (In other words, the cloud controller, cluster controller, and storage controller run on the front-end machine.) The machines running only node controllers are referred to as "nodes."
It's pretty simple to install Eucalyptus 1.6.1 on CentOS. As the admin:
- Extract (untar) the file eucalyptus-1.6.1-centos-i386.tar.gz.
- Log in as any user other than root and install it as shown in the developerWorks wiki.
After following those steps, Eucalyptus should be installed. Alternative installation directions are available from the Eucalyptus web site. Next, download the Eucalyptus management tools to manage virtual images. The wiki explains bundle image usage.
Figure 1 shows the four high-level components, each with its own web service interface, that comprise a Eucalyptus installation:
Figure 1. Eucalyptus four high-level components
Those components are the node controller, cluster controller, storage controller (Walrus), and cloud controller.
- The node controller controls the execution, inspection, and termination of VM instances on the host where it runs.
- The cluster controller gathers information about and schedules VM execution on specific node controllers, and manages the virtual instance network.
- The storage controller (Walrus) is a put/get storage service that implements Amazon's S3 interface, providing a mechanism for storing and accessing virtual machine images and user data.
- The cloud controller is the entry point into the cloud for users and administrators. It queries node managers for information about resources, makes high-level scheduling decisions, and implements them by making requests to cluster controllers.
About Walrus, the storage component
Walrus is a storage service included with Eucalyptus that is interface-compatible with Amazon's S3. Walrus lets you store persistent data, organized as buckets and objects.
Walrus does not provide locking for object writes; however, as is the case with S3, you are guaranteed that a consistent copy of the object is saved if there are concurrent writes to the same object. If a write to an object is encountered while there is a previous write to the same object in progress, the previous write is invalidated.
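The "previous write is invalidated" rule can be pictured with a small in-memory model. This is only a sketch of the behavior described above, not Walrus code: the class name, the ticket bookkeeping, and the method names are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model only: a later write to the same key invalidates an
// earlier write that is still in progress. All names here are hypothetical.
public class LastWriterWinsStore {
    // Newest write ticket issued per object key.
    private final Map<String, Long> currentTicket = new HashMap<>();
    // Data that has been committed per object key.
    private final Map<String, String> committed = new HashMap<>();
    private long nextTicket = 1;

    /** Starts a write; any earlier in-progress write to the same key becomes stale. */
    public synchronized long beginWrite(String key) {
        long ticket = nextTicket++;
        currentTicket.put(key, ticket);
        return ticket;
    }

    /** Commits only if this ticket is still the newest one issued for the key. */
    public synchronized boolean commit(String key, long ticket, String data) {
        Long current = currentTicket.get(key);
        if (current == null || current.longValue() != ticket) {
            return false; // invalidated by a later write
        }
        committed.put(key, data);
        currentTicket.remove(key);
        return true;
    }

    public synchronized String read(String key) {
        return committed.get(key);
    }

    public static void main(String[] args) {
        LastWriterWinsStore store = new LastWriterWinsStore();
        long first = store.beginWrite("bucket/object");
        long second = store.beginWrite("bucket/object"); // invalidates 'first'
        System.out.println(store.commit("bucket/object", first, "old"));  // false
        System.out.println(store.commit("bucket/object", second, "new")); // true
        System.out.println(store.read("bucket/object"));                  // new
    }
}
```

The stored copy is always one complete write, but the earlier writer loses its work without being asked to wait, which is the gap that locking is meant to close.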
In practice, the current version of Walrus offers no object locking, so concurrent writers can overwrite one another and the data each user sees can be inconsistent. To run an image in the cloud, you must produce a bundled image and upload it to the cloud. Walrus acts as a storage manager: it receives the image from you and stores it as buckets and objects. When you want to access the image from the cloud, Walrus is entrusted with the task of verifying and decrypting images that have been uploaded by users.
When you want to store an image, a separate bucket is created for each user with a unique bucket name. Using S3cmd, create a bucket and give it a name:
$ s3cmd mb s3://my-new-bucket-name
Once a bucket has been created, you can upload the file, referred to as the object, into the bucket:
$ s3cmd put filename s3://my-new-bucket-name/filename
To learn more about how Walrus works internally, you can study the S3cmd tool for Amazon S3 (Resources).
Introducing file locking to Walrus
To overcome the drawbacks in Walrus, we have introduced a file-locking mechanism: To maintain data consistency, we've provided the ability to access the file in read/write mode.
When user1 wants to access any file in write mode, the corresponding object will be locked so that other users can't access it until it is released by user1. But other users can access the file in read mode.
We designed a separate queue that holds each user's write request in the order in which the object was requested, so that the system can process the requests in that order.
Before running VM instances in Eucalyptus, you should add the downloaded or created VM images: bundle the images with your Eucalyptus credentials, upload them, and register them.
To enable a VM image as an executable entity, the Eucalyptus administrator must add a root filesystem image and a kernel/ramdisk pair to Walrus (bucket storage) and register the uploaded data with Eucalyptus. Each image is added to Walrus and registered with Eucalyptus separately, using the following EC2-compatible commands:
- To add the root filesystem image to Walrus:
  - Bundle the image:
    $ euca-bundle-image -i <vm image file>
  - Upload the bundle:
    $ euca-upload-bundle -b <image bucket> -m /tmp/<vm image file>.manifest.xml
  - Register the image:
    $ euca-register <image bucket>/<vm image file>.manifest.xml
- To add the kernel to Walrus and register it with Eucalyptus:
  - Bundle the kernel:
    $ euca-bundle-image -i <kernel file> --kernel true
  - Upload the bundle:
    $ euca-upload-bundle -b <kernel bucket> -m /tmp/<kernel file>.manifest.xml
  - Register the kernel:
    $ euca-register <kernel bucket>/<kernel file>.manifest.xml
At present Eucalyptus doesn't support a file-sharing mechanism, but we'll show you how to implement file sharing in Eucalyptus. We focus on maintaining data consistency.
For each user, a separate Virtual Machine instance is created. In its current incarnation, Eucalyptus also doesn't support sharing files among different VM instances. If two or more users access the file in write mode concurrently and modify the file, the last saved content is updated in the file.
First, let's look at how a volume is created and attached to an instance.
Before creating a new volume, view information about current availability zones:
$ euca-describe-availability-zones
Create a new volume:
$ euca-create-volume --size <size of volume> -x <name of availability zone>
where --size denotes the size of the volume you wish to create and -x denotes the name of the availability zone where you want the volume to reside.
Attach a volume to an instance with the following command:
$ euca-attach-volume -i <instance id> -d <device> <volume id>
For example, to attach the volume vol-12345678 to the instance i-98765432 at /dev/sdb:
$ euca-attach-volume -i i-98765432 -d /dev/sdb vol-12345678
When the VM instance starts running, you can see two IP addresses assigned to it. Log in to the IP address using the SSH key:
$ ssh -i mykey.private root@<ip-address>
Let's look at this in scenario form.
Let's assume that user A and user B log in to two different systems, say System 1 and System 2, with the same username and password, and try to access a file from both systems.
Both A and B try to access the same VM instance through Elastic Fox concurrently (at the same time) in write mode. Using the IP address of the instance, both of them access the instance with the ssh command. When A modifies the file and then B does, B's modification is the one that is saved, and A's is lost. The state of the file writes is not consistent.
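The lost update in this scenario can be reproduced with a few lines of Java. The sketch below is a deliberately simplified model: the file is a map entry, "opening" it copies the contents, and the file name and helper methods are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model of the scenario: two users edit private copies of the
// same file and save without any locking. All names are hypothetical.
public class LostUpdateDemo {
    // Shared file contents keyed by file name.
    static final Map<String, String> files = new HashMap<>();

    static String open(String name) {
        return files.get(name);
    }

    static void save(String name, String contents) {
        files.put(name, contents);
    }

    static String run() {
        files.put("notes.txt", "base");
        String copyA = open("notes.txt");  // user A opens the file
        String copyB = open("notes.txt");  // user B opens the same file
        save("notes.txt", copyA + " +A");  // A saves first
        save("notes.txt", copyB + " +B");  // B saves last and overwrites A
        return files.get("notes.txt");
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "base +B": A's edit is lost
    }
}
```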
The modified Walrus architecture helps to make the data file modifications consistent.
Let's look at the architecture of the cloud and its virtual network.
Figure 2. The architecture of cloud and its virtual network
The components are:
- The CLC, or cloud controller, is the interface to the clients and does high-level scheduling; it forms the management platform.
- The ccX are the cluster controllers, which schedule incoming requests to specific node controllers and gather and report information about a set of node controllers.
- The ncX are the node controllers, the machines that host VM instances.
- Walrus is the persistent secondary storage that the node controllers use to store their VM images and sometimes to store data.
Figure 3 shows how a user shares files with other users.
Figure 3. Flow diagram of how a user shares files
In the flow diagram (follow the numbers):
- Client logs in with login ID and password.
- CLC checks the user ID in the database and creates a new session for a valid user.
- CLC returns the status message to the client.
- User shares a file that he or she owns.
- CLC checks whether the user really owns the file and, upon successful authentication, adds the new user's identity to the shared file access list.
- CLC forwards this message to the corresponding CC.
- CC finds the NC hosting the virtual machine instance for the user and forwards this message.
- NC transfers this file to a persistent shared medium (Walrus) to enable sharing between users.
- File is transferred to Walrus through the CC and CLC.
- File is transferred to Walrus.
- CLC delivers the success message to the client.
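The ownership check in the sharing flow (the step where the CLC verifies the owner and extends the access list) could look roughly like this. It is a sketch under stated assumptions: the class, its maps, and the method names are hypothetical and do not come from the Eucalyptus source.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of the share step: only a file's owner may add another
// user to its access list. All names are hypothetical.
public class SharedFileDirectory {
    private final Map<String, String> ownerOf = new HashMap<>();
    private final Map<String, Set<String>> accessList = new HashMap<>();

    /** Records a file and its owner; the owner always has access. */
    public void register(String file, String owner) {
        ownerOf.put(file, owner);
        Set<String> users = new HashSet<>();
        users.add(owner);
        accessList.put(file, users);
    }

    /** Returns true and extends the access list only when the requester owns the file. */
    public boolean share(String file, String requester, String newUser) {
        if (!requester.equals(ownerOf.get(file))) {
            return false; // unknown file, or requester is not the owner
        }
        accessList.get(file).add(newUser);
        return true;
    }

    public boolean canAccess(String file, String user) {
        Set<String> users = accessList.get(file);
        return users != null && users.contains(user);
    }

    public static void main(String[] args) {
        SharedFileDirectory dir = new SharedFileDirectory();
        dir.register("report.txt", "userA");
        System.out.println(dir.share("report.txt", "userB", "userC")); // false
        System.out.println(dir.share("report.txt", "userA", "userB")); // true
        System.out.println(dir.canAccess("report.txt", "userB"));      // true
    }
}
```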
Figure 4 shows how a client requests access to a file.
Figure 4. Flow diagram of how a client requests access to a file
In this flow diagram (follow the numbers):
- User logs in using login ID and password.
- CLC checks the credentials against the user database and creates a new session for a valid user.
- CLC returns the login status message to the client.
- Client requests a file.
- CLC sends request to the user directory to verify user access to the file. The user directory stores file details and user access data.
- CLC forwards the request to the corresponding CC.
- CC finds the NC that hosts the virtual machine instance created for the user.
In steps 8, 9, and 10, the NC transmits data to the user using a secure channel through the CC and CLC.
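The access-check and routing steps in this flow can be sketched as two lookups: one in the user directory and one to find the NC that hosts the user's VM. The class, tables, and controller names below are assumptions made for illustration; they are not part of Eucalyptus.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of the request flow: verify access in the user
// directory, then find the node controller to serve it. Names are hypothetical.
public class FileRequestRouter {
    // Hypothetical user directory: file name -> users allowed to access it.
    private final Map<String, Set<String>> userDirectory = new HashMap<>();
    // Hypothetical placement table: user -> NC hosting that user's VM instance.
    private final Map<String, String> nodeForUser = new HashMap<>();

    public void allow(String file, String user) {
        userDirectory.computeIfAbsent(file, f -> new HashSet<>()).add(user);
    }

    public void place(String user, String nodeController) {
        nodeForUser.put(user, nodeController);
    }

    /** Returns the NC that should serve the request, or null if access is denied or no VM is placed. */
    public String route(String user, String file) {
        Set<String> allowed = userDirectory.get(file);
        if (allowed == null || !allowed.contains(user)) {
            return null; // the user directory has no grant for this user
        }
        return nodeForUser.get(user); // the CC locates the NC hosting the user's VM
    }

    public static void main(String[] args) {
        FileRequestRouter router = new FileRequestRouter();
        router.allow("report.txt", "userB");
        router.place("userB", "nc2");
        System.out.println(router.route("userB", "report.txt")); // nc2
        System.out.println(router.route("userC", "report.txt")); // null
    }
}
```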
It's probably time to demonstrate what the inside of a node controller looks like. Every node controller runs a hypervisor, which is platform-virtualization software. We use a type 1 hypervisor, which interacts directly with the host hardware, runs guest operating systems above it, and allocates system resources across LPARs to share physical resources such as CPUs, direct access storage devices, and memory. (Type 1 hypervisors were introduced by IBM in the early 1970s with the IBM System/370 processors.) Figure 5 demonstrates how the flow works with the NC and its hypervisor.
Figure 5. Inside the node controller
In this flow diagram (follow the numbers):
- An incoming request arrives from the CC at the NC.
- The node controller module running on that node forwards it to the hypervisor.
- The hypervisor does the job with the help of the guest operating system.
- The guest OS instructs the hypervisor what to do.
- The hypervisor now interacts with the hardware and completes the job.
We've looked at how file sharing introduced into Eucalyptus can help; now let's look at ensuring data consistency via the concept of accessing files in read/write mode.
Figure 6 demonstrates how a write mode time queue for file access can improve data consistency:
Figure 6. Improving consistency using a write mode time queue
Figure 6 compares user B's request for the file F1 in write mode at time t with user C's request for the same file in write mode at time t+1. To implement file consistency, we designed a queue in which requests are placed on a first-come, first-served basis.
Because B requested the file before C, B is placed at the top of the queue and C is placed next to B in the queue.
In general, whenever a user requests write-mode access to the file, the request is placed in the queue according to the time at which it was made. The user who requested first is placed at the top of the queue, the next user is placed adjacent to it, and so on.
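A time-ordered write queue of this kind can be sketched with a priority queue keyed on the request time. The class and field names are hypothetical, and the sequence number used to break ties between equal timestamps is an assumption added for the sketch.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Illustrative sketch of a write-request queue ordered by request time.
// All names are hypothetical.
public class WriteRequestQueue {
    static final class WriteRequest {
        final String user;
        final long time; // time at which the write request was made
        final long seq;  // arrival order, used only to break ties

        WriteRequest(String user, long time, long seq) {
            this.user = user;
            this.time = time;
            this.seq = seq;
        }
    }

    // Earliest request time first; equal times fall back to arrival order.
    private final PriorityQueue<WriteRequest> queue = new PriorityQueue<>(
        Comparator.comparingLong((WriteRequest r) -> r.time)
                  .thenComparingLong(r -> r.seq));
    private long nextSeq = 0;

    public void submit(String user, long time) {
        queue.add(new WriteRequest(user, time, nextSeq++));
    }

    /** Returns the user whose request is at the top of the queue, or null if it is empty. */
    public String next() {
        WriteRequest r = queue.poll();
        return r == null ? null : r.user;
    }

    public static void main(String[] args) {
        WriteRequestQueue q = new WriteRequestQueue();
        long t = 100;
        q.submit("C", t + 1); // C asked at t+1
        q.submit("B", t);     // B asked at t, so B goes to the top
        System.out.println(q.next()); // B
        System.out.println(q.next()); // C
    }
}
```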
Figure 7 adds a second determiner, the type of access requested (write or read), which sets the sharing and locking level in order to improve data consistency.
Figure 7. Improving consistency using a read/write determiner
In Figure 7, in addition to the time at which a user requests the file, we've added a field that records the mode in which access is provided: write or read.
In the write access queue, user B is at the top because user B requested write-mode access before user C. User B is granted write access. User C gets write access once user B releases the file lock, but user C can access the file in read mode while user B still has it locked in write mode.
In general, if two or more users request the file in write mode concurrently, the first user is granted write-mode access and the remaining users' write-mode requests are queued. Read-mode access is given to all the other users. When the first user releases the file, the user at the top of the queue is given write-mode access.
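The grant rules above (one writer at a time, queued writers, unrestricted readers) can be captured in a small state machine. The sketch below tracks a single file, and the class, enum, and method names are assumptions made for illustration rather than code from the modified Walrus.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch for a single file: one writer at a time, waiting writers
// queued in arrival order, readers never blocked. Names are hypothetical.
public class FileAccessManager {
    public enum Mode { READ, WRITE }
    public enum Grant { READ_GRANTED, WRITE_GRANTED, WRITE_QUEUED }

    private String writer; // current holder of write access, or null
    private final Deque<String> waiting = new ArrayDeque<>();

    public synchronized Grant request(String user, Mode mode) {
        if (mode == Mode.READ) {
            return Grant.READ_GRANTED; // reads are allowed even while a writer holds the file
        }
        if (writer == null) {
            writer = user;
            return Grant.WRITE_GRANTED;
        }
        waiting.addLast(user);
        return Grant.WRITE_QUEUED;
    }

    /** Releases write access and returns the user promoted from the queue, or null. */
    public synchronized String release(String user) {
        if (!user.equals(writer)) {
            throw new IllegalStateException(user + " does not hold write access");
        }
        writer = waiting.pollFirst();
        return writer;
    }

    public static void main(String[] args) {
        FileAccessManager f1 = new FileAccessManager();
        System.out.println(f1.request("B", Mode.WRITE)); // WRITE_GRANTED
        System.out.println(f1.request("C", Mode.WRITE)); // WRITE_QUEUED
        System.out.println(f1.request("C", Mode.READ));  // READ_GRANTED
        System.out.println(f1.release("B"));             // C
    }
}
```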
Modifying, recompiling, and running the modified code
We know you've waded through all the concepts just to get to this part: the actual modification steps. They're pretty simple.
- Create a workspace and copy the folder clc from the Eucalyptus source.
Figure 8. Choose your workspace folder
- Import the source by clicking File > Import.
Figure 9. Choose your import source
- Select General > Existing Projects into Workspace.
Figure 10. Select Existing Projects
- Select the root directory path as root/java/workspace/clc.
Figure 11. Select the root directory path
- Click Finish.
Figure 12. When the root directory and projects have been successfully added, click Finish
- On the left-hand side is a tab called "package" which lists the content of the project. Now right-click build.xml.
Figure 13. Ready to build ...
- Run the Ant build.
Figure 14. ... and it's a success!
You should see the build is successful. That was easy.
The application itself has several files, but we cover only the main highlights, leaving it to you to build on this to create your own applications.
To implement the file-sharing and locking mechanism, we created a class called WalrusVirtualBlockManager, which implements the file-locking mechanism in Eucalyptus. Listing 1 is the source code.
Listing 1. WalrusVirtualBlockManager
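package edu.ucsb.eucalyptus.cloud.ws;

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;
import org.apache.log4j.Logger;
import edu.ucsb.eucalyptus.cloud.entities.ObjectInfo;

public class WalrusVirtualBlockManager
{
  private static Logger LOG = Logger.getLogger(WalrusVirtualBlockManager.class);
  // One lock per stored object. A concurrent map lets many request threads
  // look up and register locks safely.
  public static ConcurrentMap<ObjectInfo,ReentrantLock>
    storagelockMap = new ConcurrentHashMap<ObjectInfo,ReentrantLock>();
  // Single shared instance, created once when the class is loaded.
  private static final WalrusVirtualBlockManager virtualBlockMgr =
    new WalrusVirtualBlockManager();

  private WalrusVirtualBlockManager()
  {
  }

  public static WalrusVirtualBlockManager getInstance()
  {
    return virtualBlockMgr;
  }

  // Acquires the lock for the object, blocking until it is free.
  // The returned lock is already held by the calling thread.
  public ReentrantLock lock(ObjectInfo info)
  {
    ReentrantLock lck = storagelockMap.get(info);
    if (lck == null)
    {
      // Fair lock: waiting writers are served in the order they asked.
      ReentrantLock created = new ReentrantLock(true);
      lck = storagelockMap.putIfAbsent(info, created);
      if (lck == null)
      {
        lck = created;
      }
    }
    lck.lock();
    return lck;
  }

  // Releases the object's lock if the calling thread holds it. The entry stays
  // in the map so that threads already waiting keep contending on the same lock.
  public void unlock(ObjectInfo info)
  {
    ReentrantLock lck = storagelockMap.get(info);
    if (lck != null && lck.isHeldByCurrentThread())
    {
      lck.unlock();
    }
  }

  // Forgets the lock for an object; call only when no thread holds or awaits it.
  public void clear(ObjectInfo info)
  {
    storagelockMap.remove(info);
  }

  // Releases every lock held by the calling thread, then empties the map.
  public void clearAll()
  {
    for(ObjectInfo info : storagelockMap.keySet())
    {
      unlock(info);
    }
    storagelockMap.clear();
  }
}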
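A per-object lock manager like the one in Listing 1 is usually called with the lock taken before the write and released in a finally block, so that an exception cannot leave the object locked. The sketch below shows that calling pattern on a self-contained stand-in keyed by strings, because ObjectInfo is not available outside the Eucalyptus source tree. The class name, the counter, and the object key are assumptions made for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

// Stand-in for the calling pattern only: a String key replaces ObjectInfo and
// a counter replaces the real object write. All names are hypothetical.
public class LockUsageSketch {
    private static final ConcurrentMap<String, ReentrantLock> locks =
        new ConcurrentHashMap<>();
    static int writes = 0;

    static void writeObject(String key) {
        // One fair lock per object key, created on first use.
        ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock(true));
        lock.lock();
        try {
            writes++; // stands in for the real object write
        } finally {
            lock.unlock(); // always release, even if the write fails
        }
    }

    static int run() {
        writes = 0;
        Runnable task = () -> {
            for (int i = 0; i < 1000; i++) {
                writeObject("bucket/object");
            }
        };
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writers", e);
        }
        return writes;
    }

    public static void main(String[] args) {
        System.out.println(run()); // 2000: no increment is lost
    }
}
```

Without the lock, the two threads could interleave their updates and the final count could fall short of 2000.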
This modified block storage technique can be adapted for other cloud platforms as well. For example, in Cassandra, the data is replicated: the latest version of a data resource sits on some node in the cluster, while older versions may still be on other nodes. The goal is that eventually all nodes will have the latest version. File object locking is not available, but the modified block storage technique can be introduced there, the way we did in this article, to maintain data consistency. You've seen Cassandra in action at Digg, Facebook, Twitter, and other sites.
Now you know how to install Eucalyptus in a cluster and how to modify the Walrus source code to implement or improve the file-sharing and file-locking mechanism on the cloud.
Learn
- The Eucalyptus community sports a sandbox environment in which members of the community can test drive and experiment with Eucalyptus. You'll find all the documentation there.
- Check out some interesting Eucalyptus presentations.
- "Infrastructure-as-a-Service (IaaS) and Eucalyptus" (developerWorks, December 2009) shows you two things: how IaaS clouds provide basic services you can use to deploy and run your applications, and how Eucalyptus can be used as an infrastructure to create public or private clouds. (Another article, "Cloud services for your virtual infrastructure," covers Eucalyptus and Platform-as-a-Service.)
- Other developerWorks articles on Eucalyptus include:
  - "Anatomy of an open source cloud."
  - "Cloud computing with Linux."
  - "Deploy your database applications and projects on the cloud."
- Learn more about Amazon S3 tools and commands.
- Other developerWorks articles on Amazon S3 include:
  - "Cultured Perl: Storage management on Amazon S3."
  - "Migrate your Linux application to the Amazon cloud."
  - "Anatomy of an open source cloud."
- To learn about basic UNIX commands, check out Commonly used UNIX commands and UNIX commands.
- Discover IBM BladeCenter servers. Learn more about the BladeCenter LS20 used in this article.
- The developerWorks cloud computing zone offers updated resources on cloud computing, including:
  - An introduction to the world of cloud computing.
  - Updated technical articles and tutorials and pod- and webcasts to ease your development efforts, as well as a window into professional workshops and recorded sessions to make you an efficient cloud developer.
  - Connections to IBM product downloads and information designed for use in cloud environments.
  - An active feed into the topics the community is buzzing about.
- The developerWorks open source site offers updated resources on open source software, development, and implementation.
- The developerWorks Java technology site offers updated resources on Java standards and technology.
- The developerWorks open source technology zone hosts technical knowledge of many open source products like Eucalyptus.
- The IBM Developer Cloud blog gives you the latest details on the developer cloud from the experts in cloud computing.
- The how-to wiki is always being updated with common use scenarios for the developer cloud.
- Stay current with developerWorks technical events and webcasts.
- The IBMdevcloud channel on YouTube offers all kinds of hands-on demos such as using a range of IBM products on the test cloud and creating and accessing instances.
- The ibm.com/cloud portal serves up a high-level overview of IBM cloud offerings.
Get products and technologies
- Download the latest build of Eucalyptus.
- The IBM Smart Business Development and Test on the IBM Cloud is your place to start developing your applications for the cloud.
- With IBM trial software, available for download directly from developerWorks, build your next development project on the cloud.
Discuss
- Follow the Eucalyptus chatter on Twitter; you can follow developerWorks too.
- The Developer Cloud group on My developerWorks is the community for the Smart Business Development and Test on the IBM Cloud.
- Get involved in the developerWorks community (developer blogs, groups, forums, podcasts, profiles, newsletters, wikis, and community topics) through My developerWorks, a professional network and unified set of community tools for connecting, sharing, and collaborating.
Ramanathan Sundarrajan (MydW profile) is an active member of IBM's Cloud Computing work group and performed lots of research on cloud innovations. Ramanathan monitored final-year student interns from the College of Engineering Guindy at Anna University; one result of that project is this article.
Kishorekumar Neelamegan brings more than 13 years of software development experience with a strong focus on integrating software into the Rational platform. A passionate evangelist on cloud, Kishore is a frequent participant on developerWorks: You can follow his activities through his MydW profile and MydW group, dW India IBMers.




