The perfect marriage: Hadoop and Cloud

Hadoop, meet Cloud. Cloud, meet Hadoop. It was love at first sight. Hadoop found in Cloud the wind beneath her wings, a loyal companion: available any time she needed him, flexible, and elastic. Cloud found in Hadoop his partner in life, someone to share and discover new things with; together, anything was possible. Structured, semi-structured, or unstructured data could not stop Hadoop, and Cloud was always there for her.

If you are new to Hadoop and cloud computing, you probably have no idea what that first paragraph was about. Let me step back and quickly tell you the story of cloud computing, and then of Hadoop. Then you will understand why they make a perfect marriage.

Cloud computing is a new model for delivering computing resources. It gives the illusion of having an infinite amount of resources available on demand. Users do not have to commit to a given piece of software or hardware. They can simply rent the resources they need and, even better, pay only for what they use, for the amount of time they use it. Cloud computing enables users, through automation, to self-provision the resources they need. This automation, the availability of pre-defined resources, and economies of scale are what bring costs down, allowing any individual to start working on a server for as little as 10 cents per hour. Cloud computing usage has grown exponentially. According to this survey, most companies anticipate using the cloud for most or all of their needs by 2015.

Even with all this power and availability (from a hardware point of view), there are major hurdles (from a software perspective) to manipulating the vast amounts of data, known as big data, that are being collected daily. Volume, velocity, and variety (V3) are terms often used today to describe the characteristics of big data. The world, as IBM describes it with its Smarter Planet initiative, is now instrumented, interconnected, and intelligent (I3). This means that more and more data is being collected (volume), at incredible speeds (velocity), and from different sources such as sensors, Twitter, Facebook, and so on. It is said that about 80% of the data collected is unstructured (variety). Such data is hard to manipulate with existing relational database software, which mainly manages structured data. This is why Hadoop was born.

Hadoop’s inception started with papers published by Google that described how Google was manipulating the data it was collecting with its services. These ideas were used to develop Hadoop, an open source Java framework, with much of the early work carried out at Yahoo. Hadoop consists mainly of two components: a distributed file system (HDFS) and a programming model (MapReduce). Running on commodity hardware, HDFS splits files into blocks and replicates each block across several nodes in the cluster, which provides data reliability even when individual machines fail. With MapReduce, the code is sent to the nodes in the cluster closest to where the data to be manipulated resides, and those nodes then process their blocks of data in parallel. The result is fast and reliable output. Hadoop works well for large files accessed sequentially. It delivers high throughput, but it works like a batch job, so the results are not immediate.
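To make the MapReduce idea a little more concrete, here is a minimal single-machine sketch in Python. It is not Hadoop's API, and it does not distribute anything across a cluster; the function names and the sample blocks are hypothetical and chosen only for illustration. It simply mimics the three phases (map, shuffle, reduce) with the classic word count, using a thread pool as a stand-in for the nodes that would each process their own block.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_block(block):
    """Map step: emit a (word, 1) pair for every word in one block of text."""
    return [(word.lower(), 1) for word in block.split()]


def shuffle(mapped_blocks):
    """Shuffle step: group all emitted values by key (the word)."""
    grouped = defaultdict(list)
    for pairs in mapped_blocks:
        for word, count in pairs:
            grouped[word].append(count)
    return grouped


def reduce_counts(grouped):
    """Reduce step: sum the values collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(blocks):
    # Each block is mapped independently of the others. That independence
    # is what would let a real cluster process blocks in parallel on the
    # nodes that hold them; here a local thread pool plays that role.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_block, blocks))
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    # Hypothetical sample data standing in for the blocks of a large file.
    blocks = ["Hadoop meets Cloud", "Cloud meets Hadoop", "Hadoop scales"]
    print(word_count(blocks))
```

In a real Hadoop job the framework handles the shuffle, the scheduling of map tasks near the data, and the recovery from failed nodes; the programmer mainly supplies the map and reduce logic.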

In summary, Hadoop and Cloud working together deliver value. Without Hadoop, users cannot see all the benefits they can get from the cloud. Without the cloud, users cannot see all the benefits they can get from Hadoop.

Hadoop and Cloud are a perfect marriage; we hope to see kids coming soon!
