Getting started with Hadoop on the IBM SmartCloud Enterprise

It couldn’t be easier to set up your Hadoop cluster on the IBM SmartCloud Enterprise. It’s also fast! For example, a three-node Hadoop cluster can be up and running in less than 30 minutes.

BigInsights Basic Edition is IBM’s free distribution of Hadoop with added features. All IBM SmartCloud Enterprise data centers include two types of images for it:

  • IBM BigInsights Basic 1.1 – Hadoop Master Node
  • IBM BigInsights Basic 1.1 – Hadoop Data node

The images provided in the IBM SmartCloud Enterprise run on 64-bit Red Hat Enterprise Linux (RHEL) 5.6 with the pay-as-you-go option. There is no charge for BigInsights Basic Edition, but there is a charge of US$0.30/hour (at the time of writing) for using RHEL and the IBM SmartCloud Enterprise infrastructure.
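To get a feel for what that rate means in practice, here is a rough estimate in Python. It is only a sketch: it assumes the US$0.30/hour charge applies to each running instance separately, which the pricing above does not spell out, and the function name `estimate_cost` is made up for illustration.

```python
# Rate quoted above (at the time of writing). Assumption: it is billed
# per instance, per hour; check the current price list before relying on it.
HOURLY_RATE_USD = 0.30


def estimate_cost(instances, hours, rate=HOURLY_RATE_USD):
    """Return the estimated charge in USD for running `instances`
    instances for `hours` hours each, rounded to cents."""
    if instances < 0 or hours < 0:
        raise ValueError("instances and hours must be non-negative")
    return round(instances * hours * rate, 2)


if __name__ == "__main__":
    # A three-node cluster kept up for an eight-hour working day.
    print(estimate_cost(3, 8))
```

Under that assumption, a three-node cluster left running for an eight-hour day comes to 3 × 8 × 0.30 = US$7.20.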

If you are new to Hadoop, you can take the free online course Hadoop Fundamentals I at BigDataUniversity.com, which includes videos and lab exercises. The course also includes video demonstrations of the setup and of running some Hadoop commands on the IBM SmartCloud Enterprise. This material is in Lesson 1, section Hands-on lab – Creating your own Hadoop cluster, Option 3. If you want a more detailed course, IBM offers the fee-based InfoSphere BigInsights Essential class.

If you prefer to read step-by-step instructions while trying this out hands-on, I wrote an article in IBM developerWorks that explains how to provision three instances on the IBM SmartCloud Enterprise to set up a three-node cluster. The article also shows how to verify that your cluster is working by stopping and starting all Hadoop components, testing a few commands, and monitoring your cluster using the BigInsights Web console. You can follow the same instructions to set up a larger cluster that meets your needs.

Hadoop uses a master-slave architecture in which the master runs the NameNode and the JobTracker, and each slave runs a DataNode and a TaskTracker.
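As a way to picture that layout, the short Python sketch below maps each node's role to the components it hosts. It is purely illustrative: the `Node` class, the `daemons_by_host` helper, and the host names are invented for this example and are not part of Hadoop or BigInsights.

```python
from dataclasses import dataclass

# Components hosted by each role, as described above.
MASTER_DAEMONS = ("NameNode", "JobTracker")
SLAVE_DAEMONS = ("DataNode", "TaskTracker")


@dataclass
class Node:
    hostname: str  # hypothetical host name
    role: str      # "master" or "slave"

    def __post_init__(self):
        if self.role not in ("master", "slave"):
            raise ValueError(f"unknown role: {self.role!r}")

    @property
    def daemons(self):
        """Components this node is expected to run, based on its role."""
        return MASTER_DAEMONS if self.role == "master" else SLAVE_DAEMONS


def daemons_by_host(nodes):
    """Return a dict mapping each host name to its expected components."""
    return {node.hostname: node.daemons for node in nodes}


if __name__ == "__main__":
    # One master and two slaves: a three-node cluster.
    cluster = [Node("master", "master"), Node("data1", "slave"), Node("data2", "slave")]
    for host, daemons in daemons_by_host(cluster).items():
        print(host, "->", ", ".join(daemons))
```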

Hadoop can be configured to work in one of three modes. The stand-alone mode does not start all components and works on a single node. The pseudo-distributed mode starts all components and works on a single node. The fully distributed mode starts all components and requires more than one node. The stand-alone and pseudo-distributed modes are typically used in development or testing, while the fully distributed mode is typically used in production scenarios.
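The three modes differ on just two points: whether all components are started, and how many nodes are involved. The Python sketch below encodes that decision. The function `hadoop_mode` is a made-up helper that only restates the description above; it does not inspect or configure a real Hadoop installation.

```python
def hadoop_mode(all_components_started, node_count):
    """Name the Hadoop mode implied by the two properties described above."""
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    if node_count == 1:
        # Single node: the mode depends on whether all components run.
        return "pseudo-distributed" if all_components_started else "stand-alone"
    if not all_components_started:
        raise ValueError("fully distributed mode starts all components")
    return "fully distributed"


if __name__ == "__main__":
    for started, nodes in [(False, 1), (True, 1), (True, 3)]:
        print(f"all components: {started}, nodes: {nodes} -> {hadoop_mode(started, nodes)}")
```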

When working with the images provided with the IBM SmartCloud Enterprise, you can work in stand-alone or pseudo-distributed mode by provisioning a single node, the Hadoop master node. If you want to work in fully distributed mode, the BigInsights images have been configured so that you build the cluster simply by specifying the IP address of the Hadoop master node when you provision each Hadoop data node. The Hadoop master node instance must therefore be provisioned first.
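The ordering rule is the one thing to get right here, so this small Python sketch models it. It does not call any IBM SmartCloud Enterprise API: the `ClusterPlan` class and its methods are hypothetical, and the IP address comes from the reserved documentation range. The sketch only records the steps and refuses to add a data node before the master exists.

```python
class ClusterPlan:
    """Records provisioning steps and enforces 'master first'."""

    def __init__(self):
        self.master_ip = None
        self.data_nodes = []

    def provision_master(self, ip):
        """Record the master node; its IP is needed by every data node."""
        self.master_ip = ip

    def provision_data_node(self, name):
        """Record a data node, passing it the master's IP address."""
        if self.master_ip is None:
            raise RuntimeError("provision the Hadoop master node first")
        self.data_nodes.append({"name": name, "master_ip": self.master_ip})

    @property
    def size(self):
        """Total number of nodes planned so far."""
        return (1 if self.master_ip else 0) + len(self.data_nodes)


if __name__ == "__main__":
    plan = ClusterPlan()
    plan.provision_master("192.0.2.10")  # placeholder address
    plan.provision_data_node("data1")
    plan.provision_data_node("data2")
    print(plan.size, "nodes; data nodes point at", plan.master_ip)
```

To plan a larger cluster, you would simply record more data nodes against the same master address.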

Thanks to the cloud and Hadoop, it is now possible to handle large amounts of data, structured or unstructured, in a timely manner. However, skills in these areas are in short supply. See the video demonstrations (Hadoop Fundamentals I at BigDataUniversity.com) and the article previously referenced to jump into these technologies with hands-on steps!

If you don’t have an account with IBM SmartCloud Enterprise, you can take advantage of the trial available until November 11, 2011 (sign up by October 28).
