October 25, 2011 | Written by: Raul Chong
It couldn’t be easier to set up your Hadoop cluster on the IBM SmartCloud Enterprise. It’s also fast! For example, a three-node Hadoop cluster can be up and running in less than 30 minutes.
All the IBM SmartCloud Enterprise data centers include two types of images for BigInsights Basic Edition, which is IBM’s free distribution of Hadoop with added features:
- IBM BigInsights Basic 1.1 – Hadoop Master Node
- IBM BigInsights Basic 1.1 – Hadoop Data node
The images provided in the IBM SmartCloud Enterprise run on 64-bit Red Hat Enterprise Linux (RHEL) 5.6 with the pay-as-you-go option. There is no charge for BigInsights Basic Edition, but there is a charge of US$0.30/hour (at the time of writing) for using RHEL and the IBM SmartCloud Enterprise infrastructure.
If you are new to Hadoop, you can take the free online course Hadoop Fundamentals I at BigDataUniversity.com, which includes videos and lab exercises. The course also includes a video demonstration of the setup and a video demonstration of running some Hadoop commands on the IBM SmartCloud Enterprise. You can find this material in Lesson 1, section Hands-on lab – Creating your own Hadoop cluster, Option 3. If you want to take a more detailed course, IBM offers the fee-based InfoSphere BigInsights Essential class.
If you prefer to read step-by-step instructions while trying this out hands-on, I wrote an article for IBM developerWorks that explains how to provision three instances on the IBM SmartCloud Enterprise to set up a three-node cluster. The article also shows how to verify that your cluster is working by stopping and starting all Hadoop components, testing a few commands, and monitoring your cluster using the BigInsights Web console. You can follow the same instructions to set up a larger cluster that fits your needs.
Hadoop uses a master-slave architecture in which the master runs a NameNode and a JobTracker, and each slave runs a DataNode and a TaskTracker.
Hadoop can be configured to run in one of three modes. The stand-alone mode runs on a single node and does not start all components. The pseudo-distributed mode also runs on a single node, but starts all components. The fully distributed mode starts all components and requires more than one node. The stand-alone and pseudo-distributed modes are typically used for development or testing, while the fully distributed mode is typically used in production.
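To make the comparison easier to scan, here is a small Python sketch that records the three modes as data and looks up which ones are available for a given number of nodes. It is only an illustration of the description above: the `Mode` class, its field names, and the `modes_for` helper are my own hypothetical names and are not part of Hadoop or BigInsights.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Mode:
    """One Hadoop operating mode, as described in this post (hypothetical model)."""
    name: str
    starts_all_components: bool
    multi_node: bool
    typical_use: str


# The three modes and their properties, taken from the paragraph above.
MODES = (
    Mode("stand-alone", False, False, "development or testing"),
    Mode("pseudo-distributed", True, False, "development or testing"),
    Mode("fully distributed", True, True, "production"),
)


def modes_for(node_count: int) -> List[str]:
    """Return the names of the modes that fit the given number of nodes."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    wants_multi_node = node_count > 1
    return [m.name for m in MODES if m.multi_node == wants_multi_node]


print(modes_for(1))  # ['stand-alone', 'pseudo-distributed']
print(modes_for(3))  # ['fully distributed']
```

With a single node you can choose between the two development modes; as soon as you have more than one node, you are in fully distributed territory.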
With the images provided in the IBM SmartCloud Enterprise, you can work in stand-alone or pseudo-distributed mode by provisioning a single node, the Hadoop master node. For the fully distributed mode, the BigInsights images have been configured so that you build the cluster simply by specifying the IP address of the Hadoop master node when you provision each Hadoop data node. This means the Hadoop master node instance must be provisioned first.
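The ordering constraint above can be sketched as a simple provisioning plan. This Python snippet does not call any IBM SmartCloud Enterprise API; `plan_cluster` and the dictionary keys are hypothetical names I made up for illustration, and the IP address is a placeholder from the documentation range. In practice you only learn the master's IP address once its instance is up, which is exactly why it has to come first.

```python
import ipaddress
from typing import Dict, List

# Image names as listed earlier in this post.
MASTER_IMAGE = "IBM BigInsights Basic 1.1 – Hadoop Master Node"
DATA_IMAGE = "IBM BigInsights Basic 1.1 – Hadoop Data node"


def plan_cluster(master_ip: str, data_node_count: int) -> List[Dict[str, str]]:
    """Return provisioning steps in order: master first, then the data nodes.

    Hypothetical helper: it only builds a plan, it does not provision anything.
    """
    if data_node_count < 0:
        raise ValueError("data_node_count must not be negative")
    # Raises ValueError if master_ip is not a valid IP address.
    ipaddress.ip_address(master_ip)

    # Step 1: the master node, which needs no extra parameter.
    steps = [{"role": "master", "image": MASTER_IMAGE}]
    # Remaining steps: each data node is given the master's IP address.
    for _ in range(data_node_count):
        steps.append({"role": "data", "image": DATA_IMAGE, "master_ip": master_ip})
    return steps


# A three-node cluster: one master plus two data nodes (placeholder IP).
for step in plan_cluster("192.0.2.10", 2):
    print(step["role"], step.get("master_ip", "-"))
```

Passing `0` for `data_node_count` gives you the single-node case, where only the master node is provisioned.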
Thanks to the cloud and Hadoop, it is now possible to handle large amounts of structured or unstructured data in a timely manner. However, skills in these areas are in short supply. See the video demonstrations (Hadoop Fundamentals I at BigDataUniversity.com) and the article previously referenced to jump into these technologies with hands-on steps!
If you don’t have an account with IBM SmartCloud Enterprise, you can take advantage of the trial available until November 11, 2011 (sign up by October 28).