Explore big data analytics and Hadoop

Learn about Hadoop and the big data ecosystem

Date: 13 Nov 2012 | Level: Introductory

1. Big data

Big data refers to datasets that have grown too large to be handled by traditional methods, that is, too large to be captured, stored, and processed in a tolerable amount of time. Although the term big data was once applied to data warehouses, it now refers to large-scale processing architectures that focus on capacity, throughput, and generality of processing.

2. Introducing Hadoop

Hadoop is a software framework, developed as an Apache project, for massively distributed data processing. Its design scales to clusters of thousands of nodes backed by petabytes of data. Hadoop was originally built around the Java™ language, but today jobs can also be written in many other languages, including scripting languages. Learn about the architectures that are possible with Hadoop and the benefits of using them.

3. Problem-solving with Hadoop

Although Hadoop was inspired by Google's MapReduce model, it is a general-purpose application framework for processing massive amounts of data. Learn about using Hadoop in artificial intelligence with Apache Mahout, using Hadoop with Java technology, and combining Hadoop with the Dojo toolkit for data visualization.
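To make the MapReduce model mentioned above more concrete, here is a minimal single-process sketch of a word count in Python. It does not use Hadoop or any Hadoop API. The function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for illustration. In a real cluster the framework would distribute the map and reduce steps across nodes and perform the grouping itself.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group values by key, standing in for the framework's shuffle step."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce step: collapse all counts for one word into a total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence inside one process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["big data big clusters", "data nodes"]
    print(word_count(sample))
```

Because each call to map_phase sees only one line and each call to reduce_phase sees only one key, both steps can in principle run on many machines at once. This independence is the property that lets the model scale.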

4. Big data and cloud computing

Big data analytics and the cloud are almost a perfect marriage. The ability to elastically provision the number of processing nodes necessary for the analytics job while paying only for their actual use is a prime example of the real benefits the cloud offers. Learn about Hadoop in clouds and optimizing cloud clusters for Hadoop.
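The pay-for-use argument above can be illustrated with a little arithmetic. The sketch below makes three simplifying assumptions that do not come from any real provider: the job is perfectly parallel, each node is billed by the whole hour, and the price (10 cents per node-hour) is a made-up figure. The function name elastic_cost is likewise hypothetical.

```python
import math


def elastic_cost(node_hours_needed: int, nodes: int, cents_per_node_hour: int) -> tuple[int, int]:
    """Return (wall-clock hours, total cost in cents) for a job.

    Assumes a perfectly parallel job and billing per whole node-hour;
    both are simplifications made for illustration only.
    """
    hours = math.ceil(node_hours_needed / nodes)
    return hours, hours * nodes * cents_per_node_hour


if __name__ == "__main__":
    # Hypothetical job needing 1000 node-hours at a made-up 10 cents per node-hour.
    for nodes in (10, 100):
        hours, cents = elastic_cost(1000, nodes, 10)
        print(f"{nodes} nodes: {hours} hours, ${cents / 100:.2f}")
```

Under these assumptions, provisioning ten times as many nodes finishes the job in a tenth of the time for the same total cost, because nothing is paid once the nodes are released. Real jobs rarely scale this cleanly.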

5. Hadoop ecosystem technologies

Hadoop isn't a single product but rather an ecosystem of software products that together provide fully featured and flexible big data analytics. For example, you can tune Hadoop through its pluggable job scheduler to suit small or large clusters, including multi-user or interactive jobs. A number of related open source projects, such as HBase, Pig, and Hive, round out the Hadoop experience. Learn about other Hadoop technologies and the Hadoop software ecosystem.

6. Other big data analytics solutions

Although Hadoop is the most prominent open source big data analytics solution, several other solutions take different approaches. Examples include Spark, which focuses on in-memory cluster computing; the LexisNexis open source big data analytics solution; and IBM® BigSheets, which helps gather data from structured and unstructured sources to create business intelligence.



