Big data refers to datasets that have grown too large to be handled by traditional methods, where handling means capturing, storing, and processing the data in a tolerable amount of time. Although the term big data was once applied to data warehouses, it now refers to large-scale processing architectures that focus on capacity, throughput, and generality of processing.
Hadoop is a software framework developed as an Apache project for massively distributed data processing. Its design supports highly scalable clusters of thousands of nodes holding petabytes of data. Hadoop was originally written in the Java™ language, but today it also supports many other languages for scripting. Understand the architectures possible with Hadoop and the benefits of their use.
Although Hadoop was inspired by Google's MapReduce usage model, it is a generic application framework for processing massive amounts of data. Learn about the use of Hadoop in artificial intelligence with Apache Mahout, Hadoop with Java technology, and combining Hadoop with the Dojo toolkit for data visualization.
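To make the MapReduce model mentioned above more concrete, here is a minimal single-process sketch of its three stages (map, shuffle, reduce) applied to word counting. It is an illustration only and does not use Hadoop or any of its APIs; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count`, and the sample input, are assumptions made for this example. On a real cluster, the map and reduce steps would run in parallel across many nodes and the framework would handle the grouping between them.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence within one process."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for lines read from a large file.
    sample = ["big data needs big clusters", "Hadoop processes big data"]
    print(word_count(sample))
```

Because each call to `map_phase` depends on only one line and each call to `reduce_phase` depends on only one key, both steps can be distributed across machines without coordination, which is the property that lets this model scale.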
Big data analytics and the cloud are almost a perfect marriage. The ability to elastically provision the number of processing nodes necessary for the analytics job while paying only for their actual use is a prime example of the real benefits the cloud offers. Learn about Hadoop in clouds and optimizing cloud clusters for Hadoop.
Hadoop isn't a single product but rather an ecosystem of software products that together implement fully featured and flexible big data analytics. For example, you can tune Hadoop through its pluggable job scheduler (for small or large clusters, including multi-user or interactive jobs). The ecosystem also includes a number of related open source products that round out the Hadoop experience, such as HBase, Pig, and Hive. Learn about other Hadoop technologies and the Hadoop software ecosystem.
Although Hadoop is the most prominent open source big data analytics solution, several other solutions offer alternative approaches. Examples include Spark, which focuses on in-memory cluster computing, the LexisNexis open source big data analytics solution, and IBM® BigSheets, which helps gather data from structured and unstructured sources to create business intelligence.