I don't need to tell you that the Internet is big... but just how big? No one knows for sure, but some estimates include words like exabytes and zettabytes. And the volume of data out there just keeps growing -- which means the tools we use to manipulate that data had better be robust. Tools like Apache Hadoop. This Linux-based computing framework enables reliable, scalable, and efficient distributed processing of large amounts of data. Our Linux zone hosts an excellent introduction to Hadoop, and this week we're launching a series that shows you how it's used in the real world: Part 1 of "Distributed data processing with Hadoop" explores the basics of the framework and shows you how to install and configure a single-node Hadoop cluster, as well as monitor and manage it through its core Web interface. See for yourself why this open source technology is now being used for much more than search engines.
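If you'd like a taste before diving in, a single-node (pseudo-distributed) Hadoop setup of that era typically boils down to a handful of commands along these lines. This is only a sketch: the install path is hypothetical, and it assumes an unpacked Hadoop tarball with JAVA_HOME set and the conf/*.xml files already edited for pseudo-distributed mode -- the article itself is the authoritative walkthrough.

```shell
# Assumes a stock Hadoop distribution unpacked at a hypothetical location,
# with JAVA_HOME exported and pseudo-distributed configuration in place.
cd /opt/hadoop

# Format the HDFS filesystem (a one-time step on a fresh install)
bin/hadoop namenode -format

# Start the HDFS and MapReduce daemons
bin/start-all.sh

# Sanity check: list the (initially empty) root of HDFS
bin/hadoop fs -ls /

# The NameNode's Web interface is then typically at http://localhost:50070,
# and the JobTracker's at http://localhost:50030 -- the "core Web interface"
# mentioned above, where you can monitor and manage the cluster.
```

When you're done experimenting, bin/stop-all.sh shuts the daemons back down.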
And for the record, a zettabyte is about 1,000,000,000,000,000,000,000 bytes -- roughly the size of that .jpg your cousin e-mailed you last week.
Until next week,
John Swanson and the developerWorks editorial team
This week's top features on developerWorks:
- Creating a NIEM IEPD, Part 4: Assemble the IEPD (XML)
- Monitor shared processor pools with lpar2rrd (AIX and UNIX)
- Integrate WebSphere ILOG JRules with IBM Content Manager Enterprise Edition (Information Management)
- Optimize your Red Hat Linux operating system on Intel Xeon 5500 platform (Rational)
- Use Ajax with Web services: Combining two leading-edge technologies is easier than you think (Web development)
- Using binary Jar files with WebSphere Integration Developer and WebSphere Process Server (WebSphere)