- Study 1: Replicated the Apache Pig benchmark (Apache Foundation, 11/07/2007)
- Study 2: Applied TPC-H benchmarks
In Study 1, Pig appeared to outperform Hive on most operations. In Study 2, however, the evidence suggested that Hive is significantly faster than Pig. The article analyzes the two benchmarks, describes the differences, and explains the results.
The article assumes basic knowledge of Hadoop and big data, and some experience working with benchmarking data.
See Resources for relevant links.
| Full text of benchmarking article | pighivebenchmarking.pdf | 577KB |
- Discover how to get the information you need from big data sets in the article "Process your data with Apache Pig" (developerWorks, M. Tim Jones, February 2012).
- Learn more about Apache Hive, a data warehouse built for data-intensive distributed applications, in the article "Analyzing large datasets with Hive" (developerWorks, Zafar Gilani and Salman Ul Haq, July 2013).
- Find performance numbers for Pig as of 11/07/07.
- Learn more about "TPC-H benchmarking of Pig Latin on a Hadoop cluster" (R. Moussa, June 2012).
- Explore how query languages compare in the white paper "Comparing High Level MapReduce Query Languages" (R.J. Stewart, P.W. Trinder, and H-W. Loidl, 2011).
Get products and technologies
- Download InfoSphere BigInsights Quick Start Edition, a free, downloadable, non-production version of BigInsights that enables new solutions to cost-effectively turn large, complex volumes of data into insight by combining Apache Hadoop with unique, enterprise-ready technologies and capabilities from across IBM.
- Download InfoSphere Streams Quick Start Edition, a free, downloadable, non-production version of InfoSphere Streams, a high-performance analytic platform that allows user-developed applications to rapidly ingest, analyze, and correlate information as it arrives from thousands of real-time sources.
- Get involved in the developerWorks community. Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.