Netezza Map/Reduce

The map/reduce feature is a software framework that allows you to implement MapReduce applications and run them on Netezza Performance Server. With the Netezza Performance Server approach, input data is stored in a distributed table, and the output of a job is stored in the database as well. Mapper tasks work on independent parts of an input table, called dataslices. The map outputs are sorted and redistributed to reduce tasks. The framework creates and runs the appropriate SQL query to perform the map/reduce data flow. Database columns (record fields) are mapped to the key and value concepts of the MapReduce model.
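
The data flow described above can be illustrated with a minimal in-memory sketch. It is written in plain Python and does not use the Netezza map/reduce API; the table contents and the names DATASLICES, map_row, reduce_group, and run_job are hypothetical and chosen only for illustration. On the actual system, the framework performs the sort and redistribution through a generated SQL query rather than in application code.

```python
from collections import defaultdict

# Hypothetical input table, already split into dataslices.
# Each row is (id, word); names and data are illustrative only.
DATASLICES = [
    [(1, "apple"), (2, "pear")],
    [(3, "apple"), (4, "plum"), (5, "apple")],
]


def map_row(row):
    """Map one table row to (key, value) pairs: here, (word, 1)."""
    _row_id, word = row
    yield word, 1


def reduce_group(key, values):
    """Reduce all values collected for one key to a single output row."""
    return key, sum(values)


def run_job(dataslices):
    # Map phase: each dataslice is processed independently of the others.
    map_outputs = []
    for dataslice in dataslices:
        for row in dataslice:
            map_outputs.extend(map_row(row))

    # Sort and redistribute: group the map outputs by key so that every
    # value for a given key reaches the same reduce call.
    groups = defaultdict(list)
    for key, value in sorted(map_outputs):
        groups[key].append(value)

    # Reduce phase: one call per key; the returned list stands in for
    # the output table.
    return [reduce_group(key, values) for key, values in groups.items()]


if __name__ == "__main__":
    for output_row in run_job(DATASLICES):
        print(output_row)
```

In this sketch the word column plays the role of the key and the constant 1 plays the role of the value, which mirrors how database columns are mapped to keys and values.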

This guide provides a comprehensive description of how to start working with Netezza map/reduce. It is divided into the following parts:
  • Simple Example for Getting Started – Describes how to quickly write and run your first map/reduce program.
  • User Interfaces – Contains basic information about the API. For more detailed information, see the IBM Netezza Analytics map/reduce API Reference manual.
  • Advanced Functionality – Describes more advanced concepts, such as generic command line options, counters, and logging.
  • Netezza Analytics Map/Reduce Examples – Provides four map/reduce examples based on the JARs distributed with the map/reduce software.
  • Netezza Analytics Map/Reduce Streaming – Explains map/reduce streaming, which allows you to run map/reduce programs written in languages other than Java.