Big data serialization using Apache Avro with Hadoop

Share serialized data among applications

Apache Avro is a serialization framework that produces data in a compact binary format that doesn't require proxy objects or code generation. Get to know Avro, and learn how to use it with Apache Hadoop.

Nathan A. Good, Senior Consultant and Freelance Developer

Nathan A. Good lives in the Twin Cities area of Minnesota. Professionally, he does software development, software architecture, and systems administration. When he's not writing software, he enjoys building PCs and servers, reading about and working with new technologies, and trying to get his friends to make the move to open source software. He's written and co-written many books and articles, including Professional Red Hat Enterprise Linux 3, Regular Expression Recipes: A Problem-Solution Approach, and Foundations of PEAR: Rapid PHP Development.



29 October 2013

Apache Avro is a framework that allows you to serialize data in a format that has a schema built in. The serialized data is in a compact binary format that doesn't require proxy objects or code generation. Instead of using generated proxy libraries and strong typing, Avro relies heavily on the schemas that are sent along with the serialized data. Including schemas with the Avro messages allows any application to deserialize the data.

InfoSphere BigInsights Quick Start Edition

Avro is a component of InfoSphere BigInsights, IBM's Hadoop-based offering. InfoSphere BigInsights Quick Start Edition is a complimentary, downloadable version of InfoSphere BigInsights. Using Quick Start Edition, you can try out the features that IBM has built to extend the value of open source Hadoop, like Big SQL, text analytics, and BigSheets. Guided learning is available to make your experience as smooth as possible, including step-by-step, self-paced tutorials and videos to help you start putting Hadoop to work for you. With no time or data limit, you can experiment on your own schedule with large amounts of data. Watch the videos, follow the tutorials (PDF), and download BigInsights Quick Start Edition now.

This article shows how to use Avro to define a schema, create and serialize some data, and send it from one application to another. I use the Eclipse integrated development environment (IDE) to create a sample application demonstrating the use of Avro. Download the sample code for this application.

Installation

To install Avro, you must download the libraries and reference them in your code.

Download the libraries

To download the Avro libraries for use in Java™ technology, download the avro-VERSION.jar and avro-tools-VERSION.jar files (see Resources). The two libraries depend on the Jackson libraries, a link to which is also in Resources. Alternatively, use Apache Maven or Apache Ivy to download the Java archive (JAR) files and all of their dependencies automatically. An example of the Maven entry is provided in Listing 1.

Listing 1. An example of the entries for Avro dependencies
    <dependencies>
        <dependency>
            <groupId>org.apache.avro</groupId>
            <artifactId>avro</artifactId>
            <version>1.7.5</version>
        </dependency>

        <dependency>
            <groupId>org.apache.avro</groupId>
            <artifactId>avro-tools</artifactId>
            <version>1.7.5</version>
        </dependency>

    </dependencies>
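If you use Ivy instead of Maven, the equivalent entries in an ivy.xml file might look like the following sketch (the organisation and module values here are placeholders for your own project):

<ivy-module version="2.0">
    <info organisation="com.example" module="avroSample"/>
    <dependencies>
        <dependency org="org.apache.avro" name="avro" rev="1.7.5"/>
        <dependency org="org.apache.avro" name="avro-tools" rev="1.7.5"/>
    </dependencies>
</ivy-module>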

See Resources to download libraries for other implementations, such as Python, Ruby, PHP, or C#.

Create a project

This article uses the Eclipse IDE with the m2e (Maven Integration for Eclipse) plug-in installed to build the examples for using Avro. You will need a new project in which to create your sample schemas, compile them, and run the samples. To create a new project in Eclipse, click File > New > Maven Project. Select the check box next to Create a simple project to quickly create a project that supports Maven, which this article uses to handle the Avro dependencies.

As noted in Download the libraries, one easy way to get Avro and all of its dependencies without doing it manually is to use Maven or Ivy. The sample code included here uses a Maven pom.xml file to configure the Avro dependency and download it automatically from inside Eclipse. For more information about how Maven works, see Resources.

To create your own pom.xml file, create a new XML file by clicking File > New > Other. Select XML File from the list, then name the file pom.xml. When you have created the file, place the entire contents of Listing 2 inside the file and save.

Listing 2. The entire Maven pom.xml file to get started
<project xmlns="http://maven.apache.org/POM/4.0.0" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
    http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.example</groupId>
    <artifactId>avroSample</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <name>avroSample</name>
    <url>http://maven.apache.org</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.6</version>
            <scope>test</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.avro</groupId>
            <artifactId>avro</artifactId>
            <version>1.7.5</version>
        </dependency>

        <dependency>
            <groupId>org.apache.avro</groupId>
            <artifactId>avro-tools</artifactId>
            <version>1.7.5</version>
        </dependency>


        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-api</artifactId>
            <version>2.0-beta9</version>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-core</artifactId>
            <version>2.0-beta9</version>
        </dependency>
    </dependencies>

</project>

After you have created the pom.xml file, resolve the dependencies. Doing so downloads the Avro JAR files along with all of the other libraries that Avro requires. To resolve the dependencies in your Eclipse IDE, save the pom.xml file. When Eclipse builds the project, the Maven Builder will automatically download the new dependencies.
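If you prefer the command line, you can resolve the same dependencies by running Maven directly from the project root directory:

mvn dependency:resolve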


Avro schemas

Avro schemas describe the format of the message and are defined using JavaScript Object Notation (JSON). The JSON schema content is put into a file that the tools will reference later.

Defining a schema

Before creating a schema in your Eclipse project, create a source folder in which to put the schema files. To create the new source folder, click File > New > Source Folder. Enter src/main/avro and click Finish. The source folder follows Maven conventions, so if you want to use the Maven plug-in to generate sources, you can.

Now that you have a source folder that will contain the schema files, create a new file inside the source folder by clicking File > New > File. Give your new file a name such as automobile.avsc. Now, you can add JSON to define the schema. Listing 3 demonstrates a simple schema that defines an automobile.

Listing 3. Sample Avro schema file for an Automobile object
{
    "namespace":"com.example.avroSample.model",
    "type":"record",
    "name":"Automobile",
    "fields":[
        {
            "name":"modelName",
            "type":"string"
        },
        {
            "name":"make",
            "type":"string"
        },
        {
            "name":"modelYear",
            "type":"int"
        },
        {
            "name":"passengerCapacity",
            "type":"int"
        }
        
    ]
}

The Automobile is of type record, which is a complex type that contains a list of fields. The record complex type requires the attributes shown below.

Table 1. Attributes available on a record
Attribute name | Description                                                                                                   | Example
type           | The data type of the entry                                                                                    | record
name           | The name of the object. (This becomes the class name when code is generated.)                                 | Automobile
namespace      | A namespace for the object to avoid naming collisions. (This becomes the package name when code is generated.)| com.example.avroSample.model
fields         | An array of the fields (attributes or properties) of the object                                               | modelName, etc.

Each field in the list of fields contains data about its name and type. Table 2 contains a list of the attributes of a field.

Table 2. Attributes of a field
Attribute name | Description
name           | The name of the field. (This becomes the property name when code is generated.)
type           | The data type of the field

See Resources for more information about the JSON schema.
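Because the schema travels with the data, generated classes are not strictly required to work with it. The following sketch (the class name is illustrative, and the schema path assumes the source folder created above) parses the .avsc file at run time and builds a record generically:

import java.io.File;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class GenericAutomobileExample {

    public static void main(String[] args) throws Exception {
        // Parse the schema file at run time -- no code generation required.
        Schema schema = new Schema.Parser()
                .parse(new File("src/main/avro/automobile.avsc"));

        // Build a record generically, setting fields by name.
        GenericRecord auto = new GenericData.Record(schema);
        auto.put("modelName", "Speedyspeedster");
        auto.put("make", "Speedy Car Company");
        auto.put("modelYear", 2013);
        auto.put("passengerCapacity", 2);

        System.out.println(auto);
    }
}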


Compiling the schema

Now that you have defined a schema, you can compile it using the Avro tools. You can either use the JAR method of generating the sources or use a plug-in configured in the pom.xml file. If you want to use the Maven method, skip to Use the Maven plug-in.

Use the command line

To use the JAR method, click Run > External Tools > External Tools Configurations and create a new Program configuration. For the Location field, click Browse File System and select the Java executable in your environment, such as C:\Program Files\Java\jre7\bin\java.exe on a Windows® computer. For the working directory, select the project by clicking Browse Workspace. The working directory is used as the base path for the relative paths that you supply in the Arguments box.

In the Arguments box, add the values shown in Listing 4.

Listing 4. Adding arguments
-jar avro-tools-1.7.5.jar
compile
schema
src/main/avro/automobile.avsc
src/main/java

To run the tool, click Run.
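If you prefer a terminal, the same invocation, run from the project root directory (assuming the avro-tools JAR has been copied there), is:

java -jar avro-tools-1.7.5.jar compile schema src/main/avro/automobile.avsc src/main/java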

Use the Maven plug-in

To use the Maven plug-in to generate the Java proxy classes from the Avro schema, put the XML shown in Listing 5 in the pom.xml file.

Listing 5. Adding the build plug-in to the pom.xml file
<project xmlns="http://maven.apache.org/POM/4.0.0" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
    http://maven.apache.org/xsd/maven-4.0.0.xsd">

    <!-- snipped -->

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.avro</groupId>
                <artifactId>avro-maven-plugin</artifactId>
                <version>1.7.5</version>
                <executions>
                    <execution>
                        <phase>generate-sources</phase>
                        <goals>
                            <goal>schema</goal>
                        </goals>
                        <configuration>
                            <sourceDirectory>
                            ${project.basedir}/src/main/avro/
                            </sourceDirectory>
                            <outputDirectory>
                            ${project.basedir}/src/main/java/
                            </outputDirectory>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.1</version>
                <configuration>
                    <source>1.6</source>
                    <target>1.6</target>
                </configuration>
            </plugin>

        </plugins>
    </build>

    <dependencies>
        <!-- snipped -->
    </dependencies>

</project>

Now that you have the plug-in in the Maven pom.xml file, Maven will generate the sources for you automatically during the generate-sources phase, which is executed before the compile phase. For more information about the Maven life cycle, see Resources.
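To trigger the generation outside Eclipse, run any Maven goal at or beyond the generate-sources phase from the project root directory, for example:

mvn generate-sources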

The generated Java class file is located in the src/main/java folder. Listing 6 shows part of the generated Automobile class, which is in the com.example.avroSample.model package specified by the schema's namespace.

Listing 6. Example of the generated Automobile class
/**
 * Autogenerated by Avro
 * 
 * DO NOT EDIT DIRECTLY
 */
package com.example.avroSample.model;  
@SuppressWarnings("all")
@org.apache.avro.specific.AvroGenerated
public class Automobile extends org.apache.avro.specific.SpecificRecordBase 
    implements org.apache.avro.specific.SpecificRecord {

  public static final org.apache.avro.Schema SCHEMA$ =
      new org.apache.avro.Schema.Parser().parse(
          "{\"type\":\"record\",\"name\":\"Automobile\","
          + "\"namespace\":\"com.example.avroSample.model\",\"fields\":["
          + "{\"name\":\"modelName\",\"type\":\"string\"},"
          + "{\"name\":\"make\",\"type\":\"string\"},"
          + "{\"name\":\"modelYear\",\"type\":\"int\"},"
          + "{\"name\":\"passengerCapacity\",\"type\":\"int\"}]}");
  public static org.apache.avro.Schema getClassSchema() { return SCHEMA$; }
  @Deprecated public java.lang.CharSequence modelName;
  @Deprecated public java.lang.CharSequence make;
  @Deprecated public int modelYear;
  @Deprecated public int passengerCapacity;

  /**
   * Default constructor.  Note that this does not initialize fields
   * to their default values from the schema.  If that is desired then
   * one should use {@link #newBuilder()}. 
   */
  public Automobile() {}

  /**
   * All-args constructor.
   */
  public Automobile(java.lang.CharSequence modelName,
      java.lang.CharSequence make, java.lang.Integer modelYear,
      java.lang.Integer passengerCapacity) {

    this.modelName = modelName;
    this.make = make;
    this.modelYear = modelYear;
    this.passengerCapacity = passengerCapacity;

  }
  /* snipped... */
}

Now that you have generated the source files, you can use them in a Java application to see how Avro works.


Serialization with the schema

At this point, you should have a schema file defined with JSON content and have at least one Java source file generated from the schema. Now you are ready to write some test code that builds objects using the generated classes, assigns values to their properties, and serializes the objects to a file.

Listing 7 contains an example of Java code that uses a Builder to return an instance of an object.

Listing 7. Using a Builder to create an instance of an object
        Automobile auto = Automobile.newBuilder().setMake("Speedy Car Company")
                .setModelName("Speedyspeedster").setModelYear(2013)
                .setPassengerCapacity(2).build();

        DatumWriter<Automobile> datumWriter = 
            new SpecificDatumWriter<Automobile>(Automobile.class);
        DataFileWriter<Automobile> fileWriter = 
            new DataFileWriter<Automobile>(datumWriter);

        try {
            fileWriter.create(auto.getSchema(), outputFile);
            fileWriter.append(auto);
            fileWriter.close();
        } catch (IOException e) {
            LOGGER.error("Error while trying to write the object to file <"
                    + outputFile.getAbsolutePath() + ">.", e);
        }

Values are assigned to the properties on the object before the build() method is called to return the instance. A DatumWriter extracts the data from the object, and the DataFileWriter writes that data to a file conforming to the supplied schema.

After you create a DataFileWriter implementation and call create() with the schema and file, you can add objects to the file by using the append() method, as shown in Listing 7.

To execute the sample code, either put the code in the main method of a Java class or add it to a unit test. In the sample code available with this article, the code is added to a unit test that you can execute in Eclipse or Maven.
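For reference, a minimal stand-alone version of the serialization code might look like the following sketch (the class name and output path are illustrative; they are not taken from the sample download):

package com.example.avroSample;

import java.io.File;
import java.io.IOException;

import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.specific.SpecificDatumWriter;

import com.example.avroSample.model.Automobile;

public class SerializeAutomobile {

    public static void main(String[] args) throws IOException {
        File outputFile = new File("target/avro/autos.avro");
        outputFile.getParentFile().mkdirs();

        // Build the object with the generated Builder.
        Automobile auto = Automobile.newBuilder()
                .setMake("Speedy Car Company")
                .setModelName("Speedyspeedster")
                .setModelYear(2013)
                .setPassengerCapacity(2)
                .build();

        // Serialize the object to a file, embedding the schema.
        DatumWriter<Automobile> datumWriter =
                new SpecificDatumWriter<Automobile>(Automobile.class);
        DataFileWriter<Automobile> fileWriter =
                new DataFileWriter<Automobile>(datumWriter);
        fileWriter.create(auto.getSchema(), outputFile);
        fileWriter.append(auto);
        fileWriter.close();
    }
}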

When you execute this code, the serializer creates a file called autos.avro in the target folder of the project. Because the file is in a compact binary format, you can't usefully inspect its contents in a normal text editor.
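If you do want to peek inside the file, the avro-tools JAR includes a tojson command that prints each record as JSON (the path here assumes the output location used in the sketch above):

java -jar avro-tools-1.7.5.jar tojson target/avro/autos.avro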


Deserialization with the schema

When you have at least one object serialized to a file, you can use Java code to deserialize the contents of the file into objects. Here, a unit test is useful because you can make assertions to verify that the values of the deserialized object are the same as the original values.

Listing 8 shows Java code that can be used to deserialize an Automobile object from the same file that was created in the previous step.

Listing 8. Java code to deserialize an Automobile object
    DatumReader<Automobile> datumReader = 
        new SpecificDatumReader<Automobile>(Automobile.class);
    try {
        DataFileReader<Automobile> fileReader = 
            new DataFileReader<Automobile>(outputFile, datumReader);

        Automobile auto = null;
        
        if (fileReader.hasNext()) {
            auto = fileReader.next(auto);
        }
        
    } catch (IOException e) {
        LOGGER.error("Error while trying to read the object from file <"
                + outputFile.getAbsolutePath() + ">.", e);
    }

This time, the code uses a DatumReader, which reads the content through the DataFileReader implementation.

The DataFileReader is an iterator (that is, it implements the Iterator interface), so it reads through the file and returns objects through its next() method. This behavior will be familiar to anyone who has worked with JDBC ResultSet objects in Java.
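For a file that contains many records, the idiomatic pattern is a loop. Here is a sketch, continuing from the fileReader created in Listing 8 with exception handling elided; passing the previous instance back into next() lets Avro reuse that object instead of allocating a new one for each record:

    Automobile auto = null;
    while (fileReader.hasNext()) {
        // Reuse the previous instance rather than allocating a new object.
        auto = fileReader.next(auto);
        System.out.println(auto.getModelName() + ", " + auto.getModelYear());
    }
    fileReader.close();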

The code shown in Listing 9 ensures that the values on the properties are the same values originally assigned to the object before it was written to the file. When executed as a unit test, these assertions pass.

Listing 9. Adding assertions to verify the original values of the object
    DatumReader<Automobile> datumReader = 
        new SpecificDatumReader<Automobile>(Automobile.class);
    try {
        DataFileReader<Automobile> fileReader = 
            new DataFileReader<Automobile>(outputFile, datumReader);

        Automobile auto = null;
        
        if (fileReader.hasNext()) {
            auto = fileReader.next(auto);
        }
        
        assertEquals("Speedy Car Company", auto.getMake().toString());
        assertEquals("Speedyspeedster", auto.getModelName().toString());
        assertEquals(Integer.valueOf(2013), auto.getModelYear());
        
        
    } catch (IOException e) {
        LOGGER.error("Error while trying to read the object from file <"
                + outputFile.getAbsolutePath() + ">.", e);
    }

Now that you have seen Avro in action both writing and reading from a file, you can see how it works with Apache Hadoop.


Integration with Hadoop

Apache Hadoop is a framework that enables the processing of large datasets across clusters of distributed computers. Using Hadoop, you can spread processing across thousands of machines to build scalable applications. For more information about Hadoop, see Resources.

Avro allows you to use complex data structures within Hadoop MapReduce jobs. To try out MapReduce functionality, you need to add dependencies to the pom.xml file to pull in the Hadoop library and Avro MapReduce library. To add the dependencies, modify your pom.xml file to look like Listing 10.

Listing 10. Adding the dependencies for Avro Hadoop libraries to the pom.xml file
<project xmlns="http://maven.apache.org/POM/4.0.0" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
    http://maven.apache.org/xsd/maven-4.0.0.xsd">

    <!-- snipped -->

    <build>
        <!-- snipped -->
    </build>

    <dependencies>
        
        <!-- snipped... added earlier -->

        <dependency>
            <groupId>org.apache.avro</groupId>
            <artifactId>avro-mapred</artifactId>
            <version>1.7.5</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
            <version>1.1.0</version>
        </dependency>

    </dependencies>

</project>

After you add the dependencies to the pom.xml file and save it, the m2e plug-in automatically downloads the newly added JAR files and their dependencies.

After resolving the dependencies, build on the prior examples to create an Avro MapReduce example that counts automobiles by model name. The model count example is based on the color count example on the Avro site (see Resources).

The example includes three classes: one that extends AvroMapper, one that extends AvroReducer, and a class with code to initiate the MapReduce job and write the results.

Extending AvroMapper

The AvroMapper class implements several Hadoop-supplied interfaces and provides the ability to map, or collect, data. To demonstrate the capabilities of Avro in MapReduce functions, create a small class, as shown in Listing 11.

Listing 11. The ModelCountMapper class
package com.example.avroSample.mapReduce;

import java.io.IOException;

import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroMapper;
import org.apache.avro.mapred.Pair;
import org.apache.hadoop.mapred.Reporter;

import com.example.avroSample.model.Automobile;

/**
 * This class counts the number of each automobile model found.
 */
public final class ModelCountMapper extends
        AvroMapper<Automobile, Pair<CharSequence, Integer>> {

    private static final Integer ONE = Integer.valueOf(1);

    @Override
    public void map(Automobile datum,
            AvroCollector<Pair<CharSequence, Integer>> collector,
            Reporter reporter) throws IOException {

        CharSequence modelName = datum.getModelName();

        collector.collect(new Pair<CharSequence, Integer>(modelName, ONE));

    }

}

The map method simply retrieves the model name from the passed-in Automobile object and emits a count of 1 for each occurrence of the model name. There is no summing yet in the map method; the math that totals the counts is in the class that extends AvroReducer.

Extending AvroReducer

Listing 12 shows the class that extends AvroReducer. This class accepts the values that the ModelCountMapper object has collected and summarizes them by looping through the values.

Listing 12. Code to extend AvroReducer
package com.example.avroSample.mapReduce;

import java.io.IOException;

import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroReducer;
import org.apache.avro.mapred.Pair;
import org.apache.hadoop.mapred.Reporter;

public class ModelCountReducer extends
        AvroReducer<CharSequence, Integer, Pair<CharSequence, Integer>> {

    /**
     * This method "reduces" the input
     */
    @Override
    public void reduce(CharSequence modelName, Iterable<Integer> values,
            AvroCollector<Pair<CharSequence, Integer>> collector, Reporter reporter)
            throws IOException {
        
        int sum = 0;
        
        for (Integer value : values) {
            sum += value.intValue();
        }
        
        collector.collect(new Pair<CharSequence, Integer>(modelName, sum));
        
    }

}

The reduce method in the ModelCountReducer class "reduces" the values the mapper collects into a derived value, which in this case is a simple sum of the values.

Running the example

To see the example in action, create a class that configures and runs the MapReduce job using the AvroJob class, as shown in Listing 13.

Listing 13. Sample class that runs the MapReduce job
package com.example.avroSample.mapReduce;

import java.io.File;

import org.apache.avro.Schema;
import org.apache.avro.Schema.Type;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.Pair;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

import com.example.avroSample.model.Automobile;

public class ModelNameCountApp extends Configured implements Tool {

    private static final Logger LOGGER = LogManager
            .getLogger(ModelNameCountApp.class);

    private static final String JOB_NAME = "ModelNameCountJob";

    @Override
    public int run(String[] args) throws Exception {

        JobConf job = new JobConf(getConf(), ModelNameCountApp.class);
        job.setJobName(JOB_NAME);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        AvroJob.setMapperClass(job, ModelCountMapper.class);
        AvroJob.setReducerClass(job, ModelCountReducer.class);

        AvroJob.setInputSchema(job, Automobile.getClassSchema());
        AvroJob.setOutputSchema(
                job,
                Pair.getPairSchema(Schema.create(Type.STRING),
                        Schema.create(Type.INT)));

        JobClient.runJob(job);

        return 0;

    }

    /**
     * Creates an instance of this class and executes it to provide a call-able
     * entry point.
     */
    public static void main(String[] args) {

        if (args == null || args.length != 2) {
            throw new IllegalArgumentException(
                    "Two parameters must be supplied to the command, " + 
                    "input directory and output directory.");
        }
        
        new File(args[0]).mkdir();
        new File(args[1]).mkdir();

        int result = 0;

        try {
            result = new ModelNameCountApp().run(args);
        } catch (Exception e) {
            result = -1;
            LOGGER.error("An error occurred while trying to run the example", e);
        }

        if (result == 0) {
            LOGGER.info("SUCCESS");
        } else {
            LOGGER.fatal("FAILED");
        }

    }
}

This example creates an instance of a JobConf object, then uses the AvroJob class to configure the job before executing it.

To run the example from inside Eclipse, click Run > Run Configurations to open the Run Configurations wizard. Select Java Application from the list, then click New. On the Main tab, enter the project name and the main class name (com.example.avroSample.mapReduce.ModelNameCountApp). In the Program Arguments box, enter the directory that contains the output of the tests you ran earlier, followed by a suitable output directory, such as target/avro target/mapreduce.

When you are finished adding in the directory names, click Run to run the example.
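To run the same job on a real Hadoop cluster rather than inside Eclipse, you would typically package the project as a JAR and submit it with the hadoop command. The following is a sketch only; the JAR name depends on how you package the project, and on a cluster the input and output locations would normally be HDFS paths:

hadoop jar avroSample-0.0.1-SNAPSHOT.jar \
    com.example.avroSample.mapReduce.ModelNameCountApp \
    target/avro target/mapreduce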

When you run the example, it uses the ModelCountMapper to collect the model names, emitting a count of 1 for each occurrence through the map() method. It then processes the resulting keys and values using the reduce() method of the ModelCountReducer object, which in this example simply adds the counts to summarize them.


Conclusion

Apache Avro is a serialization framework that allows you to write object data to a stream. The serialized data includes the schema, in JSON format, so applications can handle different versions of objects. Although code generation and proxy objects are not required, you can use the Avro tools to generate proxy objects in Java to work with the objects easily.

Avro also works well with Hadoop MapReduce. MapReduce lets you perform large-scale calculations on many objects across many processors simultaneously. You can use Avro and MapReduce together to process many items serialized in Avro's compact binary format.


Download

Description | Name            | Size
Sample code | avro-sample.zip | 11KB

Resources

Learn

  • Review the ColorCount example, which demonstrates how to use Avro with MapReduce.
  • Learn more about Apache Maven and the Maven build life cycle.
  • Read more about Apache Avro 1.7.5 and the Avro JSON schema.
  • Read more about Apache Ivy and Hadoop.
  • Take this free course from Big Data University on Hadoop Reporting and Analysis (log-in required). Learn how to build your own Hadoop/big data reports over relevant Hadoop technologies such as HBase, Hive, etc., and get guidance on how to choose between various reporting techniques: Direct Batch Reports, Live Exploration, and Indirect Batch Analysis.
  • Learn the basics of Hadoop with this free Hadoop Fundamentals course from Big Data University (log-in required). Learn about the Hadoop architecture, HDFS, MapReduce, Pig, Hive, JAQL, Flume, and many other related Hadoop technologies. Practice with hands-on labs on a Hadoop cluster using any of these methods: on the Cloud, with the supplied VMWare image, or install locally.
  • Explore free courses from Big Data University on topics ranging from Hadoop Fundamentals and text analytics essentials to SQL access for Hadoop and real-time stream computing.
  • Create your own Hadoop cluster with this free course from Big Data University (log-in required).
  • Learn more about big data in the developerWorks big data content area. Find technical documentation, how-to articles, education, downloads, product information, and more.
  • Find resources to help you get started with InfoSphere BigInsights, IBM's Hadoop-based offering that extends the value of open source Hadoop with features like Big SQL, text analytics, and BigSheets.
  • Follow these self-paced tutorials (PDF) to learn how to manage your big data environment, import data for analysis, analyze data with BigSheets, develop your first big data application, develop Big SQL queries to analyze big data, and create an extractor to derive insights from text documents with InfoSphere BigInsights.
  • Find resources to help you get started with InfoSphere Streams, IBM's high-performance computing platform that enables user-developed applications to rapidly ingest, analyze, and correlate information as it arrives from thousands of real-time sources.
  • Stay current with developerWorks technical events and webcasts.
  • Follow developerWorks on Twitter.
