What is Apache Avro?

What is Avro?

Avro is an open source project that provides data serialization and data exchange services for Apache Hadoop. These services can be used together or independently.

Avro facilitates the exchange of big data between programs written in any language. With the serialization service, programs can efficiently serialize data into files or into messages. The data storage is compact and efficient. Avro stores both the data definition and the data together in one message or file.

Avro stores the data definition in JSON format making it easy to read and interpret; the data itself is stored in binary format making it compact and efficient. Avro files include markers that can be used to split large data sets into subsets suitable for Apache MapReduce processing. Some data exchange services use a code generator to interpret the data definition and produce code to access the data. Avro doesn’t require this step, making it ideal for scripting languages.

A key feature of Avro is robust support for data schemas that change over time — often called schema evolution. Avro handles schema changes like missing fields, added fields and changed fields; as a result, old programs can read new data and new programs can read old data. Avro includes APIs for Java, Python, Ruby, C, C++ and more. Data stored using Avro can be passed from programs written in different languages, even from a compiled language like C to a scripting language like Apache Pig.

Related solutions
IBM® watsonx.data™

Watsonx.data enables you to scale analytics and AI with all your data, wherever it resides, through an open, hybrid and governed data store.

Discover watsonx.data
Data management software and solutions

Design a data strategy that eliminates data silos, reduces complexity and improves data quality for exceptional customer and employee experiences.

Discover data management solutions
Data and AI consulting services

Successfully scale AI with the right strategy, data, security and governance in place.

Explore data and AI consulting services
Take the next step

Unify all your data for AI and analytics with IBM® watsonx.data™. Put your data to work, wherever it resides, with the hybrid, open data lakehouse for AI and analytics.

  1. Discover watsonx.data
  2. Explore data management solutions