In addition to its open table format specification, Iceberg also comprises a set of APIs and libraries that enable storage engines, query engines and execution engines to smoothly interact with tables following that format.
The Iceberg table format has become an integral part of the big data ecosystem, largely due to its ability to provide functions typically not available with other table formats. Using a host of metadata kept on each table, Iceberg allows for schema evolution, partition evolution and table version rollback without the need for costly table rewrites or table migration. It is fully storage-system agnostic, with support for multiple data sources and no file system dependencies.
Originally created by data engineers at Netflix and Apple in 2017 to address the shortcomings of Apache Hive, Iceberg was made open source and donated to the Apache Software Foundation the following year. It became a top-level Apache project in 2020.
Apache Iceberg’s speed, efficiency, reliability and overall user-friendliness helps to simplify and coordinate data processing at any scale. These strengths have made it a table format of choice for a number of leading data warehouses, data lakes and data lakehouses, including IBM watsonx.data®, Netezza® and Db2® warehouse.