What is Apache HBase?

HBase is a column-oriented, non-relational database management system that runs on top of the Hadoop Distributed File System (HDFS), a main component of Apache Hadoop.

HBase provides a fault-tolerant way of storing sparse data sets, which are common in many big data use cases. It is well suited for real-time data processing or random read/write access to large volumes of data.

Unlike relational database systems, HBase does not support a structured query language like SQL; in fact, HBase isn’t a relational data store at all. HBase applications are written in Java™, much like a typical Apache MapReduce application. HBase also supports writing applications in Apache Avro, REST and Thrift.
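
For example, a minimal Java client sketch might look like the following. It only opens a connection and a table handle using the standard HBase client API; the table name server_logs is made up for these examples, and a running HBase cluster with its configuration on the classpath is assumed.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;

public class HBaseConnectSketch {
    public static void main(String[] args) throws IOException {
        // Reads hbase-site.xml / hbase-default.xml from the classpath.
        Configuration conf = HBaseConfiguration.create();

        // Connections are heavyweight and thread-safe: create one and reuse it.
        try (Connection connection = ConnectionFactory.createConnection(conf);
             // "server_logs" is a hypothetical table used throughout these sketches.
             Table table = connection.getTable(TableName.valueOf("server_logs"))) {
            System.out.println("Connected to table: " + table.getName());
        }
    }
}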

An HBase system is designed to scale linearly. It comprises a set of standard tables with rows and columns, much like a traditional database. Each table must have an element defined as a primary key (the row key), and all access to HBase tables must use this primary key.
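
To illustrate key-based access, the sketch below reads a single row by its row key with the HBase Java client. The table name, column family and row-key format are assumptions carried over from the earlier sketch, not part of any real deployment.

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseGetByRowKey {
    public static void main(String[] args) throws IOException {
        try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = connection.getTable(TableName.valueOf("server_logs"))) {

            // All reads are addressed by the row key (the table's primary key).
            Get get = new Get(Bytes.toBytes("web01#2024-01-15T10:00:00Z"));
            get.addColumn(Bytes.toBytes("log"), Bytes.toBytes("server"));

            Result result = table.get(get);
            byte[] server = result.getValue(Bytes.toBytes("log"), Bytes.toBytes("server"));
            System.out.println("server = " + (server == null ? "<missing>" : Bytes.toString(server)));
        }
    }
}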

Avro, as a component, supports a rich set of primitive data types, including numeric types, binary data and strings, as well as a number of complex types, including arrays, maps, enumerations and records. A sort order can also be defined for the data.
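
As a rough illustration of those types, the following Java sketch assembles an Avro record schema with Avro's SchemaBuilder; the record name and fields are invented for this example.

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

public class AvroSchemaSketch {
    public static void main(String[] args) {
        // A hypothetical record mixing primitive types (string, long, bytes)
        // with complex types (enum, array, map).
        Schema schema = SchemaBuilder.record("LogRecord").namespace("com.example.logs")
            .fields()
                .requiredString("serverName")                     // string primitive
                .requiredLong("timestamp")                        // numeric primitive
                .requiredBytes("payload")                         // binary data
                .name("level").type().enumeration("Level")        // enumeration
                    .symbols("INFO", "WARN", "ERROR").noDefault()
                .name("tags").type().array().items().stringType().noDefault()       // array
                .name("attributes").type().map().values().stringType().noDefault()  // map
            .endRecord();

        System.out.println(schema.toString(true));
    }
}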

HBase relies on ZooKeeper for high-performance coordination. ZooKeeper is built into HBase, but if you’re running a production cluster, it’s suggested that you have a dedicated ZooKeeper cluster that’s integrated with your HBase cluster.
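
In a Java client, pointing at such a dedicated ZooKeeper ensemble is typically a matter of setting a couple of configuration properties; the sketch below shows the idea, with placeholder host names.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class HBaseZooKeeperConfigSketch {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();

        // Point the client at the dedicated ZooKeeper ensemble (placeholder hosts).
        conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com");
        conf.set("hbase.zookeeper.property.clientPort", "2181");

        // The same properties are usually set once in hbase-site.xml for the whole cluster.
        System.out.println("ZooKeeper quorum: " + conf.get("hbase.zookeeper.quorum"));
    }
}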

HBase works well with Hive, a query engine for batch processing of big data, to enable fault-tolerant big data applications.

An example of HBase

An HBase column represents an attribute of an object; if the table is storing diagnostic logs from servers in your environment, each row might be a log record, and a typical column could be the timestamp of when the log record was written, or the name of the server where the record originated.
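
Continuing that diagnostic-log example, the sketch below writes one log record as a row with the HBase Java client. The server_logs table, the log column family and the row-key format are illustrative assumptions.

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBasePutLogRecord {
    public static void main(String[] args) throws IOException {
        try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = connection.getTable(TableName.valueOf("server_logs"))) {

            // One row per log record; this hypothetical row key combines server name and time.
            Put put = new Put(Bytes.toBytes("web01#2024-01-15T10:00:00Z"));
            put.addColumn(Bytes.toBytes("log"), Bytes.toBytes("timestamp"),
                          Bytes.toBytes(1705312800000L));
            put.addColumn(Bytes.toBytes("log"), Bytes.toBytes("server"),
                          Bytes.toBytes("web01.example.com"));
            put.addColumn(Bytes.toBytes("log"), Bytes.toBytes("message"),
                          Bytes.toBytes("Disk usage above 90%"));

            table.put(put);
        }
    }
}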

HBase allows for many attributes to be grouped together into column families, such that the elements of a column family are all stored together. This is different from a row-oriented relational database, where all the columns of a given row are stored together. With HBase you must predefine the table schema and specify the column families. However, new columns can be added to families at any time, making the schema flexible and able to adapt to changing application requirements.
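
A hedged sketch of that workflow in Java: the table is created with its column families predefined through the Admin API, and a brand-new column (qualifier) is later introduced simply by writing it. The table and family names are again assumptions for illustration.

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseCreateTableSketch {
    public static void main(String[] args) throws IOException {
        try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = connection.getAdmin()) {

            // Column families must be declared up front when the table is created.
            TableName name = TableName.valueOf("server_logs");
            admin.createTable(TableDescriptorBuilder.newBuilder(name)
                .setColumnFamily(ColumnFamilyDescriptorBuilder.of("log"))
                .setColumnFamily(ColumnFamilyDescriptorBuilder.of("meta"))
                .build());

            // Individual columns (qualifiers) inside a family need no schema change:
            // simply writing a new qualifier creates the column.
            try (Table table = connection.getTable(name)) {
                Put put = new Put(Bytes.toBytes("web01#2024-01-15T10:00:00Z"));
                put.addColumn(Bytes.toBytes("meta"), Bytes.toBytes("datacenter"),
                              Bytes.toBytes("us-east-1"));
                table.put(put);
            }
        }
    }
}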

Just as HDFS has a NameNode and slave nodes (DataNodes), and MapReduce has a JobTracker and TaskTracker slaves, HBase is built on similar concepts: a master node manages the cluster, and region servers store portions of the tables and perform the work on the data. And in the same way that HDFS has some enterprise availability concerns because of its single NameNode, HBase is also sensitive to the loss of its master node.
