Today’s businesses operate globally and in real time. The exponential growth of data requires new approaches to capturing and analyzing very large data streams as they arrive.
Think about credit card fraud detection, monitoring and predictive maintenance of smart grids, or storing and analyzing sensor data from large car fleets. Each vehicle may contain over 700 sensors which constantly stream data.
In all these cases, you may want to store massive data streams in real time but also access the data immediately as it streams in.
Db2 Event Store 2.0
IBM just released Db2 Event Store 2.0, a new type of database designed for real-time processing of massive data streams. It can capture over 250 billion events per day, which is roughly the number of credit card transactions made worldwide in 2018.
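To get a feel for that number, it helps to convert the daily figure into a per-second rate. The short calculation below uses only the 250 billion events per day quoted above and assumes, for simplicity, that events arrive evenly over the day.

```python
# Back-of-the-envelope conversion of the quoted daily figure into a rate.
# Assumption: events arrive evenly over 24 hours (real workloads are bursty).
EVENTS_PER_DAY = 250_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

events_per_second = EVENTS_PER_DAY / SECONDS_PER_DAY

print(f"{events_per_second:,.0f} events per second on average")
```

That works out to close to 2.9 million events per second, sustained around the clock.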
What architecture is used for this type of stream-processing database?
Here is the list of key requirements and corresponding design concepts:
High availability
Db2 Event Store is based on a 3-node cluster architecture. The software runs in Docker containers that are managed by Kubernetes. Each node processes a share of the workload. Incoming data is first stored in the local log files of the respective node, then replicated to the log files of the other nodes. Later the data is persisted on storage that is shared between all nodes. If one node goes down, another node takes logical ownership of the corresponding shared data.
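The ownership handover described above can be pictured with a small Python sketch. This is not Db2 Event Store code: the `Cluster` class, the node names, and the round-robin reassignment policy are all hypothetical and only illustrate the idea that shared data gets a new logical owner when a node fails.

```python
class Cluster:
    """Toy model: each shard of shared data has one logical owner node."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.alive = set(nodes)
        self.owner = {}  # shard name -> name of the owning node

    def assign(self, shards):
        # Spread shards over the nodes round-robin so each node gets a share.
        for i, shard in enumerate(shards):
            self.owner[shard] = self.nodes[i % len(self.nodes)]

    def fail(self, node):
        # Mark the node as down and hand its shards to the surviving nodes.
        self.alive.discard(node)
        survivors = sorted(self.alive)
        if not survivors:
            raise RuntimeError("no surviving node can take ownership")
        orphaned = sorted(s for s, o in self.owner.items() if o == node)
        for i, shard in enumerate(orphaned):
            self.owner[shard] = survivors[i % len(survivors)]


cluster = Cluster(["node-a", "node-b", "node-c"])
cluster.assign([f"shard-{i}" for i in range(6)])
cluster.fail("node-b")
print(cluster.owner)  # no shard is owned by node-b any more
```

Because the data itself lives on shared storage in this model, failover only changes the ownership map; nothing has to be copied between nodes.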
Real-time ingest
Db2 Event Store uses data sharding, which means that tables are broken up into smaller chunks called shards. Shards are spread across multiple physical servers. In addition, to make use of all cores on a single server, a table is also split into multiple shards on each machine.
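A minimal sketch of this two-level layout is shown below. The hash-based placement, the function name `shard_for`, and the node and shard counts are assumptions made for illustration; the text does not say how Db2 Event Store actually maps rows to shards.

```python
import hashlib


def shard_for(key: str, num_nodes: int, shards_per_node: int) -> tuple[int, int]:
    """Map a sharding key to (node index, local shard index on that node)."""
    # Use a stable hash; Python's built-in hash() for strings varies per process.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    shard_id = int.from_bytes(digest[:8], "big") % (num_nodes * shards_per_node)
    # Consecutive global shard ids are grouped per node.
    return divmod(shard_id, shards_per_node)


# Hypothetical layout: 3 servers with 4 shards each, keyed by a vehicle id.
for vehicle in ("vehicle-001", "vehicle-002", "vehicle-003"):
    node, local_shard = shard_for(vehicle, num_nodes=3, shards_per_node=4)
    print(vehicle, "-> node", node, "local shard", local_shard)
```

The same key always lands on the same shard, so writes for one vehicle stay together while different vehicles spread across servers and cores.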
Db2 Event Store uses batch insert processing. An application passes larger chunks of data in each insert call to keep per-row latency low. As soon as a batch is stored in a node’s local log files and replicated to a quorum of nodes, the application can continue processing. The remaining work, such as writing the data to the shared storage layer, indexing, and compression optimization, can run asynchronously in the background.
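The acknowledgment rule can be sketched in a few lines of Python. The `Node` class and the `insert_batch` and `drain` functions are hypothetical stand-ins, not the product's API, and the sketch runs the "background" work synchronously to stay short.

```python
from collections import deque


class Node:
    """Toy node with an in-memory list standing in for its local log file."""

    def __init__(self, name, up=True):
        self.name, self.up, self.log = name, up, []

    def append(self, batch):
        if not self.up:
            return False  # an unreachable node cannot acknowledge
        self.log.append(list(batch))
        return True


def insert_batch(batch, nodes, background):
    """Return once a majority of nodes has logged the batch."""
    quorum = len(nodes) // 2 + 1
    acks = sum(node.append(batch) for node in nodes)
    if acks < quorum:
        # Simplification: a real system would also clean up partial writes.
        raise RuntimeError(f"only {acks} of {quorum} required acknowledgments")
    background.append(list(batch))  # persist, index, compress later
    return acks


def drain(background, shared_storage):
    """Deferred work: move acknowledged batches to shared storage."""
    while background:
        shared_storage.extend(background.popleft())


nodes = [Node("a"), Node("b"), Node("c", up=False)]
pending, shared_storage = deque(), []
print(insert_batch([{"sensor": 1, "value": 20.5}], nodes, pending))
drain(pending, shared_storage)
```

With three nodes the quorum is two, so the insert above succeeds even though one node is down; the caller never waits for the shared storage write.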
Real-time access to all data
In many cases, applications need immediate access to all data as it streams in. They can’t wait until the data is persisted to storage.
Db2 Event Store provides different read modes. You can decide whether a query should include the very latest data, which is still only in memory or in the local log files and not yet on shared storage.
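Conceptually, a read mode decides which storage tiers a query looks at. The sketch below illustrates that idea only; the `ReadMode` names and the `read` function are invented for this example and are not the product's actual option names.

```python
from enum import Enum


class ReadMode(Enum):
    # Hypothetical mode names, chosen for this illustration only.
    PERSISTED_ONLY = "persisted_only"   # only rows already on shared storage
    INCLUDE_LATEST = "include_latest"   # also rows still in the local log


def read(shared_storage, local_log, mode):
    """Return the rows visible to a query under the given read mode."""
    rows = list(shared_storage)
    if mode is ReadMode.INCLUDE_LATEST:
        rows.extend(local_log)
    return rows


shared_storage = [{"id": 1}, {"id": 2}]
local_log = [{"id": 3}]  # ingested, but not yet persisted to shared storage

print(len(read(shared_storage, local_log, ReadMode.PERSISTED_ONLY)))
print(len(read(shared_storage, local_log, ReadMode.INCLUDE_LATEST)))
```

The trade-off is between seeing the freshest rows and reading only data that has already reached shared storage.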
In addition, for real-time analytics, Db2 Event Store is based on an active in-memory processing approach. Older data is automatically kept in memory as long as it’s accessed by queries. Data is also column organized for massively parallel processing and high compression.
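Why column organization helps compression is easy to see in a toy example. The sketch below pivots a few rows into columns and applies run-length encoding, which is used here purely as an illustrative scheme; the text does not say which compression techniques Db2 Event Store applies.

```python
from itertools import groupby

# Hypothetical sensor readings, stored row by row as they arrive.
rows = [
    {"sensor": "a", "status": "ok", "value": 20.1},
    {"sensor": "a", "status": "ok", "value": 20.3},
    {"sensor": "a", "status": "ok", "value": 20.2},
    {"sensor": "b", "status": "ok", "value": 18.7},
    {"sensor": "b", "status": "warn", "value": 35.0},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def run_length_encode(values):
    """Collapse runs of equal neighboring values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)
print(run_length_encode(columns["sensor"]))
print(run_length_encode(columns["status"]))
```

Values in one column tend to repeat, so long runs collapse into a few pairs, and a query that needs only one column never has to touch the others.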
Db2 Event Store benefits and capabilities
The standalone version of Db2 Event Store is bundled with Watson Studio for application development. The programming API supports several programming languages, including Java, Scala, and Python, and provides access to a wide range of open source libraries (e.g., for machine learning or data visualization).
Db2 Event Store can also be rapidly deployed as an integrated database on IBM Cloud Pak for Data. If you already have a standalone Db2 Event Store cluster, it can be accessed in IBM Cloud Pak for Data as a remote data source. Through the data virtualization capability of Cloud Pak for Data, you can even combine data from Db2 Event Store and other remote data sources in a single query.
Some of the highlights of Db2 Event Store release 2.0:
- Common SQL engine: The supported SQL language now conforms to a broad set of ISO/IEC industry standards. In addition, you get access to IBM’s common SQL engine features like the command line processor and tools like db2pd. This simplifies your work and helps you get started, especially if you are already familiar with other Db2 products.
- Faster queries: Several optimizations, such as multi-tier caching of synopsis and data objects, speed up query processing.
- Time series functions: Db2 Event Store 2.0 now contains many pre-built functions to analyze or transform time series data, such as predicting future values based on trends or joining temporal data.
- Geospatial toolkit: Allows analysis and insight extraction from spatial data. Contains a variety of functions to process and index spatial data.
Get started with Db2 Event Store
To start working with real-time analytics, check out the Db2 Event Store demo assets, including videos, product tours (software simulations), and hands-on labs.
The Db2 Event Store lab gives you free access to a pre-installed Db2 Event Store system in the cloud.
You can also download the free Developer Edition of IBM Db2 Event Store.