Today’s businesses operate globally and in real time. The exponential growth of data requires new approaches to capturing and analyzing massive data streams as they arrive.

Think about credit card fraud detection, monitoring and predictive maintenance of smart grids, or storing and analyzing sensor data from large car fleets. Each vehicle may contain over 700 sensors that constantly stream data.

In all these cases, you may want to store massive data streams in real time while also accessing the data immediately as it streams in.

Db2 Event Store 2.0

IBM just released Db2 Event Store 2.0, a new type of database designed for real-time processing of massive data streams. It can capture over 250 billion events per day, which is roughly the number of credit card transactions generated worldwide in 2018.

What architecture is used for this type of stream-processing database?

Here is the list of key requirements and corresponding design concepts:

High availability

Db2 Event Store is based on a 3-node cluster architecture. The software runs in Docker containers that are managed by Kubernetes. Each node processes a share of the workload. Incoming data is first written to the local log files of the receiving node and then replicated to the log files of the other nodes. Later, the data is persisted on storage that is shared between all nodes. If one node goes down, another node takes logical ownership of the corresponding shared data.
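The failover idea can be sketched in a few lines: each node logically owns a set of shards on shared storage, and when a node fails, the surviving nodes take over its shards. This is a conceptual simulation only; the class and method names are illustrative and not part of the Db2 Event Store API.

```python
class Cluster:
    """Toy model of shard ownership in a small cluster (illustrative only)."""

    def __init__(self, nodes, shards):
        self.nodes = set(nodes)
        # Distribute shards round-robin across the nodes to start with.
        self.ownership = {s: nodes[i % len(nodes)] for i, s in enumerate(shards)}

    def fail(self, node):
        """Simulate a node outage: surviving nodes take over its shards."""
        self.nodes.discard(node)
        survivors = sorted(self.nodes)
        for i, (shard, owner) in enumerate(sorted(self.ownership.items())):
            if owner == node:
                self.ownership[shard] = survivors[i % len(survivors)]


cluster = Cluster(["node1", "node2", "node3"], ["s0", "s1", "s2", "s3", "s4", "s5"])
cluster.fail("node2")
# After the failure, every shard is still owned by a live node.
assert all(owner in cluster.nodes for owner in cluster.ownership.values())
```

Because the data itself lives on shared storage, taking over a shard is a matter of reassigning logical ownership rather than copying data between machines.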

Real-time ingest

Db2 Event Store uses data sharding, which means that tables are broken up into smaller chunks called shards. Shards are spread across multiple physical servers. In addition, to utilize all cores of a single server, a table is also split into multiple shards on each machine.
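A common way to implement this kind of layout is hash-based shard routing: a sharding key is hashed to pick a shard, and shards are spread evenly over the servers. The sketch below illustrates the concept under those assumptions; the constants and function names are hypothetical, not the product’s internals.

```python
import hashlib

NODES = ["node1", "node2", "node3"]
SHARDS_PER_NODE = 4  # e.g. several shards per machine to keep all cores busy
TOTAL_SHARDS = len(NODES) * SHARDS_PER_NODE


def shard_for(key: str) -> int:
    """Map a sharding key deterministically to one of the table's shards."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % TOTAL_SHARDS


def node_for(shard: int) -> str:
    """Spread shards evenly across the physical servers."""
    return NODES[shard % len(NODES)]


shard = shard_for("sensor-4711")
print(f"key 'sensor-4711' -> shard {shard} on {node_for(shard)}")
```

Because the hash is deterministic, every node can compute the route for a key locally, without a central lookup service.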

Db2 Event Store uses batch insert processing. An application passes larger chunks of data per insert, which amortizes per-record overhead and keeps latency low. As soon as a batch is stored in a node’s local log files and replicated to a quorum of nodes, the application can continue processing. The remaining work—like data storage in the shared storage layer, indexing, and compression optimization—can run asynchronously in the background.
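The insert path described above—acknowledge once a quorum of nodes has logged the batch, persist to shared storage later—can be sketched as follows. All names here are illustrative; this is not the actual client API.

```python
REPLICAS = 3
QUORUM = REPLICAS // 2 + 1  # 2 of 3 nodes must log the batch

logs = [[] for _ in range(REPLICAS)]  # per-node local log files
shared_storage = []                   # filled asynchronously later


def insert_batch(rows, reachable_nodes):
    """Append the batch to each reachable node's log; acknowledge on quorum."""
    acks = 0
    for node in reachable_nodes:
        logs[node].append(rows)
        acks += 1
    if acks < QUORUM:
        raise RuntimeError("batch not durable: quorum not reached")
    return "acknowledged"  # caller continues; persistence happens in background


def background_flush():
    """Later, asynchronously: move logged batches to the shared storage layer."""
    for batch in logs[0]:
        shared_storage.extend(batch)


insert_batch([{"ts": 1, "v": 0.5}, {"ts": 2, "v": 0.7}], reachable_nodes=[0, 1])
background_flush()
```

The key point is that the client is unblocked as soon as the data is durable on a quorum of logs, while the more expensive work (columnar storage, indexing, compression) is deferred.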

Real-time access to all data

In many cases, applications need immediate access to all data as it streams in. They can’t wait until the data is persisted to storage.

Db2 Event Store provides different read modes. You can decide whether to access the very latest data, which may still reside only in memory or in the local log files and not yet on shared storage.

In addition, for real-time analytics, Db2 Event Store is based on an active in-memory processing approach. Older data is automatically kept in memory as long as it’s accessed by queries. Data is also column organized for massively parallel processing and high compression.
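The read-mode idea can be illustrated with a minimal sketch: a query either reads only what has already reached shared storage, or additionally merges in the freshest data still sitting in memory. This is purely conceptual; the real read modes are configured differently.

```python
# Two tiers of data, as described in the text (illustrative structures only).
persisted = [{"ts": 1, "v": 10}, {"ts": 2, "v": 11}]  # already on shared storage
in_memory_tail = [{"ts": 3, "v": 12}]                 # logged but not yet persisted


def scan(include_latest: bool):
    """Read persisted rows, optionally merged with the in-memory tail."""
    rows = list(persisted)
    if include_latest:
        rows += in_memory_tail
    return rows


assert len(scan(include_latest=False)) == 2  # only durable, persisted data
assert len(scan(include_latest=True)) == 3   # includes the freshest events
```

Reading the in-memory tail gives lower data latency at the cost of touching data that has not yet been fully persisted and optimized.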

Db2 Event Store benefits and capabilities

The standalone version of Db2 Event Store is bundled with Watson Studio for application development. The programming API supports different programming languages—like Java, Scala, or Python—and provides access to a plethora of open source libraries (e.g., for machine learning or data visualization).

Db2 Event Store can also be rapidly deployed as an integrated database on IBM Cloud Pak for Data. If you already have a standalone Db2 Event Store cluster, it can be accessed in IBM Cloud Pak for Data as a remote data source. Through Cloud Pak for Data data virtualization, you can even combine data from Db2 Event Store and other remote data sources in a single query.
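Conceptually, data virtualization means one query spans rows pulled live from several remote sources, without first copying them into a single database. The sketch below illustrates that idea with hypothetical source names and fetch functions; it is not the Cloud Pak for Data API.

```python
def fetch_event_store():
    """Stand-in for rows served by a remote Db2 Event Store (hypothetical)."""
    return [{"device": "d1", "temp": 71}, {"device": "d2", "temp": 64}]


def fetch_warehouse():
    """Stand-in for rows from a second remote source (hypothetical)."""
    return [{"device": "d1", "site": "Berlin"}, {"device": "d2", "site": "Austin"}]


def virtual_join():
    """Join rows from both sources as if they were a single table."""
    sites = {r["device"]: r["site"] for r in fetch_warehouse()}
    return [dict(r, site=sites.get(r["device"])) for r in fetch_event_store()]


print(virtual_join())
```

In the real product, the virtualization layer performs this federation for you, so the joined result is expressed as ordinary SQL over virtual tables.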

Some of the highlights of Db2 Event Store release 2.0:

  • Common SQL engine: The supported SQL language now conforms to a broad set of ISO/IEC industry standards. In addition, you get access to IBM’s common SQL engine features like the command line processor and tools like db2pd. This simplifies your work and helps you get started, especially if you are already familiar with other Db2 products.
  • Faster queries: Various optimizations, including support for multi-tier caching of synopsis and data objects, speed up query processing.
  • Time series functions: Db2 Event Store 2.0 now contains many pre-built functions to analyze or transform time series data, such as predicting future values based on trends or joining temporal data.
  • Geospatial toolkit: Allows analysis and insight extraction from spatial data. Contains a variety of functions to process and index spatial data.
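The “joining temporal data” mentioned in the highlights usually means an as-of join: each reading is matched with the most recent row of another series at or before its timestamp. The plain-Python illustration below shows the concept; it is not one of the product’s built-in time series functions.

```python
from bisect import bisect_right


def asof_join(left, right):
    """For each (ts, value) in left, attach the latest right value with ts <= left ts."""
    right = sorted(right)
    ts_index = [ts for ts, _ in right]
    joined = []
    for ts, value in left:
        i = bisect_right(ts_index, ts) - 1       # last right row at or before ts
        match = right[i][1] if i >= 0 else None  # None if nothing precedes ts
        joined.append((ts, value, match))
    return joined


temps = [(10, 21.5), (20, 22.0)]       # (timestamp, temperature)
setpoints = [(5, 21.0), (15, 23.0)]    # (timestamp, setpoint)
print(asof_join(temps, setpoints))     # each temperature paired with the latest setpoint
```

As-of joins are the natural way to align sensor streams that don’t share exact timestamps, which is why time series toolkits ship them as primitives.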

Get started with Db2 Event Store

To start working with real-time analytics, check out the Db2 Event Store demo assets, including videos, product tours (software simulations), and hands-on labs.

The Db2 Event Store lab provides free access to a pre-installed Db2 Event Store system in the cloud.

You can also download the free Developer Edition of IBM Db2 Event Store.
