dashDB brings MPP Scale to IBM’s BLU Acceleration

The market for cloud-based relational data warehousing is heating up this year, with IBM having just launched a major new push for its dashDB cloud data warehousing service.

For those who might not be aware, dashDB is IBM’s answer to Amazon’s Redshift cloud data warehouse and to Microsoft’s Azure SQL Data Warehouse. dashDB is provided as a managed service, which means that you pay for a certain amount of storage and compute capacity, and IBM handles the administration of your database system. It’s actually a very nifty concept: you pay for the capacity you need, you don’t need to worry about managing the hardware or administering the database, and to top it off, the service itself is continually upgraded with new features and capabilities at a much more rapid pace than conventional database offerings.

Since dashDB has been out for a while already, a logical question is why IBM has chosen now to put a big new push behind it. The answer is that dashDB has recently added MPP scale-out capabilities, which means that it is no longer constrained by the capacity of a single server and can provide truly massive warehousing capacity through clustering.

That’s all well and good, but what’s particularly interesting here is that dashDB is built on top of IBM’s BLU Acceleration in-memory column store technology, and up until this point, BLU Acceleration has only been available as a single-server offering. So has IBM actually extended its BLU technology into a true MPP column store? Or is this some sleight of hand where they’ve just stitched together a bunch of single-server databases and called it “BLU MPP”? Does dashDB really deliver the promised “BLU Acceleration” at MPP scale?

As it turns out, the answer is a resounding yes: this is in fact the real deal! I have a bit of an inside track as one of the lead architects involved in the project, and I’ve had the privilege of working with a very talented team of engineers to bring this technology to fruition.

We have built MPP awareness directly into the BLU columnar query engine. This means that our query planning and optimization are fully MPP-aware. We’ve also taken great pains to ensure that the boundaries where data crosses the network fit seamlessly into the accelerated columnar query processing. To achieve this, we’ve enabled BLU to exchange data between servers directly within the columnar query engine, in its native columnar vector format. Finally, we’ve built a communications infrastructure that is optimized for highly parallel multi-core systems and allows us to maintain parallelism on both ends of the network pipe. The end result is that when the columnar query engine needs to exchange data across the network, it can do so in optimized columnar form and in a highly parallel manner, almost as if the network wasn’t there. So you get all the existing benefits of BLU Acceleration, such as dynamic in-memory processing on compressed columnar data, parallel vector processing, and data skipping, but now you get them at massive scale.
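To make the idea of exchanging data in columnar form a little more concrete, here is a minimal Python sketch. It is not dashDB or BLU code: the `ColumnBatch` type, the `partition_batch` and `local_sum` functions, the column names, and the simple modulo routing rule are all assumptions made purely for illustration. It shows a batch of column vectors being split into per-server batches by a distribution key, with every outgoing piece staying in column form rather than being converted to rows.

```python
from typing import Dict, List

# Hypothetical in-memory batch: column name -> vector of values.
ColumnBatch = Dict[str, List[int]]


def partition_batch(batch: ColumnBatch, key: str, num_nodes: int) -> List[ColumnBatch]:
    """Split one columnar batch into one columnar batch per node.

    Routing rule (an assumption, standing in for a real hash function):
    a row goes to node `key_value % num_nodes`.
    """
    # Compute each row's destination once, using only the key column.
    dest = [value % num_nodes for value in batch[key]]
    parts: List[ColumnBatch] = []
    for node in range(num_nodes):
        # Row positions routed to this node.
        rows = [i for i, d in enumerate(dest) if d == node]
        # Gather each column separately so the output is still columnar.
        parts.append({name: [col[i] for i in rows] for name, col in batch.items()})
    return parts


def local_sum(batch: ColumnBatch, key: str, value: str) -> Dict[int, int]:
    """Aggregate one node's partition: sum of `value` grouped by `key`."""
    totals: Dict[int, int] = {}
    for k, v in zip(batch[key], batch[value]):
        totals[k] = totals.get(k, 0) + v
    return totals


batch = {
    "customer_id": [1, 2, 3, 4, 5, 6],
    "amount": [10, 20, 30, 40, 50, 60],
}
parts = partition_batch(batch, "customer_id", num_nodes=3)
per_node_totals = [local_sum(p, "customer_id", "amount") for p in parts]
```

Because a given key value always lands on exactly one node in this sketch, each node can aggregate its own partition independently and the partial results can simply be combined afterwards.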

So what sort of difference are you going to see between the biggest dashDB single-server offering and the new dashDB MPP offerings? The differences are actually quite significant: in addition to offering massive scale in terms of storage and compute capacity, the dashDB MPP offering also boasts more memory per core and an improved I/O subsystem.

My colleague @Michael_KF_KWOK has run some internal benchmarks comparing our dashDB 4TB single-node offering with our dashDB MPP 3-server offering, using a parallel workload generated by IBM Cognos. In these tests, he measured an overall throughput speedup of 10x(!). This large speedup can be attributed to the combination of much greater compute capacity in the 3-server cluster and the faster I/O subsystem. The bottom line is that dashDB at MPP scale delivers on its promise!

Perhaps the most exciting part of this technology is that we will continue to evolve it and deliver frequent improvements to the service.

In many ways this is just the beginning, so stay tuned…
