
Announcing IBM Db2 Big SQL v6.0 (on HDP 3.1) - Hadoop Dev

Technical Blog Post




Announcing the immediate availability of IBM Db2 Big SQL v6.0

IBM Db2 Big SQL, an advanced SQL engine on Hadoop, has kept pace with the fast-evolving open source ecosystem while accelerating analytical workloads on data lakes. Its core capabilities focus on data virtualization, SQL compatibility, scalability, performance, and enterprise security and governance, making it a strong query engine for drawing insights from disparate data sources, including Hadoop.

Db2 Big SQL v6.0 is integrated with the latest platform, HDP 3.1, and brings enhanced performance to data lakes along with new ease-of-use and enterprise features. Let’s take a quick look at what’s new in this release:


Enterprise readiness

Being enterprise ready means being secure, scalable, stable and easy to run in production. Some of the new capabilities added in this release are:

  • Integration with HDP V3.1

    Hortonworks Data Platform V3.1 comes with Hadoop 3 and new versions of many components, which bring many new capabilities for building a smarter, faster Big Data platform. Db2 Big SQL v6.0 includes the following enhancements to work seamlessly with the new platform:

    • Provides tight integration with various components in HDP V3.1, especially integration with Hive 3.0 APIs and metadata for smooth interoperability between Hive and Db2 Big SQL tables. 
    • Integrates with YARN 3.0 (which has now absorbed the Apache Slider project for long-running processes) to manage resources centrally while sharing the node with many services. 
    • Integrates with other components in the platform such as HBase, Sqoop, Atlas, Ranger, and so on. 
  • Create and manage tag-based policies by using Apache Atlas and Apache Ranger for Db2 Big SQL tables

    With the integration of Atlas and Ranger, users can now define and manage resource-based policies by using tags. Once a resource is tagged, the authorization for the tag is automatically enforced, eliminating the need to create or update policies for that resource. This extends centralized security beyond access-based policies to resource-based policies as well.

  • New ZooKeeper-based solution for automatic failover for HA

    High availability is critical for any enterprise. In particular, failover from the primary to the secondary system during an outage needs to be automatic and seamless, with no disruption or manual intervention. With this new HA solution, which is based on ZooKeeper technology for consistency with other Hadoop components, Db2 Big SQL provides an end-to-end HADR capability that keeps data accessible at all times.

  • Centrally create and manage access policies in Apache Ranger for federated sources

    Previously, Apache Ranger policies for Db2 Big SQL could provide access control only for Hadoop objects. Now you can also set Ranger policies for federated sources, which makes Apache Ranger a truly centralized access policy manager for Hadoop.
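To make the tag-based policy idea from this section concrete, here is a minimal sketch in Python. It is only an illustration of the concept, not the Atlas or Ranger API: the table names, the `PII` tag, the group names, and the helpers `can_read` and `tag_table` are all hypothetical, and resource-based policies are left out entirely.

```python
from typing import Dict, Set

# Hypothetical tags attached to tables (the role Atlas plays in the text).
table_tags: Dict[str, Set[str]] = {
    "sales.customers": {"PII"},
    "hr.salaries": set(),
}

# Hypothetical tag-based policies (the role Ranger plays): tag -> groups allowed to read.
tag_policies: Dict[str, Set[str]] = {"PII": {"compliance"}}


def can_read(group: str, table: str) -> bool:
    """Allow access when any tag on the table has a policy that names the group."""
    tags = table_tags.get(table, set())
    return any(group in tag_policies.get(tag, set()) for tag in tags)


def tag_table(table: str, tag: str) -> None:
    """Attach a tag to a table; no policy is created or edited."""
    table_tags.setdefault(table, set()).add(tag)


before = can_read("compliance", "hr.salaries")  # no tag yet, so no tag policy applies
tag_table("hr.salaries", "PII")
after = can_read("compliance", "hr.salaries")   # the existing PII policy now applies
```

In this sketch, tagging `hr.salaries` is the only change needed for the existing `PII` policy to cover it, which mirrors the point that tagging a resource removes the need to create or update a policy for that resource.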



    Performance

    Every enterprise looks for high performance for its workloads and applications. Db2 Big SQL brings an advanced SQL engine to Hadoop that executes queries efficiently, even very complex or tool-generated ones, without hitting out-of-memory errors. With continuous improvements to performance, here are some of the highlights:

    • Better reader performance on sorted tables

    When scanning ORC and Parquet tables, the Db2 Big SQL readers apply query predicates to skip reading portions of files. When tables are sorted on the columns that queries use as predicates, table scan times can be significantly reduced. Testing shows 4-6x speedups in the top 10 TPC-DS queries.
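Why sorting helps can be shown with a small Python sketch. It assumes that the reader keeps minimum and maximum statistics per chunk of rows, as the ORC and Parquet formats do, and skips any chunk whose range cannot match the predicate. The `Chunk` class, the helper names, and the data are hypothetical; this is not how the Db2 Big SQL readers are implemented.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Chunk:
    """A group of rows plus the min/max statistics stored for one column."""
    rows: List[int]
    min_val: int
    max_val: int


def build_chunks(values: List[int], chunk_size: int) -> List[Chunk]:
    """Split column values into fixed-size chunks and record min/max per chunk."""
    chunks = []
    for start in range(0, len(values), chunk_size):
        rows = values[start:start + chunk_size]
        chunks.append(Chunk(rows, min(rows), max(rows)))
    return chunks


def scan_between(chunks: List[Chunk], low: int, high: int) -> Tuple[List[int], int]:
    """Return rows with low <= value <= high and the number of chunks actually read."""
    matches: List[int] = []
    chunks_read = 0
    for chunk in chunks:
        # Skip the chunk when its [min, max] range cannot overlap the predicate.
        if chunk.max_val < low or chunk.min_val > high:
            continue
        chunks_read += 1
        matches.extend(v for v in chunk.rows if low <= v <= high)
    return matches, chunks_read


# Hypothetical column: the same 1,000 distinct values, once interleaved and once sorted.
unsorted_values = [(i * 37) % 1000 for i in range(1000)]
sorted_values = sorted(unsorted_values)

unsorted_rows, unsorted_reads = scan_between(build_chunks(unsorted_values, 100), 200, 249)
sorted_rows, sorted_reads = scan_between(build_chunks(sorted_values, 100), 200, 249)
```

Both scans return the same 50 rows, but the interleaved layout has to read all 10 chunks while the sorted layout reads only 1, because sorting narrows each chunk's min/max range.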

    • Join Range Filter Predicate (JRFP) is ON by default

    JRFP has been available since v4.2, but with limitations. In this release, the capability is enhanced to generate JRFP pushdowns for as many joins as possible. It is turned on by default, and the SQL optimizer pushes predicates down when appropriate. Significant performance improvements were observed, especially for star schema queries, because JRFP skips reading a large portion of fact table rows and so reduces overall query processing time. Performance testing shows a 3-6x speedup for the top 10 most improved TPC-DS queries.
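The general idea behind a join range filter can be sketched in Python as follows. The sketch assumes that the filter is a minimum/maximum range on the join key, derived from the already-filtered dimension side and applied to the fact table before the join. The tables, column names, and functions are hypothetical, and the actual Db2 Big SQL optimizer logic is not shown here.

```python
from typing import Dict, List, Tuple

Row = Dict[str, int]

# Hypothetical star schema: a small dimension table and a larger fact table.
date_dim: List[Row] = [{"date_key": k, "year": 2000 + k // 100} for k in range(500)]
sales_fact: List[Row] = [{"date_key": k % 500, "amount": k} for k in range(5000)]


def derive_key_range(dim_rows: List[Row], key: str) -> Tuple[int, int]:
    """Compute the [min, max] range of join keys that survive the dimension filter."""
    keys = [row[key] for row in dim_rows]
    return min(keys), max(keys)


def join_with_range_filter(fact_rows: List[Row], dim_rows: List[Row],
                           key: str) -> Tuple[List[Row], int]:
    """Join fact to dimension, counting how many fact rows reach the join probe."""
    if not dim_rows:
        return [], 0
    low, high = derive_key_range(dim_rows, key)
    dim_keys = {row[key] for row in dim_rows}
    joined: List[Row] = []
    probed = 0
    for row in fact_rows:
        # Range predicate pushed down to the fact scan: drop rows outside [low, high].
        if not (low <= row[key] <= high):
            continue
        probed += 1
        if row[key] in dim_keys:
            joined.append(row)
    return joined, probed


# Dimension predicate: only dates in the year 2003 (date_key 300..399).
filtered_dim = [row for row in date_dim if row["year"] == 2003]
joined_rows, probed_rows = join_with_range_filter(sales_fact, filtered_dim, "date_key")
```

In this sketch, only 1,000 of the 5,000 fact rows reach the join probe, because the range 300..399 derived from the dimension side filters out the rest during the scan. Combined with min/max statistics on a sorted fact table, such a range predicate could let the reader skip whole portions of files as well.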


    Ease of use

    Ease of use means providing capabilities that are efficient, effective, engaging, error tolerant, and easy to learn. One focus of this release is simplifying YARN configuration for Db2 Big SQL. Here are the highlights:

    • YARN is widely used to centrally manage resources for the various services that run in a Hadoop cluster, but getting the configuration right is always challenging. This release adds the following improvements for better control over YARN:
      • Notify users through the Ambari UI when disk failures are detected on any node, head or worker
      • When a database or disk is damaged, prevent YARN from starting that node, which blocks queries from being sent to it
    • A new GUI tool, Unified Console, allows historical monitoring of queries and workloads, along with real-time monitoring of how the cluster is performing and where the bottlenecks are.
    • Install Db2 Big SQL on heterogeneous clusters, that is, clusters with mixed hardware and OS versions
    • Db2 Big SQL now supports wider tables: the maximum number of columns per table is now 2048


    For a detailed product release announcement, see the IBM Announcement.

