Traditional data management approaches store data in disparate databases, often duplicating data across systems and making data integration and processing time-consuming, risky and expensive. Getting reliable data without friction is key to succeeding with generative AI. Watsonx.data is a data lakehouse built on open standards that supports both traditional SQL-based analytics and AI-driven insights with automation in a single platform, serving the needs of different data users and a broad variety of enterprise workloads.
At Think 2024, our annual event that brings together over 5,000 technology pioneers and leaders, IBM announced several new capabilities coming to IBM® watsonx.data™. These features are now generally available in watsonx.data.
IBM introduced a new query engine within IBM watsonx.data, Presto C++, along with an integrated query optimizer featuring enterprise-proven query compilation technology and advanced query rewrite and cost-based optimization techniques. In other words, watsonx.data has been enhanced for fast query performance at optimized costs.
In internal IBM testing, IBM watsonx.data with Presto C++ v0.286 and the query optimizer, running on IBM Storage Fusion HCI, delivered better price-performance than Databricks' Photon engine: equal query runtime at less than 60% of the cost, derived from public 100 TB TPC-DS query benchmarks.*
IBM clients can now unlock transactional mainframe data for artificial intelligence (AI) and analytics with IBM Data Gate for watsonx™ integrated with IBM watsonx.data. This revolutionizes the way organizations synchronize, analyze and build AI models from data originating on IBM Z®.
By bringing transactional data from the mainframe into an open, governed data lakehouse such as watsonx.data, enterprises can readily build AI models to grow revenue, enhance productivity and manage costs.
IBM announced a generative AI-infused semantic layer in IBM Knowledge Catalog that can be embedded into IBM watsonx.data. When embedded, the semantic layer generates data enrichments that let clients find and understand previously cryptic structured data across their data estate in natural language through semantic search. This accelerates data discovery and unlocks insights faster, without requiring SQL.
IBM also announced enhanced integrations between watsonx.data and IBM® Db2® database, Db2® Warehouse, IBM® Netezza® and Informix®, along with support for open formats such as Apache Iceberg, to unify and share a single copy of data and metadata across the hybrid cloud without needing to migrate or re-catalog. With these integrations, clients can query data from their IBM databases across multiple engines to prepare data for AI.
IBM also announced IBM Data Product Hub, a new solution for repeatable data sharing between internal data producers and data consumers, now generally available. IBM Data Product Hub users can connect to IBM watsonx.data and package relevant metadata to create a repeatable, governed data product. That data product can then deliver the right data for various AI use cases across the organization at scale, without the need for repeated, manual workflows.
IBM was thrilled to highlight some of the amazing work our clients are doing with watsonx.data on stage at Think this year. Themes included scalability, data governance and management, and speed to value.
Try watsonx.data yourself with a free trial or book a meeting with an IBM watsonx.data product specialist. Interested in diving deeper into Think announcements? Watch the watsonx keynote.
*Based on IBM internal testing of Presto C++ 0.286 on a hyperconverged infrastructure setup with 1 master + 75 worker nodes, 1009 vCPUs, 18 TB memory, 344.8 TB of file system storage, distributed RAID and 50 GB network, compared to public Databricks 100 TB TPC-DS query benchmarks published in 2021 with 1 master + 256 worker nodes, 2112 vCPUs, 16.1 TB memory, 528.2 TB of total storage and 10 GB network. Pricing calculations are based on IBM watsonx.data pricing as of 7 May 2024 and Databricks published pricing for Photon as of 7 May 2024. Results are based on testing conditions and pricing as of the dates shown. Actual costs and performance can vary depending on individual client configurations and conditions. Results are derived from the Databricks SQL 8.3 benchmark and as such are not comparable to published Databricks SQL 8.3 benchmark results, as results do not comply with the Databricks SQL 8.3 benchmark specification.
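To put the two test configurations in the footnote above side by side, the following is a minimal Python sketch that computes the ratio of cluster resources. The figures are copied from the footnote; the `Cluster` class and `resource_ratios` function are hypothetical names introduced only for illustration. The sketch does not reproduce the cost comparison, because the pricing inputs are not given in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Hardware summary of one benchmark setup (illustrative helper, not an IBM API)."""
    name: str
    nodes: int          # master + worker nodes
    vcpus: int
    memory_tb: float
    storage_tb: float


# Figures taken from the footnote; nothing here is measured or priced.
ibm = Cluster("watsonx.data, Presto C++ 0.286", 1 + 75, 1009, 18.0, 344.8)
databricks = Cluster("Databricks, 2021 published run", 1 + 256, 2112, 16.1, 528.2)


def resource_ratios(a: Cluster, b: Cluster) -> dict[str, float]:
    """Return each resource of `a` as a fraction of the same resource of `b`."""
    return {
        "nodes": a.nodes / b.nodes,
        "vcpus": a.vcpus / b.vcpus,
        "memory_tb": a.memory_tb / b.memory_tb,
        "storage_tb": a.storage_tb / b.storage_tb,
    }


ratios = resource_ratios(ibm, databricks)
for resource, ratio in ratios.items():
    print(f"{resource}: {ratio:.2f}x")
```

Under these figures, the IBM setup used roughly half the vCPUs and under a third of the nodes of the published Databricks configuration, with slightly more memory. These ratios describe hardware only and say nothing by themselves about runtime or cost.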