A new era in data governance and DataOps with IBM watsonx.data 2.1

11 December 2024

 

Author

Kevin Shen

Principal Product Manager - watsonx.data

IBM

Data is the lifeblood of artificial intelligence (AI) systems. Having all the right data readily available, easily accessible, yet consistently governed, regardless of how and where it is stored and represented, can help today’s AI systems reveal deeper insights that ultimately deliver smarter business outcomes.

Available today, IBM watsonx.data™ 2.1 ushers in a new era for data governance, DataOps and more.

As part of the IBM watsonx™ Data and AI platform, IBM watsonx.data is the hybrid, open data lakehouse to simplify data access and sharing, optimize workloads for price-performance, and prepare your data for AI and analytics at scale—anywhere your data resides. 

IBM watsonx.data 2.1 delivers the following data governance capabilities and enhancements:

  • Common Policy Gateway (CPG) with Apache Ranger - Enhances data governance by using Apache Ranger's advanced policy management features for fine-grained access control. It allows customers to use their preferred third-party policy engines, which simplifies integration and reduces the time and effort needed to maintain consistent governance across their data environments.
  • Data Lineage with OpenLineage (Manta integration) - Captures and publishes Job, Run, and Dataset events from the watsonx.data Presto and Spark engines to Manta's OpenLineage API or a hosted message queue. Users can access and explore lineage information through the Manta UI, which provides full visibility into data flows and transformations.
  • Data observability with Databand - Enables collecting, aggregating, and forwarding logs, metrics, and tracing events from all services within the lakehouse to enhance automation, serviceability, monitoring, and debugging. This improves system reliability, helps diagnose and resolve issues faster, and gives customers the ability to monitor platform services with real-time data.
  • Platform observability with OpenTelemetry (Instana integration) - OpenTelemetry integration in watsonx.data delivers flexible, open-standard observability with deep monitoring and tracing through IBM Instana®. Customers can also use JMX metrics with Prometheus for monitoring, which enhances platform reliability and enables faster issue resolution.
  • Metadata Store (MDS) - Provides a flexible, scalable architecture to manage metadata for all data types—structured and unstructured. This enables comprehensive governance and control over diverse data assets, offering a unified view across the organization.
  • Open Catalog - MDS adheres to industry standards such as the Iceberg REST Catalog API and Unity Catalog Open API specifications. This ensures open compatibility and allows integration with other data engines and services. Customers benefit from enhanced interoperability, flexibility, and a future-ready metadata management system that promotes a more efficient, scalable, and open data ecosystem.
  • Public preview: Semantic Search - Uses semantic enrichment to deliver accurate search results based on the context and meaning behind the search terms. Users will be able to search Tables, Views, and Schemas with natural language queries. This functionality will be embedded across the Data Explorer UI screens, enhancing user experience and productivity.
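
To make the lineage bullet above more concrete, here is a minimal sketch of what a Run event in the OpenLineage style might look like. It uses only the Python standard library and does not show how watsonx.data or Manta actually emit events. The function name `build_run_event`, the `example-lakehouse` namespace, the producer URL, and the table names are all hypothetical, and the required `schemaURL` field of the specification is omitted for brevity, so treat this as an illustration of the event shape rather than a validated payload.

```python
import json
import uuid
from datetime import datetime, timezone


def build_run_event(event_type, job_name, inputs, outputs, run_id=None):
    """Build a dict shaped like an OpenLineage run event (illustrative only)."""
    namespace = "example-lakehouse"  # hypothetical namespace, not a product default
    return {
        "eventType": event_type,  # e.g. START, COMPLETE, FAIL
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "producer": "https://example.com/hypothetical-producer",  # assumption
        "run": {"runId": run_id or str(uuid.uuid4())},
        "job": {"namespace": namespace, "name": job_name},
        # Input and output datasets are what let a lineage tool draw data flows.
        "inputs": [{"namespace": namespace, "name": n} for n in inputs],
        "outputs": [{"namespace": namespace, "name": n} for n in outputs],
    }


event = build_run_event(
    "COMPLETE",
    "daily_orders_load",          # hypothetical job name
    inputs=["sales.orders_raw"],  # hypothetical source table
    outputs=["sales.orders"],     # hypothetical target table
)
print(json.dumps(event, indent=2))
```

A lineage backend that receives a stream of such events can connect each job to its input and output datasets, which is the basis for the data-flow view described above.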

From a DataOps perspective, watsonx.data 2.1 delivers the following capabilities and enhancements:

  • Iceberg table discovery and ingestion - Ability to discover existing Iceberg tables stored in external buckets and ingest them into watsonx.data. Will support discovery and ingestion of Iceberg tables on AWS S3, IBM COS, and various other object storage locations.
  • Public preview: Presto Query Pushdown for JDBC - Addresses the performance limitation of Presto federation to remote data sources such as Db2, Netezza (NZ), Oracle and others. With the ability to “push down” parts of a query, or the entire query, for execution at the remote data source, less data is transferred over the network and performance improves significantly. This is a foundation capability for the Global Federation Service.
  • watsonx.ai Notebooks use of watsonx.data Spark engine - Ability for watsonx.ai notebooks to use watsonx.data Spark as the underlying compute engine. This will enable users to use watsonx.data Spark within a web-based notebook experience. The watsonx.ai notebook support complements the VS Code development environment support that we have in our product today.
  • HBase Connector for Presto - Users can configure Apache HBase services as a data source with watsonx.data Presto and run federated queries with lakehouse data.
  • Support Apache Ozone filesystem as a data source in Presto - Users can configure Apache Ozone services as storage with watsonx.data Presto and run federated queries with lakehouse data.
  • Google Cloud Storage and Azure Storage support for Presto C++ engine - Users can configure the Presto C++ engine to use Azure Data Lake Storage (ADLS) or Google Cloud Storage (GCS) as storage to persist customer data.
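
The idea behind query pushdown in the list above can be sketched with a toy example. This is not Presto and says nothing about how watsonx.data implements pushdown; it uses an in-memory SQLite database as a stand-in for a remote JDBC source, and the `orders` table, its columns, and the function names are all hypothetical. The point is only to show why evaluating a filter at the source reduces the number of rows that cross the network.

```python
import sqlite3


def make_remote_source():
    """Create an in-memory SQLite database standing in for a remote source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
    rows = [(i, "EU" if i % 4 == 0 else "US", float(i)) for i in range(1000)]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return conn


def query_without_pushdown(conn):
    """Fetch every row, then filter in the federating engine."""
    fetched = conn.execute("SELECT id, region, amount FROM orders").fetchall()
    result = [row for row in fetched if row[1] == "EU"]
    return len(fetched), result  # rows transferred, final result


def query_with_pushdown(conn):
    """Send the filter to the source so only matching rows are returned."""
    fetched = conn.execute(
        "SELECT id, region, amount FROM orders WHERE region = ?", ("EU",)
    ).fetchall()
    return len(fetched), fetched  # rows transferred, final result


remote = make_remote_source()
moved_all, result_a = query_without_pushdown(remote)
moved_filtered, result_b = query_with_pushdown(remote)
print(moved_all, moved_filtered)  # 1000 250
```

Both approaches return the same 250 rows, but the pushed-down version moves a quarter of the data in this toy setup. Real savings depend on the selectivity of the query and on which operations the remote source can execute.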

IBM watsonx.data 2.1 also delivers the following Milvus and SaaS enhancements:

  • GPU Index Support (SW) - The new GPU index, CAGRA, offers a 5x performance boost, especially for batch searches. Enabling GPU support in Milvus yields significant performance gains over CPU-based indexing, as seen in various results shared by the performance team. With Milvus GPU support as a feature in watsonx.data, users can leverage the GPU resources on their OpenShift clusters (for CPD customers).

Try IBM watsonx.data to experience the future of data.