Today’s technology landscape is experiencing an explosion of data. Organizations need to be able to trust and access this data to generate meaningful insights. Enter IBM Cloud Pak® for Data 5.0, the newest release of the cloud-native insight platform that integrates the tools needed to collect, organize and analyze data within a data fabric architecture.
IBM enables customers to build a data fabric architecture via Cloud Pak for Data, the platform that provides composable services spanning data integration, data governance, data observability, master data management and data lineage use cases.
The IBM Software team is excited to announce the next version of Cloud Pak for Data: version 5.0, the 15th feature release. This version offers customers new features and enhancements at both the platform level and the service level, spanning the entire data fabric portfolio.
With the new release of the Immersive Experience feature within IBM Cloud Pak for Data 5.0, customers can now use IBM AI and data fabric platforms in tandem without the need for complex integrations or separate management systems.
The Immersive Experience facilitates integration between watsonx, the AI and data platform designed to help businesses scale and accelerate their AI and data-driven initiatives, and Cloud Pak for Data within a single Red Hat® OpenShift® cluster and namespace.
This approach brings the two products together seamlessly. Users can toggle back and forth between technologies within a single platform, which helps organizations streamline their IT and Day 2 operations. Refer to the picture for the user experience provided by Immersive Experience on Cloud Pak for Data.
Furthermore, when installed together, both the watsonx and Cloud Pak for Data brands maintain their own distinct user experience, which can be compared to having two separate tools with the convenience of a single platform.
This experience is achieved through perspectives, which are customized views tailored to the features you want to use. In addition, the node pinning or resource pool technology included in the release supports better allocation of licenses. It also facilitates the coexistence of multiple products, so users can focus on driving innovation and growth without worrying about technical complexities.
In Cloud Pak for Data 5.0, the remote data plane unlocks new possibilities for data engineers. Users are no longer limited to performing their data jobs in one location. The innovation of the remote data plane brings processing capabilities to the data, circumventing costly and sometimes impossible data transfer while fully complying with data sovereignty laws.
By allowing workloads to be moved closer to where data resides, the remote data plane consolidates, expands and refines workloads throughout on-premises and multiple cloud ecosystems, prioritizing compliance, performance and cost-effectiveness.
Remote data planes on Cloud Pak for Data 5.0 deliver these benefits by consolidating multiple Cloud Pak for Data instances into one instance and by running pipelines where the data resides.
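The idea of running a job where its data resides can be illustrated with a minimal sketch. This is not the Cloud Pak for Data API: the `DataPlane` and `Job` types, the `place_job` function and the region names are all hypothetical, chosen only to show the placement decision in isolation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPlane:
    """Hypothetical stand-in for a data plane and the region it runs in."""
    name: str
    region: str


@dataclass(frozen=True)
class Job:
    """Hypothetical data job tagged with the region where its data lives."""
    name: str
    data_region: str


def place_job(job, planes, default):
    """Return the plane in the same region as the job's data, else the default."""
    for plane in planes:
        if plane.region == job.data_region:
            return plane
    return default


# Illustrative setup: one primary plane and one remote plane.
planes = [DataPlane("primary", "us-east"), DataPlane("remote-eu", "eu-de")]

job = Job("nightly-load", "eu-de")
print(place_job(job, planes, default=planes[0]).name)
```

In this sketch the job is sent to the plane that shares a region with its data, so the data itself never has to leave that region. A job whose data has no matching plane falls back to the primary plane.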
Equally significant are the connectivity options available in Cloud Pak for Data 5.0. This version offers connectivity to different data sources at both the platform level and the individual service level.
With more than 100 connectors and support for various formats, Cloud Pak for Data 5.0 offers customers variety and flexibility, along with the option to use generic JDBC and the Connector SDK to build custom connectors.
Also new in Cloud Pak for Data 5.0 is platform-wide certified connector support for Apache Iceberg, the Delta Lake table format and the Milvus vector database, enabling seamless connectivity and unlocking new customer possibilities. These new connectors, along with the existing ones, are all tested to help ensure seamless connectivity between Cloud Pak for Data and over 100 data sources.
As previously mentioned, one of the solution areas that the IBM Cloud Pak for Data platform addresses is helping customers build a data fabric architecture. The platform is composed of a modular set of integrated data fabric service components that automate integration, metadata management and data governance. In addition to the new platform features and enhancements of IBM Cloud Pak for Data, there are several notable updates to the data fabric services available in version 5.0.
One key data fabric service available on Cloud Pak for Data is IBM® DataStage®, the industry-leading data integration solution that supports various combinations of extract, transform and load (ETL) patterns that move and transform data for AI readiness.
DataStage plays a critical role in the launch of Cloud Pak for Data 5.0 because it is built to use the new remote data plane capability. Explore the new product features available on Cloud Pak for Data 5.0.
IBM is dedicated to enhancing the productivity of data users. This commitment is further indicated by the enhancements to the IBM Knowledge Catalog offering in Cloud Pak for Data 5.0.
With the new IBM Knowledge Catalog Standard and IBM Knowledge Catalog Premium Cartridges, IBM delivers generative AI-enabled, modern data intelligence solutions to help organizations scale data governance and boost the productivity of data practitioners by automatically assigning business context to enterprise data at scale.
This enriched metadata context can then be used to automate searchability, streamline access control and improve reporting, unlocking the full potential of self-service AI and analytics.
The IBM Knowledge Catalog Standard Cartridge includes core features such as a glossary, catalog, workflow and automated metadata enrichment, enabling effective use of data assets and fostering greater automation of data management and unification of business metadata.
The IBM Knowledge Catalog Premium Cartridge builds on the Standard Cartridge’s features with added capabilities, including robust data protection and extensive data quality features to support regulatory compliance and deliver trusted data to the enterprise.
The Cloud Pak for Data 5.0 launch also introduces Relationship Explorer, a new feature of IBM Knowledge Catalog. As data estates grow in complexity, Relationship Explorer addresses the challenge of data literacy and governance by using a knowledge graph database to present a visual map of relationships between data assets and governance artifacts.
This feature allows data stewards and compliance officers to identify sensitive data locations, visualize policy and rule flows, and assess the impact of changes in governance assignments. For a detailed exploration of Relationship Explorer, read the blog post.
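The kind of impact assessment described above can be pictured as a graph traversal. The sketch below is illustrative only and does not use any IBM Knowledge Catalog API: the `RELATIONSHIPS` dictionary, its node names and the `impacted_by` function are hypothetical, and a plain Python dictionary stands in for the knowledge graph database.

```python
from collections import deque

# Hypothetical relationships: each key points to the items that depend on it.
RELATIONSHIPS = {
    "policy:pii-protection": ["rule:mask-email"],
    "rule:mask-email": ["asset:customers", "asset:newsletter_signups"],
    "asset:customers": ["asset:customer_360"],
    "asset:newsletter_signups": [],
    "asset:customer_360": [],
}


def impacted_by(start, graph):
    """Breadth-first traversal listing every node reachable from start."""
    impacted = []
    visited = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                impacted.append(neighbor)
                queue.append(neighbor)
    return impacted


# Which rules and assets would a change to the policy touch?
print(impacted_by("policy:pii-protection", RELATIONSHIPS))
```

Starting from a policy, the traversal returns the rule that implements it and then every data asset downstream of that rule, which is the question a data steward asks before changing a governance assignment.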
IBM’s newest offering on Cloud Pak for Data 5.0, Data Product Hub, provides a data-sharing solution that enables organizations to accelerate the enterprise-wide sharing of reusable data products in a governed manner. Data producers can now create actively managed data products, sourced from disparate systems, and share them with data consumers across the organization.
Data Product Hub allows data users to own the entire data product lifecycle, from the onboarding to the retirement of a data product. With Data Product Hub, data consumers can quickly discover and use data across domains, without worrying about compliance, security and data quality.
To learn more about how Data Product Hub simplifies the onboarding, sharing, discovery and delivery of reusable data products, no matter where the data resides, read the blog.
A robust data strategy is critical to AI implementations. Organizations require reliable data for robust AI models and accurate insights. However, the current technology landscape presents unparalleled data quality challenges. Gartner reports that through 2025, 30% of generative AI projects will be abandoned after proof of concept due to poor data quality. As organizations embrace generative AI to transform business decision-making, the quality of data used in AI will be a crucial determinant of success.
Organizations can help ensure the quality of their data and help break down data silos by implementing a data fabric architecture. IBM’s data fabric provides organizations with a trusted data foundation, enabling clients to automate data discovery, enrichment and protection with our data governance and quality capabilities, employing various data integration styles to deliver reliable data for AI workflows. This composable architecture allows IBM to meet clients wherever they are in their data journey.
One of the vehicles through which IBM helps customers build a data fabric architecture is Cloud Pak for Data. With the release of IBM Cloud Pak for Data 5.0 and the new features, specifically Immersive Experience, Remote Data Planes, Relationship Explorer and Data Product Hub, along with the rest of the components of the IBM Data Fabric architecture, customers can optimize their modern data workloads and scale analytics and AI with prepared quality data.