Cloud Pak for Data as a Service is a cloud-native, modular service platform for all your data governance, data engineering, data analysis, and AI modeling tasks. Cloud Pak for Data as a Service includes an integrated data fabric with which you can logically collect and organize all of your data so that your data consumers have instant and secure access to trusted information. Supported by the data fabric, Cloud Pak for Data as a Service includes a suite of data science and AI tools so that your data consumers can analyze your data and infuse your applications with AI for better business outcomes.
Cloud Pak for Data as a Service is a fully managed cloud service platform with the following benefits:
A data fabric is an architectural pattern for managing highly distributed and disparate data. Because it is designed for hybrid and multi-cloud data environments, a data fabric supports the decoupling of data storage, data processing, and data use. With its intelligent knowledge catalog capabilities, you can elevate data into enterprise assets that are governed globally regardless of where the data is stored, processed, or used. Catalog assets are automatically assigned metadata that describes the logical connections between data sources and enriches the assets with semantics, so that you can provide business-ready data for your applications, services, and users.
The data fabric architecture that is provided by Cloud Pak for Data as a Service enables your organization to accelerate data analysis for better, faster insights.
With the capabilities of the Cloud Pak for Data as a Service data fabric architecture, you can:
The following diagram shows the five main capabilities of the data fabric and the connectivity between the platform and existing data sources.
Metadata-based knowledge core
Data stewards enrich data with metadata that describes the data and informs semantic search for data. They curate data into catalogs by using automated discovery and classification. They can further enrich data assets by creating and assigning custom governance artifacts, such as business vocabulary. They can also import ready-to-use collections of metadata from industry-specific Knowledge Accelerators.
Components: Watson Knowledge Catalog service, Knowledge Accelerators
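Automated classification can be pictured as profiling each column and tagging it with a data class when enough of its values match a known pattern. The following is a minimal sketch under stated assumptions. The data class names, regular expressions, and the 80% match threshold are hypothetical. This is not how Watson Knowledge Catalog implements discovery or classification, and it does not use any product API.

```python
import re

# Hypothetical data classes and patterns for illustration only; these are NOT
# the classifiers that Watson Knowledge Catalog ships.
DATA_CLASSES = {
    "Email Address": re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$"),
    "US Phone Number": re.compile(r"^\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}$"),
    "Date (ISO 8601)": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
}


def classify_column(values, threshold=0.8):
    """Return the best-matching data class for a column, or None.

    A class is assigned only when at least `threshold` of the non-empty
    values match its pattern (the threshold is an assumed default).
    """
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return None
    best_class, best_ratio = None, 0.0
    for name, pattern in DATA_CLASSES.items():
        ratio = sum(1 for v in non_empty if pattern.match(v)) / len(non_empty)
        if ratio > best_ratio:
            best_class, best_ratio = name, ratio
    return best_class if best_ratio >= threshold else None


def profile(table):
    """Attach a data-class tag (or None) to every column of a dict-of-lists table."""
    return {column: classify_column(values) for column, values in table.items()}
```

For example, profiling `{"contact": ["ann@example.com", "bo@example.org", ""], "notes": ["vip", "n/a"]}` tags `contact` as `"Email Address"` and leaves `notes` untagged. In a real catalog, tags like these become metadata that semantic search and data protection rules can use.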
Data self-service in catalogs
Data scientists and other business users can find the data that they need in data catalogs that contain data from across the enterprise. They can use AI-powered semantic search and recommendations that consider asset metadata, browse for data, or view their peers' highly rated assets. They copy data assets from a catalog into a project, where they collaborate to prepare, analyze, and model the data.
Components: Watson Knowledge Catalog service
Automated data integration
Data engineers and other users prepare your data for consumption. They can provide access to data in your existing data architecture and automate data preparation. They can integrate and virtualize data for faster, simpler querying. They can automate the bulk ingestion, cleansing, and complex transformations of data to regularly publish updated data assets. They can push down the processing of the data to the location of the data.
Components: Cloud Pak for Data as a Service platform, Data Refinery tool, Data Virtualization service, DataStage service, Satellite integration
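Pushing processing down to the data means the source runs the filter and aggregation itself, and only the results travel back. Without pushdown, the client pulls every row and processes it locally. The following is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for a remote source. The `orders` table and its rows are hypothetical. This is not how Data Virtualization or DataStage implement pushdown.

```python
import sqlite3


def make_source():
    """Create a small in-memory table that stands in for a remote data source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("east", 120.0), ("west", 80.0), ("east", 30.0), ("north", 50.0)],
    )
    return conn


def totals_without_pushdown(conn):
    """Fetch every row, then filter and aggregate on the client side."""
    rows = conn.execute("SELECT region, amount FROM orders").fetchall()
    totals = {}
    for region, amount in rows:
        if amount >= 50:
            totals[region] = totals.get(region, 0.0) + amount
    return totals, len(rows)  # result plus the number of rows transferred


def totals_with_pushdown(conn):
    """Send the filter and aggregation to the source; only results travel back."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM orders WHERE amount >= 50 GROUP BY region"
    ).fetchall()
    return dict(rows), len(rows)  # result plus the number of rows transferred
```

Both functions return the same totals. The pushdown version transfers three result rows instead of four raw rows, and the gap widens as the source table grows.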
Unified data governance, security, and compliance
Data stewards can create data protection rules that automatically enforce uniform data privacy across the platform. Data masking de-identifies sensitive data to provide data security while it preserves data utility and avoids the need for multiple copies of the data. Data stewards can also import ready-to-use compliance metadata from Knowledge Accelerators.
Components: Watson Knowledge Catalog service, Knowledge Accelerators
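One way masking can preserve data utility is to make it deterministic and format-preserving. The same input always produces the same masked output, so masked columns can still be joined and counted. Separators and trailing digits also stay in place. The following is a minimal sketch that uses a keyed HMAC-SHA256 digest from the standard library. The key, the `keep_last` parameter, and the digit-substitution scheme are hypothetical. This is not the masking method that Watson Knowledge Catalog data protection rules use.

```python
import hashlib
import hmac

# Hypothetical masking key for illustration; a real key would come from a secrets store.
MASKING_KEY = b"example-key-do-not-use"


def mask_digits(value: str, key: bytes = MASKING_KEY, keep_last: int = 4) -> str:
    """Deterministically replace digits, keeping separators and the last `keep_last` digits.

    Each masked digit is derived from an HMAC of the whole value, so the same
    input always yields the same output, but the original digits cannot be
    recovered without the key.
    """
    digits = [c for c in value if c.isdigit()]
    n_mask = max(len(digits) - keep_last, 0)
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    out, i = [], 0
    for c in value:
        if c.isdigit():
            # Replace leading digits with digest-derived digits; keep the tail.
            out.append(str(digest[i % len(digest)] % 10) if i < n_mask else c)
            i += 1
        else:
            out.append(c)  # keep separators such as "-" so the format survives
    return "".join(out)
```

Applied at read time, a function like this lets each consumer see a masked view of the same stored data, so no separate masked copy is needed.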
Unified lifecycle
Users can design, build, test, orchestrate, deploy to production, and monitor different types of data pipelines in a unified way. Users can create or find data assets, search for them across the platform, and move them across workspaces. Users can orchestrate data transformations and other actions by scheduling jobs that run automatically.
Components: Cloud Pak for Data as a Service platform
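Orchestrating a pipeline means running each job only after the jobs it depends on have finished. The following is a minimal sketch of that ordering step that uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). The job names and dependencies are hypothetical. The sketch covers ordering only, not scheduling, and it is not the platform's orchestration API.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each job maps to the set of jobs it depends on.
PIPELINE = {
    "ingest_orders": set(),
    "ingest_customers": set(),
    "cleanse_orders": {"ingest_orders"},
    "join_orders_customers": {"cleanse_orders", "ingest_customers"},
    "publish_asset": {"join_orders_customers"},
}


def run_pipeline(pipeline, run_job):
    """Run every job after all of its dependencies; return the execution order."""
    order = []
    for job in TopologicalSorter(pipeline).static_order():
        run_job(job)  # placeholder for the real work, such as a refinery or DataStage job
        order.append(job)
    return order
```

A scheduler would then call something like `run_pipeline(PIPELINE, run_job)` on a recurring trigger to publish updated data assets. If the dependency graph contains a cycle, `TopologicalSorter` raises `graphlib.CycleError`.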
To further explore the benefits of the data fabric, read the white paper "Data fabric architecture delivers three instant benefits."
For more information on the concept of assets in Cloud Pak for Data as a Service, see Asset types and properties.
The data science and AI tools on Cloud Pak for Data as a Service enable everyone in your organization to participate in finding and sharing insights. The AI tools cover the complete AI lifecycle of preparing and training models, deploying models in your applications, and then evaluating models for bias, performance, and quality.
Comprehensive tool set
Data scientists, business analysts, and machine learning engineers can collaborate while choosing the tools that fit their individual preferences and skill levels. Users can write Python or R code, visually code by creating a flow of steps on a graphical canvas, or automatically build a ranked list of model candidates.
Components: Watson Studio, Cognos Dashboard Embedded
Easy deployment
Data scientists or machine learning engineers promote trained models to deployment spaces, deploy and score the models, review prediction scores and insights, and monitor deployment jobs in a dashboard.
Components: Watson Machine Learning
Trusted outcomes
Machine learning engineers evaluate deployments for bias or drift and update data and retrain deployed models to maintain quality goals. Models can be easily explained and understood by business users, and are auditable in business transactions.
Components: Watson OpenScale
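A common way to quantify bias in model outcomes is the disparate impact ratio. It is the rate of favorable outcomes for a monitored group divided by the rate for a reference group. Values below 0.8 are conventionally flagged, following the "four-fifths rule." The following is a minimal sketch of that calculation on hypothetical predictions. The group labels, outcome values, and the 0.8 threshold are illustrative assumptions, not Watson OpenScale settings or its API.

```python
def favorable_rate(outcomes, groups, group, favorable):
    """Fraction of records in `group` whose outcome equals `favorable`."""
    selected = [o for o, g in zip(outcomes, groups) if g == group]
    if not selected:
        raise ValueError(f"no records for group {group!r}")
    return sum(1 for o in selected if o == favorable) / len(selected)


def disparate_impact(outcomes, groups, monitored, reference, favorable):
    """Favorable-outcome rate of the monitored group divided by that of the reference group."""
    reference_rate = favorable_rate(outcomes, groups, reference, favorable)
    if reference_rate == 0:
        raise ValueError("reference group has no favorable outcomes")
    return favorable_rate(outcomes, groups, monitored, favorable) / reference_rate


def is_biased(ratio, threshold=0.8):
    """Flag a ratio below the assumed four-fifths threshold."""
    return ratio < threshold
```

For example, if group A is approved in 3 of 4 cases and group B in 1 of 4, the ratio for B relative to A is 0.25 / 0.75 ≈ 0.33, which is flagged. Updating the training data and retraining would be one way to bring the ratio back above the threshold.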
Cloud Pak for Data as a Service is composed of a set of core services, related services, and a sample gallery.
With Cloud Pak for Data as a Service, you can provision these types of services from the Cloud Pak for Data as a Service services catalog:
The sample gallery provides data assets, notebooks, and projects. Sample data assets and notebooks provide examples of data science and machine learning code. Sample projects, including industry accelerators, contain a set of assets and detailed instructions on how to solve a particular business problem.
This illustration shows the functionality included in the common platform and the core services.
The following functionality is provided by the platform:
Watson Studio provides the following types of functionality in projects:
Watson Machine Learning provides the following functionality:
Watson OpenScale provides the following functionality in a separate user interface:
Watson Knowledge Catalog provides the following functionality:
These related services add tools to projects, and add compute resources and other workspaces to Cloud Pak for Data as a Service.
Cognos Dashboard Embedded provides the following functionality:
DataStage provides the following functionality:
Data Virtualization provides the following functionality:
IBM Match 360 with Watson provides the following functionality: