Watson Studio on Cloud Pak for Data

Description

The architecture of Watson Studio is centered around the project. Data scientists and business analysts use projects to organize resources and analyze data.

You can have these types of resources in a project:

Collaborators are the people on the team who work with the data.
Data assets point to your data that is either in uploaded files or accessed through connections to data sources.
Operational assets are the objects you create, such as scripts and models, to run code on data.
Other types of assets provide components, templates, or other information.
Tools are the software you use to derive insights from data. These tools are included with the Watson Studio service:
- Data Refinery: Prepare and visualize data.
- Jupyter notebook editor: Code Jupyter notebooks.
- JupyterLab IDE: Code Jupyter notebooks and Python scripts with Git integration. Other project tools require additional services. See the lists of supplemental and related services.
- Federated learning: Train models on remote parties without sharing data.
- Pipelines: Automate end-to-end flows of data or models.
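
As a purely conceptual illustration, the resource types above can be pictured as a small data model. The sketch below is not the Watson Studio API. The class names (`Project`, `Asset`, `AssetKind`), the method `assets_of`, and the sample values such as `churn-analysis` and `customers.csv` are all hypothetical and serve only to show how a project groups collaborators, assets, and tools.

```python
from dataclasses import dataclass, field
from enum import Enum


class AssetKind(Enum):
    """Hypothetical labels for the asset categories described above."""
    DATA = "data"                # uploaded files or connections to data sources
    OPERATIONAL = "operational"  # scripts, models, and similar objects that run code on data
    OTHER = "other"              # components, templates, or other information


@dataclass
class Asset:
    name: str
    kind: AssetKind


@dataclass
class Project:
    """Conceptual stand-in for a project; not an IBM class."""
    name: str
    collaborators: list[str] = field(default_factory=list)
    assets: list[Asset] = field(default_factory=list)
    tools: list[str] = field(default_factory=list)

    def assets_of(self, kind: AssetKind) -> list[Asset]:
        # Return only the assets that belong to the requested category.
        return [asset for asset in self.assets if asset.kind is kind]


# Sample values are invented for illustration.
project = Project(
    name="churn-analysis",
    collaborators=["data scientist", "business analyst"],
    assets=[
        Asset("customers.csv", AssetKind.DATA),
        Asset("churn-notebook", AssetKind.OPERATIONAL),
    ],
    tools=["Data Refinery", "Jupyter notebook editor", "JupyterLab IDE"],
)

data_asset_names = [asset.name for asset in project.assets_of(AssetKind.DATA)]
```

In this sketch, everything hangs off the project object, which mirrors the statement that the architecture is centered around the project.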

Watson Studio projects fully integrate with catalogs and deployment spaces:

Catalogs are provided by the Watson Knowledge Catalog service:
- You can easily move assets between projects and catalogs.
- Catalogs and projects support the same types of data assets.
- Data protection rules are enforced on catalog assets that you add to projects.
Without the Watson Knowledge Catalog service, you can still create one catalog to share assets between projects, but it has no governance capabilities.
Deployment spaces let you view and manage model deployments and other types of deployments:
- You can easily move assets between projects and deployment spaces.

Quick links

Install: Install the service
Upgrade: Upgrade the service
Administer: Manage and maintain the service
Use: Work with the service
Develop: Write code and build applications
What's new: See a list of new features
Known issues: View limitations

Integrated services

Table 1. Supplemental services. You can extend the functionality of this service with the following supplemental services, which require this service.
Service	Capability
Analytics Engine powered by Apache Spark	Run analytical, machine learning, and Spark API jobs on Apache Spark clusters.
SPSS® Modeler	Create flows to prepare data, develop and manage models, and visualize data. No coding required.
Watson Machine Learning	Build, train, and deploy machine learning models with a full range of tools.
Decision Optimization	Find the most appropriate prescriptive solutions to your business problems by using CPLEX optimization engines to evaluate millions of possibilities.
Runtime 22.2 on Python 3.10 for GPU	Access compute environments for Jupyter notebooks that use GPU-accelerated Python 3.10 libraries.
Runtime 22.2 on R 4.2	Access compute environments to create Jupyter notebooks that use R 4.2 libraries.
RStudio® Server Runtimes	Access the RStudio IDE.
Execution Engine for Apache Hadoop	Integrate the Watson Studio service with your remote Apache Hadoop cluster so you can explore data and build and deploy models on your remote cluster.
Watson Pipelines	Create end-to-end machine learning pipelines that build models and customize various functions.

Table 2. Related services. The following related services are often used with this service and provide complementary features, but they are not required.
Service	Capability
Watson Knowledge Catalog	Create catalogs of curated assets with this secure enterprise catalog management platform that is supported by a data governance framework.
Analytics Engine powered by Apache Spark	Run analytical, machine learning, and Spark API jobs on Apache Spark clusters.
Watson Query	Integrate data sources across multiple types and locations into one logical data view.
AI Factsheets	Organize and track lineage events, facts, and details throughout the lifecycle of each of your machine learning models, and increase transparency for model governance.
Data Replication	Integrate and synchronize your data using near-real-time data delivery with low impact to sources.
DataStage®	Use built-in search, automatic metadata propagation, and simultaneous highlighting of compilation errors to create, edit, load, and run jobs that transform and tailor information for your enterprise.
Watson OpenScale	Infuse your AI with trust and transparency. Understand how your AI models make decisions to detect and mitigate bias.