Choosing a tool in projects (Watson Studio)

With Watson Studio and its complementary services, projects provide a range of tools for users with all levels of experience in preparing, analyzing, and modeling data, from beginner to expert. The right tool for you depends on the type of data you have, the tasks you plan to do, and the amount of automation you want.

To pick the right tool, consider these factors.

The type of data you have

  • Tabular data in delimited files or relational data in remote data sources
  • Image files
  • Textual (unstructured) data in documents

The type of tasks you need to do

  • Prepare data: cleanse, shape, visualize, organize, and validate data.
  • Analyze data: identify patterns and relationships in data, and display insights.
  • Build models: build, train, test, and deploy models to make predictions or optimize decisions.

How much automation you want

  • Code editor tools: Use to write code in Python or R, either of which can also be used with Spark.
  • Graphical builder tools: Use menus and drag-and-drop functionality on a builder to visually program.
  • Automated builder tools: Use to configure automated tasks that require limited user input.

Find the right tool:

Tools for tabular or relational data

Tools for tabular or relational data by task:

Tool Tool type Prepare data Analyze data Build models
Jupyter notebook editor Code editor
JupyterLab Code editor
RStudio Code editor
Masking flows Automated builder
Data Refinery Graphical builder
Data Virtualization Graphical builder
DataStage Graphical builder
Data Replication Graphical builder
Dashboard editor Graphical builder
SPSS Modeler Graphical builder
Decision Optimization model builder Graphical builder and code editor
AutoAI Automated builder
Metadata import Automated builder
Metadata enrichment Automated builder
Data quality rule Automated builder and code editor
IBM Match 360 Automated builder
Orchestration Pipelines Graphical builder
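
The Tool type column of this table can also be read as a lookup. The following is a minimal Python sketch, not an IBM API: the dictionary copies the tool names and tool types from the table, and the helper name `tools_of_type` is hypothetical.

```python
# Tool names and tool types copied from the table above.
TOOL_TYPES = {
    "Jupyter notebook editor": {"Code editor"},
    "JupyterLab": {"Code editor"},
    "RStudio": {"Code editor"},
    "Masking flows": {"Automated builder"},
    "Data Refinery": {"Graphical builder"},
    "Data Virtualization": {"Graphical builder"},
    "DataStage": {"Graphical builder"},
    "Data Replication": {"Graphical builder"},
    "Dashboard editor": {"Graphical builder"},
    "SPSS Modeler": {"Graphical builder"},
    "Decision Optimization model builder": {"Graphical builder", "Code editor"},
    "AutoAI": {"Automated builder"},
    "Metadata import": {"Automated builder"},
    "Metadata enrichment": {"Automated builder"},
    "Data quality rule": {"Automated builder", "Code editor"},
    "IBM Match 360": {"Automated builder"},
    "Orchestration Pipelines": {"Graphical builder"},
}


def tools_of_type(tool_type):
    """Return the tools whose type includes tool_type, sorted by name."""
    return sorted(name for name, types in TOOL_TYPES.items() if tool_type in types)


print(tools_of_type("Code editor"))
```

Tools that combine two types, such as the Decision Optimization model builder, appear under both.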

Tools for textual data

Tools for building a model that works with textual data:

Tool Code editor Graphical builder Automated builder
Jupyter notebook editor
JupyterLab
RStudio
SPSS Modeler
Experiment builder
Orchestration Pipelines

Tools for image data

Tools for building a model that classifies images:

Tool Code editor Graphical builder Automated builder
Jupyter notebook editor
JupyterLab
RStudio
Experiment builder
Orchestration Pipelines

Accessing tools

To use a tool, you must create an asset specific to that tool, or open an existing asset for that tool. To create an asset, click New asset or Import assets and then choose the asset type you want. This table shows the asset type to choose for each tool.

Tools to asset type mapping
To use this tool | Choose this asset type
Jupyter notebook editor | Jupyter notebook
Masking flows | Masking flows
Data Refinery | Data Refinery flow
DataStage | DataStage flow
Data Replication | Data Replication
Dashboard editor | Dashboard
SPSS Modeler | Modeler flow
Decision Optimization model builder | Decision Optimization
AutoAI | AutoAI experiment
Experiment builder | Experiment
Metadata import | Metadata import
Metadata enrichment | Metadata enrichment
Data quality rules | Data quality rule
IBM Match 360 | Master data configuration

To edit notebooks with RStudio, click Launch IDE > RStudio.

To edit notebooks with JupyterLab, click Launch IDE > JupyterLab.

Jupyter notebook editor

Use the Jupyter notebook editor to create a notebook in which you run code to prepare, visualize, and analyze data, or build and train a model.

Required services
Watson Studio
Watson Studio runtimes
Data format
Any
Data size
Any
How you can prepare data, analyze data, or build models
Write code in Python or R, either of which can also be used with Spark.
Include rich text and media with your code.
Work with any kind of data in any way you want.
Use preinstalled or install other open source and IBM libraries and packages.
Schedule runs of your code.
Import a notebook from a file or a URL.
Share read-only copies of your notebook externally.
Get started
To create a notebook, click New asset > Work with data and models in Python or R notebooks.
Learn more
Documentation about notebooks
Videos about notebooks
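
As an illustration of the kind of code you might run in a notebook cell, here is a minimal Python sketch that cleanses a small tabular data set and computes a summary. The inline CSV text and the column names `region` and `sales` are made up for this example. Only the Python standard library is used, so the sketch does not depend on which libraries are preinstalled in your runtime.

```python
import csv
import io
import statistics

# Hypothetical sample data; in a project you would read a data asset instead.
RAW = """region,sales
north,120
south,
north,80
east,200
"""


def load_rows(text):
    """Parse CSV text and drop rows with a missing sales value."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        if row["sales"].strip():
            rows.append({"region": row["region"], "sales": float(row["sales"])})
    return rows


def mean_sales_by_region(rows):
    """Group rows by region and return the mean sales per region."""
    groups = {}
    for row in rows:
        groups.setdefault(row["region"], []).append(row["sales"])
    return {region: statistics.mean(values) for region, values in groups.items()}


rows = load_rows(RAW)
print(mean_sales_by_region(rows))
```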

Watch a video to learn Jupyter notebook basics

This video provides a visual method to learn the concepts and tasks in this documentation.


Data Refinery

Use Data Refinery to prepare and visualize tabular data with a graphical flow editor. You create and then run a Data Refinery flow as a set of ordered operations on data.

Required services
Watson Studio or IBM Knowledge Catalog
Data format
Tabular: Avro, CSV, JSON, Microsoft Excel (.xls and .xlsx formats; first sheet only, except for connections and connected data assets), Parquet, SAS with the "sas7bdat" extension (read only), TSV (read only), or delimited text data asset
Relational: Tables in relational data sources
Data size
Any
How you can prepare data
Cleanse, shape, and organize data with more than 60 operations.
Save refined data as a new data set or update the original data.
Profile data to validate it.
Use interactive templates to manipulate data with code operations, functions, and logical operators.
Schedule recurring operations on data.
How you can analyze data
Identify patterns, connections, and relationships within the data in multiple visualization charts.
Get started
To create a Data Refinery flow, click New asset > Prepare and visualize data.
Learn more
Documentation about Data Refinery
Videos about refining data
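
The idea of a flow as a set of ordered operations on data can be sketched in a few lines of Python. This is a conceptual illustration only and does not use or represent the Data Refinery implementation; the operation names, the `run_flow` helper, and the sample rows are all hypothetical.

```python
# Each operation takes a list of row dictionaries and returns a new list.
def remove_empty_names(rows):
    return [r for r in rows if r["name"].strip()]


def uppercase_country(rows):
    return [{**r, "country": r["country"].upper()} for r in rows]


def sort_by_name(rows):
    return sorted(rows, key=lambda r: r["name"])


def run_flow(rows, operations):
    """Apply the operations in order, passing each result to the next step."""
    for operation in operations:
        rows = operation(rows)
    return rows


data = [
    {"name": "Mia", "country": "de"},
    {"name": "", "country": "us"},
    {"name": "Ana", "country": "br"},
]
flow = [remove_empty_names, uppercase_country, sort_by_name]
refined = run_flow(data, flow)
print(refined)
```

Because each operation returns a new list, the original data is left unchanged, which mirrors the choice between saving refined data as a new data set and updating the original.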

Watch a video to see how to refine data



Data Replication

Use Data Replication to integrate and synchronize data. Data Replication provides near-real-time data delivery with low impact to sources.

Required service
Data Replication
Related service
IBM Knowledge Catalog
Data formats
Data Replication works with connections to and from select types of data sources and formats. For more information, see Supported Data Replication connections.
Credentials
Data Replication uses your IBM Cloud credentials to connect to the service.
Get started
To start data replication in a project, click New asset > Replicate data.
Learn more
Documentation about Data Replication

Data Virtualization

Use Data Virtualization to connect multiple data sources into a single self-balancing collection of data sources or databases.

Data format
Relational: Tables in relational data sources
Data size
Any
How you can prepare data
Connect to multiple data sources.
Create virtual tables.
Get started
To create virtual tables, click Data > Data virtualization. From the service menu, click Virtualization > Virtualize > Tables.
Learn more
Documentation about Data Virtualization
Videos about Data Virtualization

Watch a video to see how to virtualize data



DataStage

Use DataStage to prepare and visualize tabular data with a graphical flow editor. You create and then run a DataStage flow as a set of ordered operations on data.

Required service
DataStage
Data format
Tabular: Avro, CSV, JSON, Parquet, TSV (read only), or delimited text files
Relational: Tables in relational data sources
Data size
Any
How you can prepare data
Design a graphical data integration flow that generates Orchestrate code to run on the high-performing DataStage parallel engine.
Perform operations such as Join, Funnel, Checksum, Merge, Modify, Remove Duplicates, and Sort.
Get started
To create a DataStage flow, click New asset > Transform and integrate data. The DataStage tile is in the Graphical builders section.
Learn more
Documentation about DataStage
Videos about DataStage

Video disclaimer: Some minor steps and graphical elements in this video may differ from your Cloud Pak for Data deployment. This video shows the IBM watsonx user interface.


Watch a video to see how to transform data



Dashboard editor

Use the Dashboard editor to create a set of visualizations of analytical results on a graphical builder.

Required service
Cognos Dashboard
Data format
Tabular: CSV files
Relational: Tables in some relational data sources
Data size
Any size
How you can analyze data
Create graphs without coding.
Include text, media, web pages, images, and shapes in your dashboard.
Get started
To create a dashboard, click New asset > Visualize data in dashboards. The Dashboard editor tile is in the Graphical builders section.
Learn more
Documentation about dashboards
Videos

Watch a video to see how to build a dashboard



SPSS Modeler

Use SPSS Modeler to create a flow to prepare data and build and train a model with a flow editor on a graphical builder.

Required services
SPSS Modeler
Watson Studio
Data formats
Relational: Tables in relational data sources
Tabular: Excel files (.xls or .xlsx), CSV files, or SPSS Statistics files (.sav)
Textual: In the supported relational tables or files
Data size
Any
How you can prepare data
Use automatic data preparation functions.
Write SQL statements to manipulate data.
Cleanse, shape, sample, sort, and derive data.
How you can analyze data
Visualize data with over 40 graphs.
Identify the natural language of a text field.
How you can build models
Build predictive models.
Choose from over 40 modeling algorithms.
Use automatic modeling functions.
Model time series or geospatial data.
Classify textual data.
Identify relationships between the concepts in textual data.
Get started
To create an SPSS Modeler flow, click New asset > Build models as a visual flow.
Learn more
Documentation about SPSS Modeler
Videos about SPSS Modeler

Video disclaimer: Some minor steps and graphical elements in this video may differ from your Cloud Pak for Data deployment. This video shows the IBM watsonx user interface.


Watch a video to see how to build a model with SPSS Modeler



Decision Optimization model builder

Use Decision Optimization to build and run optimization models in the Decision Optimization modeler or in a Jupyter notebook.

Required services
Decision Optimization
Watson Studio
Data formats
Tabular: CSV files
Data size
Any
How you can prepare data
Import relevant data into a scenario and edit it.
How you can build models
Build prescriptive decision optimization models.
Create, import, and edit models in Python DOcplex, OPL, or with natural language expressions.
Create, import, and edit models in notebooks.
How you can solve models
Run and solve decision optimization models using CPLEX engines.
Investigate and compare solutions for multiple scenarios.
Create tables, charts, and notes to visualize data and solutions for one or more scenarios.
Get started
To create a Decision Optimization model, click New asset > Solve optimization problems, or for notebooks click New asset > Work with data and models in Python or R notebooks.
Learn more
Documentation about Decision Optimization
Videos about Decision Optimization
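
To make the idea of a prescriptive optimization model concrete, the following Python sketch states a tiny problem in terms of decision variables, a constraint, and an objective, and solves it by brute-force enumeration. It deliberately does not use DOcplex, OPL, or the CPLEX engines, which are what you would use for real models; the production-planning data and all names are invented for illustration.

```python
from itertools import product

# Hypothetical production-planning problem: choose how many units of
# products A and B to make, maximizing profit within available hours.
PROFIT = {"A": 30, "B": 20}
HOURS = {"A": 2, "B": 1}
HOURS_AVAILABLE = 10
MAX_UNITS = 6  # upper bound per product, keeps the search space small


def solve():
    """Enumerate all integer plans and return the feasible one with most profit."""
    best_plan, best_profit = None, -1
    for a, b in product(range(MAX_UNITS + 1), repeat=2):
        if HOURS["A"] * a + HOURS["B"] * b > HOURS_AVAILABLE:
            continue  # constraint violated
        profit = PROFIT["A"] * a + PROFIT["B"] * b
        if profit > best_profit:
            best_plan, best_profit = {"A": a, "B": b}, profit
    return best_plan, best_profit


plan, profit = solve()
print(plan, profit)
```

Enumeration only works for very small problems. A solver engine explores the same kind of model without listing every combination.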

Video disclaimer: Some minor steps and graphical elements in this video may differ from your Cloud Pak for Data deployment. This video shows the IBM watsonx user interface.


Watch a video to see how to build a Decision Optimization experiment



AutoAI tool

Use the AutoAI tool to automatically analyze your tabular data and generate candidate model pipelines customized for your predictive modeling problem.

Required services
Watson Machine Learning
Watson Studio
Data format
Tabular: CSV files
Data size
Depends on model type. See AutoAI Overview for details.
How you can prepare data
Automatically transform data, such as imputing missing values and transforming text to scalar values.
How you can build models
Train a binary classification, multiclass classification, or regression model.
View a tree infographic that shows the sequences of AutoAI training stages.
Generate a leaderboard of model pipelines ranked by cross-validation scores.
Save a pipeline as a model.
Get started
To create an AutoAI experiment, click New asset > Build machine learning models automatically.
Learn more
Documentation about AutoAI
Videos about AutoAI
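
The two kinds of data transformation mentioned above, imputing missing values and turning text into scalar values, can be illustrated with a short Python sketch. This is not AutoAI code, and the specific techniques shown (mean imputation and integer label encoding) are assumptions chosen as common examples. The function names and sample values are hypothetical.

```python
import statistics


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    return [mean if v is None else v for v in values]


def encode_labels(labels):
    """Map each distinct text label to an integer, in order of first appearance."""
    codes = {}
    for label in labels:
        codes.setdefault(label, len(codes))
    return [codes[label] for label in labels], codes


ages = impute_mean([30, None, 50])
colors, mapping = encode_labels(["red", "blue", "red"])
print(ages, colors, mapping)
```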

Watch a video to see how to build an AutoAI experiment



Deep Learning Experiment builder

Use the Deep Learning Experiment builder to build deep learning experiments and run hundreds of training runs. This method requires that you provide code to define the training run. You run, track, store, and compare the results in the Experiment builder graphical interface, then save the best configuration as a model.

Required services
Watson Studio
Watson Machine Learning
Data format
Textual: CSV files with labeled textual data
Image: Image files in a PKL file. For example, a model that tests signatures uses images that are resized to 32×32 pixels and stored as NumPy arrays in a pickled format.
Data size
Any size
How you can build models
Write Python code to specify metrics for training runs.
Write a training definition in Python code.
Define hyperparameters, or choose the RBFOpt method or random hyperparameter settings.
Find the optimal values for large numbers of hyperparameters by running hundreds or thousands of training runs.
Run distributed training with GPUs and specialized, powerful hardware and infrastructure.
Compare the performance of training runs.
Save a training run as a model.
Get started
To create an experiment, click New asset > Build deep learning experiments.
Learn more
Documentation about Experiment builder
Videos about creating deep learning experiments
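
The option of random hyperparameter settings across many training runs can be sketched as a simple random search. The following Python example is only an illustration of the technique, not Experiment builder code: `toy_score` is a made-up stand-in for a real training run, and the hyperparameter names and ranges are assumptions.

```python
import random


def toy_score(learning_rate, batch_size):
    """Stand-in for a training run: higher is better, best near lr=0.01, batch=64."""
    return -abs(learning_rate - 0.01) * 100 - abs(batch_size - 64) / 64


def random_search(n_runs, seed=0):
    """Sample random hyperparameters n_runs times and return the best run."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_runs):
        params = {
            "learning_rate": 10 ** rng.uniform(-4, -1),  # log-uniform sample
            "batch_size": rng.choice([16, 32, 64, 128]),
        }
        score = toy_score(**params)
        if best is None or score > best["score"]:
            best = {"params": params, "score": score}
    return best


best = random_search(n_runs=200)
print(best)
```

With a fixed seed, the first runs of a longer search are identical to a shorter search, so adding runs can only keep or improve the best score.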

Watch a video to see how to build a deep learning experiment



Metadata import

Use the metadata import tool to automatically discover and import technical and process metadata for data assets into a project or a catalog.

Required service
IBM Knowledge Catalog
Data format
Any
Data size
Any size
How you can prepare data
Import data assets from a connection to a data source.
Get started
To import metadata, click New asset > Import metadata for data assets.
Learn more
Documentation about metadata import
Videos about IBM Knowledge Catalog

Watch a video to see how to import asset metadata



Metadata enrichment

Use the metadata enrichment tool to automatically profile data assets and analyze data quality in a project.

Required service
IBM Knowledge Catalog
Data format
Relational and structured: Tables and files in relational and nonrelational data sources
Tabular: Avro, CSV, or Parquet files
Data size
Any size
How you can prepare and analyze data
Profile and analyze a select set of data assets in a project.
Get started
To enrich data, click New asset > Enrich data assets with metadata.
Learn more
Documentation about metadata enrichment
Videos about IBM Knowledge Catalog
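
To give a feel for what profiling a data asset involves, here is a minimal Python sketch that computes a few statistics for one column. It is a simplified illustration under stated assumptions, not the metadata enrichment implementation: the function name `profile_column`, the chosen statistics, and the sample values are hypothetical.

```python
def profile_column(values):
    """Return simple profile statistics for one column of string values."""
    non_empty = [v for v in values if v not in ("", None)]

    def is_number(text):
        try:
            float(text)
            return True
        except ValueError:
            return False

    all_numeric = bool(non_empty) and all(is_number(v) for v in non_empty)
    return {
        "count": len(values),
        "missing": len(values) - len(non_empty),
        "distinct": len(set(non_empty)),
        "inferred_type": "numeric" if all_numeric else "text",
    }


print(profile_column(["10", "20", "", "20"]))
```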

Watch a video to see how to enrich data assets



Data quality rule

Use the data quality tool to create rules that analyze data quality in a project.

Required service
IBM Knowledge Catalog
Data format
Relational and structured: Tables and files in relational and nonrelational data sources
Tabular: Avro, CSV, or Parquet files
Data size
Any size
How you can prepare and analyze data
Analyze the quality of a select set of data assets in a project.
Get started
To create a data quality rule, click New asset > Measure and monitor data quality.
Learn more
Documentation about data quality rules
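
Conceptually, a data quality rule is a condition that each record either satisfies or violates. The following Python sketch shows that idea only. It does not reflect the rule syntax or API of IBM Knowledge Catalog, and the `evaluate_rule` helper, the `age_in_range` rule, and the sample records are hypothetical.

```python
def evaluate_rule(rows, rule):
    """Return the share of rows that satisfy the rule, plus the failing rows."""
    failed = [row for row in rows if not rule(row)]
    passed_share = (len(rows) - len(failed)) / len(rows) if rows else 1.0
    return passed_share, failed


# Hypothetical rule: the age column must be between 0 and 120.
def age_in_range(row):
    return 0 <= row["age"] <= 120


records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": -5},
    {"id": 3, "age": 61},
    {"id": 4, "age": 130},
]
share, failures = evaluate_rule(records, age_in_range)
print(share, [row["id"] for row in failures])
```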

IBM Match 360

Use IBM Match 360 to create master data entities that represent digital twins of your customers. Model and map your data, then run the matching algorithm to create master data entities. Customize and tune your matching algorithm to meet your organization's requirements.

Required services
IBM Match 360
IBM Knowledge Catalog
Data size
Any
How you can prepare data
Model and map data from sources across your organization.
Run the customizable matching algorithm to create master data entities.
View and edit master data entities and their associated records.
Get started
To create an IBM Match 360 configuration asset, click New asset > Consolidate data into 360-degree views.
Learn more
Documentation about IBM Match 360
Videos about IBM Match 360
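
The general idea of matching records from different sources into one entity can be sketched as follows. This Python example is a deliberately simplified stand-in, not the IBM Match 360 matching algorithm, which is configurable and not described here: it groups records only when their normalized name and email are identical. The field names and sample records are hypothetical.

```python
from collections import defaultdict


def match_key(record):
    """Build a comparison key from normalized name and email."""
    name = " ".join(record["name"].lower().split())
    email = record["email"].strip().lower()
    return (name, email)


def build_entities(records):
    """Group records that share a key; each group stands for one entity."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record["source_id"])
    return list(groups.values())


records = [
    {"source_id": "crm-1", "name": "Ana  Silva", "email": "ANA@example.com"},
    {"source_id": "web-7", "name": "ana silva", "email": "ana@example.com "},
    {"source_id": "crm-2", "name": "Mia Koch", "email": "mia@example.com"},
]
print(build_entities(records))
```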

Watch a video to see how to use IBM Match 360



RStudio IDE

Use RStudio IDE to analyze data or create Shiny applications by writing R code. RStudio can be integrated with a Git repository, which must be associated with the project.

Required services
RStudio runtimes
Watson Studio
Data format
Any
Data size
Any size
How you can prepare data, analyze data, and build models
Write code in R.
Create Shiny apps.
Use open source libraries and packages.
Include rich text and media with your code.
Prepare data.
Visualize data.
Discover insights from data.
Build and train a model using open source libraries.
Share your Shiny app in a Git repository.
Get started
To use RStudio, click Launch IDE > RStudio.
Learn more
Documentation about RStudio
Videos about RStudio

Video disclaimer: Some minor steps and graphical elements in this video may differ from your Cloud Pak for Data deployment. This video shows the IBM watsonx user interface.


Watch a video to see an overview of the RStudio IDE



JupyterLab

Use the JupyterLab IDE to create a notebook or Python script in which you run code to prepare, visualize, and analyze data, or build and train a model. JupyterLab is integrated with a Git repository, which must be associated with the project.

Required services
Watson Studio
Watson Studio runtimes
Data format
Any
Data size
Any
How you can prepare data, analyze data, or build models
Write code in Python.
Include rich text and media with your code.
Work with any kind of data in any way you want.
Use preinstalled or install other open source and IBM libraries and packages.
Import a notebook from a file.
Share your notebook or script in a Git repository.
Get started
To use JupyterLab, click Launch IDE > JupyterLab.
Learn more
Documentation about JupyterLab
Videos about notebooks

Watch a video to see how to work with notebooks in JupyterLab



Masking flows

Use the Masking flow tool to prepare masked copies or masked subsets of data from the catalog. Data is de-identified using advanced masking options with data protection rules.

Required service
IBM Knowledge Catalog
Data format
Relational: Tables in relational data sources
Data size
Any size
How you can prepare data, analyze data, or build models
Import data assets from a governed catalog into the project.
Create masking flow job definitions to specify what data to mask with data protection rules.
Optionally subset data to reduce the size of the copied data.
Run masking flow jobs to load masked copies to target database connections.
Get started
Ensure that the prerequisite steps in IBM Knowledge Catalog are completed. To privatize data, do one of the following tasks:
  • Click New asset > Copy and mask data.
  • Click the menu options for individual data assets to mask that asset directly.
Learn more
Documentation about masking data
Videos about IBM Knowledge Catalog
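
As a rough illustration of what de-identifying a column means, the following Python sketch replaces values with repeatable tokens. It is not how masking flows work internally: in the product, the masking method is determined by data protection rules, whereas this example swaps in a salted SHA-256 hash as one simple technique. The function names, the salt, and the sample row are hypothetical.

```python
import hashlib


def mask_value(value, salt):
    """Replace a value with a short, repeatable token derived from a salted hash."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def mask_rows(rows, columns, salt):
    """Return copies of the rows with the given columns masked."""
    return [
        {key: mask_value(val, salt) if key in columns else val for key, val in row.items()}
        for row in rows
    ]


customers = [{"email": "ana@example.com", "city": "Lisbon"}]
masked = mask_rows(customers, {"email"}, salt="demo-salt")
print(masked)
```

Because the same input and salt always give the same token, masked values can still be joined across tables, while the original rows are left unchanged.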

Orchestration Pipelines

Use the Pipelines canvas editor to create a flow to prepare, visualize, and analyze data, or build and train a model.

Required service
IBM Knowledge Catalog or Watson Studio
Data format
Any
Data size
Any
How you can prepare data, analyze data, or build models
Use a variety of nodes that each contain their own logs.
Incorporate notebooks into the flow to run any Python or R code.
Work with any kind of data in any way you want.
Schedule runs of your flow.
Import data from your mounted PVC or project, or ingest data from GitHub.
Create a custom component with Python code.
Add conditions to your pipelines to monitor data quality however you want.
Use webhooks to send emails or messages that keep you up to date on the status of your flow.
Get started
To create a new pipeline, click New asset > Automate model lifecycles.
Learn more
Documentation about Orchestration Pipelines
Videos about Orchestration Pipelines

Watch a video to see how to create a pipeline



Data visualizations

Use data visualizations to discover insights from your data. By exploring data from different perspectives with visualizations, you can identify patterns, connections, and relationships within that data and quickly understand large amounts of information.

Required service
IBM Knowledge Catalog or Watson Studio
Data format
Tabular: Avro, CSV, JSON, Parquet, TSV, SAV, Microsoft Excel .xls and .xlsx files, SAS, delimited text files, and connected data. For more information about supported data sources, see Connectors.
Data size
No limit
Get started
To create a visualization, click Data asset in the list of asset types in your project, and select a data asset. Click the Visualization tab, and choose a chart type.
Learn more
Visualizing your data
