Choosing a tool in analytics projects (Watson Studio)
With Watson Studio and its complementary services, analytics projects provide a range of tools for users with all levels of experience in preparing, analyzing, and modeling data, from beginner to expert. The right tool for you depends on the type of data you have, the tasks you plan to do, and the amount of automation you want.
To pick the right tool, consider these factors.
The type of data you have
- Tabular data in delimited files or relational data in remote data sources
- Image files
- Textual data in documents
The type of tasks you need to do
- Prepare data: cleanse, shape, visualize, organize, and validate data.
- Analyze data: identify patterns and relationships in data, and display insights.
- Build models: build, train, test, and deploy models to make predictions or optimize decisions.
How much automation you want
- Code editor tools: Use to write code in Python, R, or Scala.
- Graphical canvas tools: Use menus and drag-and-drop functionality on a canvas to visually program.
- Automated builder tools: Use to configure automated tasks that require limited user input.
Find the right tool:
Tools for tabular or relational data
Tools for tabular or relational data by task:
| Tool | Tool type | Prepare data | Analyze data | Build models |
|---|---|---|---|---|
| Jupyter notebook editor | Code editor | ✓ | ✓ | ✓ |
| JupyterLab | Code editor | ✓ | ✓ | ✓ |
| RStudio | Code editor | ✓ | ✓ | ✓ |
| Masking flows | Automated builder | ✓ | | |
| Data Refinery | Graphical builder | ✓ | ✓ | |
| Dashboard editor | Graphical builder | | ✓ | |
| SPSS Modeler | Graphical builder | ✓ | ✓ | ✓ |
| Decision Optimization model builder | Graphical builder and code editor | ✓ | | ✓ |
| AutoAI | Automated builder | ✓ | | ✓ |
| Federated Learning | Automated builder | | | ✓ |
| Metadata import | Automated builder | ✓ | | |
| IBM Match 360 with Watson | Automated builder | ✓ | | |
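The task coverage of each tool can also be expressed as a small lookup. The following Python sketch is illustrative only: the `TOOLS` dictionary and the `tools_for` helper are hypothetical names, not part of any Watson Studio API, and the task sets follow the per-tool descriptions later in this topic.

```python
# Hypothetical lookup table; names are illustrative, not a Watson Studio API.
# Each entry maps a tool to (tool type, set of supported tasks).
ALL_TASKS = {"prepare", "analyze", "build"}

TOOLS = {
    "Jupyter notebook editor": ("Code editor", ALL_TASKS),
    "JupyterLab": ("Code editor", ALL_TASKS),
    "RStudio": ("Code editor", ALL_TASKS),
    "Masking flows": ("Automated builder", {"prepare"}),
    "Data Refinery": ("Graphical builder", {"prepare", "analyze"}),
    "Dashboard editor": ("Graphical builder", {"analyze"}),
    "SPSS Modeler": ("Graphical builder", ALL_TASKS),
    "Decision Optimization model builder": (
        "Graphical builder and code editor", {"prepare", "build"}),
    "AutoAI": ("Automated builder", {"prepare", "build"}),
    "Federated Learning": ("Automated builder", {"build"}),
    "Metadata import": ("Automated builder", {"prepare"}),
    "IBM Match 360 with Watson": ("Automated builder", {"prepare"}),
}


def tools_for(task, tool_type=None):
    """Return the sorted names of tools that support a task, optionally by type."""
    return sorted(
        name
        for name, (kind, tasks) in TOOLS.items()
        if task in tasks and (tool_type is None or kind == tool_type)
    )


# Which graphical builders can be used to analyze tabular data?
graphical_analysis_tools = tools_for("analyze", "Graphical builder")
print(graphical_analysis_tools)
```

Filtering by both task and tool type mirrors the way the factors in this topic narrow the choice: first what you need to do, then how much automation you want.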
Tools for textual data
Tools for building a model that classifies textual data:
| Tool | Code editor | Graphical builder | Automated builder |
|---|---|---|---|
| Jupyter notebook editor | ✓ | | |
| JupyterLab | ✓ | | |
| RStudio | ✓ | | |
| SPSS Modeler | | ✓ | |
| Experiment builder | | ✓ | |
Tools for image data
Tools for building a model that classifies images:
| Tool | Code editor | Graphical builder | Automated builder |
|---|---|---|---|
| Jupyter notebook editor | ✓ | | |
| JupyterLab | ✓ | | |
| RStudio | ✓ | | |
| Experiment builder | | ✓ | |
Accessing tools
To use a tool, you must create an asset specific to that tool, or open an existing asset for that tool. To create an asset, click Add to project and then choose the asset type you want. This table shows the asset type to choose for each tool.
| To use this tool | Choose this asset type |
|---|---|
| Jupyter notebook editor | Jupyter notebook |
| Masking flows | Masking flows |
| Data Refinery | Data Refinery flow |
| Dashboard editor | Dashboard |
| SPSS Modeler | Modeler flow |
| Decision Optimization model builder | Decision Optimization |
| AutoAI | AutoAI experiment |
| Experiment builder | Experiment |
| Federated Learning | Federated Learning experiment |
| Metadata import | Metadata import |
| IBM Match 360 with Watson | Master data configuration |
To edit notebooks with RStudio, click Launch IDE > RStudio.
To edit notebooks with JupyterLab, click Launch IDE > JupyterLab.
Jupyter notebook editor
Use the Jupyter notebook editor to create a notebook in which you run code to prepare, visualize, and analyze data, or build and train a model.
Data format Any
Data size Any
How you can prepare data, analyze data, or build models
Write code in Python, R, or Scala.
Include rich text and media with your code.
Work with any kind of data in any way you want.
Use preinstalled libraries and packages, or install other open source and IBM libraries and packages.
Schedule runs of your code.
Import a notebook from a file or a URL.
Share read-only copies of your notebook externally.
Get started To create a notebook, click Add to project > Notebook.
Learn more Documentation about notebooks
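As a rough illustration of the kind of code a notebook cell might contain, the following Python sketch prepares and summarizes a small tabular data set by using only the standard library. The inline CSV data and the column names are made up for the example. In a real notebook you would typically load a project data asset and might use a library such as pandas instead.

```python
import csv
import io
import statistics

# Made-up sample data; a real notebook would load a project data asset.
raw = """region,sales
north,120
south,
east,95
north,130
"""

# Prepare: parse the CSV and drop rows with a missing sales value.
rows = [r for r in csv.DictReader(io.StringIO(raw)) if r["sales"]]

# Analyze: group sales by region and compute the mean per region.
by_region = {}
for r in rows:
    by_region.setdefault(r["region"], []).append(float(r["sales"]))

summary = {region: statistics.mean(values) for region, values in by_region.items()}
print(summary)
```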
Data Refinery
Use Data Refinery to prepare and visualize tabular data with a graphical flow editor. You create and then run a Data Refinery flow as a set of ordered operations on data.
Data format
Tabular: Avro, CSV, JSON, Parquet, SAS with the "sas7bdat" extension (read only), TSV (read only), or delimited text files
Relational: Tables in relational data sources
Data size Any
How you can prepare data Cleanse, shape, and organize data with over 60 operations. Save refined data as a new data set or update the original data. Profile data to validate it. Use interactive templates to manipulate data with code operations, functions, and logical operators. Schedule recurring operations on data.
How you can analyze data Identify patterns, connections, and relationships within the data in multiple visualization charts.
Get started To create a Data Refinery flow, click Add to project > Data Refinery flow.
Learn more Documentation about Data Refinery
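Conceptually, a flow is a list of operations that are applied in order to each row of the data. The sketch below models that idea in plain Python. The operation functions and the `run_flow` helper are hypothetical and do not reflect how Data Refinery is implemented or scripted.

```python
# Conceptual model only: hypothetical helpers, not Data Refinery's own API.
def trim_names(row):
    return {**row, "name": row["name"].strip()}


def uppercase_country(row):
    return {**row, "country": row["country"].upper()}


def drop_if_no_name(row):
    # Returning None removes the row from the output.
    return row if row["name"] else None


def run_flow(rows, operations):
    """Apply each operation in order; stop processing a row once it is dropped."""
    refined = []
    for row in rows:
        for op in operations:
            row = op(row)
            if row is None:
                break
        if row is not None:
            refined.append(row)
    return refined


data = [
    {"name": "  Ada ", "country": "uk"},
    {"name": "   ", "country": "us"},
]
flow = [trim_names, uppercase_country, drop_if_no_name]
refined = run_flow(data, flow)
print(refined)
```

The order of the operations matters in this sketch: the whitespace-only name is dropped only because trimming runs before the empty-name check.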
Dashboard editor
Use the Dashboard editor to create a set of visualizations of analytical results on a graphical canvas.
Required service Cognos Dashboard
Data format
Tabular: CSV files
Relational: Tables in some relational data sources
Data size Any size
How you can analyze data Create graphs without coding. Include text, media, web pages, images, and shapes in your dashboard.
Get started To create a dashboard, click Add to project > Dashboard.
Learn more Documentation about dashboards
SPSS Modeler
Use SPSS Modeler to create a flow to prepare data and build and train a model with a flow editor on a graphical canvas.
Required services
SPSS Modeler
Watson Machine Learning
Data formats
Relational: Tables in relational data sources
Tabular: Excel files (.xls or .xlsx), CSV files, or SPSS Statistics files (.sav)
Textual: In the supported relational tables or files
Data size Any
How you can prepare data Use automatic data preparation functions. Write SQL statements to manipulate data. Cleanse, shape, sample, sort, and derive data.
How you can analyze data Visualize data with over 40 graphs. Identify the natural language of a text field.
How you can build models
Build predictive models.
Choose from over 40 modeling algorithms.
Use automatic modeling functions.
Model time series or geospatial data.
Classify textual data.
Identify relationships between the concepts in textual data.
Get started To create an SPSS Modeler flow, click Add to project > Modeler flow and then choose IBM SPSS Modeler.
Learn more Documentation about SPSS Modeler
Decision Optimization model builder
Use Decision Optimization to build and run optimization models in the Decision Optimization modeler or in a Jupyter notebook.
Required service Decision Optimization
Data formats Tabular: CSV files
Data size Any
How you can prepare data Import relevant data into a scenario and edit it.
How you can build models Build prescriptive decision optimization models. Create, import, and edit models in Python DOcplex, in OPL, or with natural language expressions. Create, import, and edit models in notebooks.
How you can solve models Run and solve decision optimization models using CPLEX engines. Investigate and compare solutions for multiple scenarios. Create tables, charts, and notes to visualize data and solutions for one or more scenarios.
Get started To create a Decision Optimization model, click Add to project > Decision Optimization, or for notebooks click Add to project > Notebook.
Learn more Documentation about Decision Optimization
AutoAI tool
Use the AutoAI tool to automatically analyze your tabular data and generate candidate model pipelines customized for your predictive modeling problem.
Required service Watson Machine Learning
Data format Tabular: CSV files
Data size Less than 1 GB
How you can prepare data Automatically transform data, for example by imputing missing values.
How you can build models Train a binary classification, multiclass classification, or regression model. View a tree infographic that shows the sequences of AutoAI training stages. Generate a leaderboard of model pipelines ranked by cross-validation scores. Save a pipeline as a model.
Get started To create an AutoAI experiment, click Add to project > AutoAI experiment.
Learn more Documentation about AutoAI
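The data preparation step mentions imputing missing values. As a rough illustration of what imputation means, the sketch below fills missing numeric values with the mean of the observed values. It is not a description of the transformations that AutoAI actually applies, and `impute_mean` is a hypothetical helper.

```python
import statistics


def impute_mean(values):
    """Replace None entries with the mean of the observed (non-missing) values."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]


# Made-up column with two missing entries.
ages = [34, None, 40, 46, None]
imputed = impute_mean(ages)
print(imputed)
```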
Experiment builder
Use the Experiment builder to build deep learning experiments and run hundreds of training runs. This method requires that you provide code to define the training run. You run, track, store, and compare the results in the Experiment builder graphical interface, then save the best configuration as a model.
Required service Watson Machine Learning
Data format
Textual: CSV files with labeled textual data
Image: Image files in a PKL file. For example, a model testing signatures uses images resized to 32×32 pixels and stored as NumPy arrays in a pickled format.
Data size Any size
How you can build models Write Python code to specify metrics for training runs. Write a training definition in Python code. Define hyperparameters, or choose the RBFOpt method or random hyperparameter settings. Find the optimal values for large numbers of hyperparameters by running hundreds or thousands of training runs. Run distributed training with GPUs and specialized, powerful hardware and infrastructure. Compare the performance of training runs. Save a training run as a model.
Get started To create an experiment, click Add to project > Experiment.
Learn more Documentation about Experiment builder
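To make the idea of random hyperparameter settings concrete, the sketch below runs a tiny random search in plain Python. The `train_and_score` function is a stand-in for a real training run (a made-up formula, not a model), and none of the names correspond to the Experiment builder or Watson Machine Learning APIs.

```python
import random


def train_and_score(learning_rate, batch_size):
    """Stand-in for a training run: returns a made-up validation score."""
    # Peaks near learning_rate=0.01 and batch_size=64; purely illustrative.
    return 1.0 - abs(learning_rate - 0.01) * 10 - abs(batch_size - 64) / 1000


def random_search(n_runs, seed=0):
    """Sample random hyperparameters, score each run, and return the best one."""
    rng = random.Random(seed)  # Seeded so that the runs are reproducible.
    runs = []
    for _ in range(n_runs):
        params = {
            "learning_rate": rng.uniform(0.001, 0.1),
            "batch_size": rng.choice([16, 32, 64, 128]),
        }
        runs.append((train_and_score(**params), params))
    # Compare the runs and keep the best-scoring configuration.
    return max(runs, key=lambda run: run[0])


best_score, best_params = random_search(n_runs=200)
print(best_score, best_params)
```

Each sampled configuration is independent of the others, which is why this kind of search lends itself to running many training runs in parallel.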
Federated Learning
Use the Federated Learning tool to train a common model using distributed data. The data is never combined or shared, which preserves data integrity while providing all participating parties with a model based on the aggregated data.
Required service Watson Machine Learning
Data format Any
Data size Any size
How you can build models Choose a training framework. Configure the common model. Configure a file for training the common model. Have remote parties train their data. Deploy the common model.
Get started To create an experiment, click Add to project > Federated Learning experiment.
Learn more Documentation about Federated Learning
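One common way to combine locally trained models without pooling the data is federated averaging, in which an aggregator takes a weighted average of each party's model parameters. The sketch below illustrates that idea in plain Python. It is an assumption for illustration and is not necessarily the method that the Federated Learning tool uses; all names are hypothetical.

```python
def federated_average(party_updates):
    """Weighted average of parameter vectors.

    party_updates: list of (weights, n_samples) tuples, one per party.
    Only parameters and sample counts are shared, never the raw data.
    """
    total = sum(n for _, n in party_updates)
    n_params = len(party_updates[0][0])
    return [
        sum(weights[i] * n for weights, n in party_updates) / total
        for i in range(n_params)
    ]


updates = [
    ([1.0, 2.0], 100),  # Party A: local weights and local sample count
    ([3.0, 4.0], 300),  # Party B
]
global_weights = federated_average(updates)
print(global_weights)
```

Weighting by sample count means that a party with more training data has proportionally more influence on the common model.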
Metadata import
Use the metadata import tool to automatically discover and import technical and process metadata for data assets into a project or a catalog.
Required service Watson Knowledge Catalog
Data format Any
Data size Any size
How you can prepare data Import data assets from a connection to a data source.
Get started To import metadata, click Add to project > Metadata import.
Learn more Documentation about metadata import
IBM Match 360 with Watson
Use IBM Match 360 with Watson to create master data entities that represent digital twins of your customers. Model and map your data, then run the matching algorithm to create master data entities. Customize and tune your matching algorithm to meet your organization's requirements.
Required services
IBM Match 360 with Watson
IBM Watson Knowledge Catalog
Data size Any
How you can prepare data Model and map data from sources across your organization. Run the customizable matching algorithm to create master data entities. View and edit master data entities and their associated records.
Get started To create an IBM Match 360 configuration asset, click Add to project > Master data configuration.
Learn more Documentation about IBM Match 360 with Watson
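To give a feel for what record matching involves, the sketch below groups records whose names are similar enough into one entity. It uses `difflib.SequenceMatcher` from the Python standard library as a simple similarity measure. This is an illustrative assumption, not the IBM Match 360 matching algorithm, and the records, threshold, and helper names are made up.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a similarity ratio between 0 and 1 for two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_records(records, threshold=0.85):
    """Greedily group records whose name is similar to a group's first record."""
    entities = []
    for rec in records:
        for entity in entities:
            if similarity(rec["name"], entity[0]["name"]) >= threshold:
                entity.append(rec)
                break
        else:
            # No existing entity matched, so start a new one.
            entities.append([rec])
    return entities


records = [
    {"source": "crm", "name": "Jane A. Smith"},
    {"source": "billing", "name": "Jane A Smith"},
    {"source": "support", "name": "Robert Jones"},
]
entities = match_records(records)
print([[r["source"] for r in entity] for entity in entities])
```

Raising or lowering `threshold` is a simple analogue of tuning a matching algorithm: a higher value merges fewer records, and a lower value merges more.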
RStudio IDE
Use RStudio IDE to analyze data or create Shiny applications by writing R code. RStudio can be integrated with a Git repository, which must be associated with the project.
Required service RStudio
Data format Any
Data size Any size
How you can prepare data, analyze data, and build models Write code in R. Create Shiny apps. Use open source libraries and packages. Include rich text and media with your code. Prepare data. Visualize data. Discover insights from data. Build and train a model using open source libraries. Share your Shiny app in a Git repository.
Get started To use RStudio, click Launch IDE > RStudio.
Learn more Documentation about RStudio
JupyterLab
Use the JupyterLab IDE to create a notebook or Python script in which you run code to prepare, visualize, and analyze data, or build and train a model. JupyterLab is integrated with a Git repository, which must be associated with the project.
Data format Any
Data size Any
How you can prepare data, analyze data, or build models
Write code in Python.
Include rich text and media with your code.
Work with any kind of data in any way you want.
Use preinstalled or install other open source and IBM libraries and packages.
Import a notebook from a file.
Share your notebook or script in a Git repository.
Get started To use JupyterLab, click Launch IDE > JupyterLab.
Learn more Documentation about JupyterLab
Masking flows
Use the masking flows tool to prepare masked copies or masked subsets of data from the catalog. Data is de-identified using advanced masking options with data protection rules.
Required service Watson Knowledge Catalog
Data format Relational: Tables in relational data sources
Data size Any size
How you can prepare data Import data assets from a governed catalog to the project. Create masking flow job definitions to specify what data to mask with data protection rules. Optionally subset data to reduce the size of the copied data. Run masking flow jobs to load masked copies to target database connections.
Get started Ensure that the prerequisite steps in Watson Knowledge Catalog are completed. To privatize data, do one of the following tasks:
- Click Add to project > Masking flow.
- Click the menu options for individual data assets to mask that asset directly.
Learn more Documentation about masking data
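As a simplified picture of de-identification, the sketch below replaces the values in selected columns with one-way tokens derived from a salted SHA-256 hash. This is only an illustration of the general idea. It does not represent the masking options or data protection rules in Watson Knowledge Catalog, and the salt, column names, and helpers are made up.

```python
import hashlib


def mask_value(value, salt="example-salt"):
    """Replace a value with a deterministic one-way token (illustrative only)."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def mask_rows(rows, columns_to_mask):
    """Return copies of the rows with the chosen columns masked."""
    return [
        {k: mask_value(v) if k in columns_to_mask else v for k, v in row.items()}
        for row in rows
    ]


rows = [
    {"email": "ada@example.com", "country": "UK"},
    {"email": "ada@example.com", "country": "US"},
]
masked = mask_rows(rows, {"email"})
print(masked)
```

Because the token is deterministic, the same input always maps to the same masked value, so masked copies can still be joined or grouped on that column.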