Working in projects (Watson Studio and Watson Knowledge Catalog)
A project is a collaborative workspace where you work with data and other assets to accomplish a particular goal. If you have the Watson Studio service, you can prepare data, analyze data, and build models. The tools that are available to you depend on which additional services, such as Watson Machine Learning, are installed alongside Watson Studio.
If you don't have the Watson Studio service, you can prepare data with the Watson Knowledge Catalog service or create dashboards with the Cognos Dashboards service.
Your project can include these types of resources:
- Collaborators are the people who you work with in your project.
- Data assets are what you work with. Data assets often consist of raw data that you refine.
- Tools and their associated assets are how you work with data.
- Environments are how you configure compute resources for running assets.
- Jobs are how you manage and schedule the running of assets.
- Project documentation and notifications are how you stay informed about what's happening in the project.
- Asset storage is where project information and files are stored.
- Integrations are how you incorporate external tools.
- Services are how you add tools or processing power to your project.
- Catalogs are how you share assets between projects.
You can customize projects to suit your goals. You can change the contents of your project and almost all of its properties at any time. However, you must make these choices when you create the project because you can't change them later:
- Whether to enable project export and backups to a Git repository.
- Whether to designate the JupyterLab IDE as the notebook editor tool.
Collaboration in projects
As a project creator, you can add other collaborators and assign them roles that control which actions they can take. You automatically have the Admin role in the project, and if you give other collaborators the Admin role, they can add collaborators too. See Adding collaborators and Project collaborator roles.
Collaboration on assets
For all tools in projects except the JupyterLab IDE and RStudio with Git integration, assets are locked during editing to prevent conflicts between changes made by different collaborators.
All collaborators work with the same copy of each asset. Only one collaborator can edit an asset at a time. While a collaborator is editing an asset in a tool, that asset is locked. Other collaborators can view a locked asset, but not edit it. See Managing assets.
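The locking behavior described above can be modeled in a few lines of Python. This is only an illustrative sketch of the semantics (one editor at a time, while everyone can still view the asset). The class and method names are hypothetical and are not part of any Watson Studio API.

```python
class AssetLock:
    """Toy model of per-asset edit locking (hypothetical, not a product API)."""

    def __init__(self):
        # Maps an asset name to the collaborator who is currently editing it.
        self._locked_by = {}

    def start_editing(self, asset, user):
        """Return True if `user` holds the lock afterwards, False if someone else does."""
        holder = self._locked_by.get(asset)
        if holder is not None and holder != user:
            return False
        self._locked_by[asset] = user
        return True

    def stop_editing(self, asset, user):
        """Release the lock, but only if `user` is the one holding it."""
        if self._locked_by.get(asset) == user:
            del self._locked_by[asset]

    def can_view(self, asset, user):
        """Viewing is always allowed, even while the asset is locked."""
        return True
```

In this model, a second collaborator who calls `start_editing` on a locked asset is refused until the first collaborator calls `stop_editing`, but `can_view` succeeds at any time.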
Collaboration in the JupyterLab IDE
The JupyterLab IDE uses the version control features of a Git repository instead of locking. When you create a project, you have options to synchronize the project with a Git repository and enable collaborators to use JupyterLab. When you select the JupyterLab option, project collaborators can edit notebooks only in JupyterLab and the standard Jupyter notebook editor is disabled.
The project shows the contents of the branch that you specified when you created the project. Each collaborator must clone the repository to work on notebooks, scripts, or other files independently and simultaneously. To view or schedule jobs for updated assets in the project, collaborators must push their changes to the project branch and then pull the updated assets into the project. Users can use the Git functionality in JupyterLab to work with different branches and handle any merge conflicts. They can push their changes to the project branch either directly from JupyterLab or through Git, for example, by creating a pull request.
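For collaborators who work through Git rather than the JupyterLab interface, the clone and push steps might look like the following sketch. It only builds the standard `git` commands as argument lists and does not run them. The repository URL, branch name, and commit message are placeholders, and the helper function names are hypothetical.

```python
def clone_command(repo_url, branch):
    """Command that clones the repository and checks out the project branch."""
    return ["git", "clone", "--branch", branch, repo_url]


def publish_commands(branch, message):
    """Commands that stage, commit, and push local changes to the project branch.

    They are meant to be run from inside the cloned repository directory.
    """
    return [
        ["git", "add", "--all"],
        ["git", "commit", "-m", message],
        ["git", "push", "origin", branch],
    ]


if __name__ == "__main__":
    # Placeholder values, shown for illustration only.
    print(" ".join(clone_command("https://example.com/team/project.git", "main")))
    for command in publish_commands("main", "Update notebook"):
        print(command)
```

Each list can be passed to `subprocess.run` if you want to execute it. Pulling the pushed changes into the project itself is a separate step that this sketch does not cover.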
See JupyterLab.
Data assets
You can add these types of data assets to projects:
- Data assets from local files or catalogs
- Connections to cloud and on-premises data sources. See Connectors.
- Connected data assets from an existing connection asset that provide read-only access to a table or file in an external data source
- Imported data assets from an existing connection asset that provide read-only access to a table or a file in an external data source
- Folder data assets to view the files within a folder in a file system
Tools and their associated assets
When you run a tool, you create an asset that contains the information for a specific goal. For example, when you run the Data Refinery tool, you create a Data Refinery flow asset that defines the set of ordered operations to run on a specific data asset. Each tool has one or more types of associated assets that run in the tool. Some types of assets can run in more than one tool, for example, notebook assets. Assets that run in tools are also known as operational assets.
For a mapping of assets to the tools that you use to create them, see Assets in Cloud Pak for Data.
The tools that you can use in a project depend on the services that you have.
With Watson Studio, you can create these types of assets that run in tools without additional services:
- Data Refinery flows to refine data with the Data Refinery tool.
- Jupyter notebooks to analyze data or build models. By default, you edit notebooks in the Jupyter notebook editor. However, when you create a project, you can choose to edit notebooks in the JupyterLab IDE instead.
- Python scripts to develop interactive, exploratory analytics scripts with Python in the JupyterLab IDE.
With Watson Studio, some assets that run in tools require extra services. If your administrator installed the services, you can add these assets:
- SPSS Modeler flows to automate the flow of data through a model with SPSS algorithms in the SPSS Modeler. Requires the SPSS Modeler service.
- AutoAI experiments to build a model without coding in the AutoAI tool. Requires the Watson Machine Learning service.
- Deep learning experiments to train deep learning models in the Experiment builder. Requires the Watson Machine Learning service and integration with Watson Machine Learning Accelerator.
- Decision Optimization models to solve scenarios in the Decision Optimization model builder. Requires the Decision Optimization and the Watson Machine Learning services.
- R Shiny apps to develop interactive web applications. Requires the RStudio Server with R 3.6 service.
- Dashboards to visualize data without code in the Dashboard editor. Requires the Cognos Dashboards service.
- Data quality rules to perform data quality analysis. The data quality feature must be enabled. Requires the DataStage service.
If you do not have Watson Studio, the operational assets you can create depend on the service:
- With the Cognos Dashboards service, you can add dashboards to visualize data without code in the Dashboard editor.
- With the Watson Knowledge Catalog service, you can add Data Refinery flows, metadata imports, metadata enrichments, and data quality rules.
- With the DataStage service, you can add DataStage flows.
To determine which tool you need, see Choosing a tool.
Environments
Environments control your compute resources. An environment template specifies hardware and software resources to instantiate the environment runtimes that run your operational assets in tools.
Some types of operational assets have an automatically selected environment template. However, for some types of operational assets, you can choose between multiple environments when you create an asset and when you run it. Watson Studio includes a set of default environment templates that vary by coding language, tool, and compute engine type. You can also create custom environment templates or add services that provide environment templates. For example, your administrator can install the IBM Analytics Engine powered by Apache Spark service to provide Spark environments.
See Environments.
Jobs
A job is a single run of an operational asset with a specified environment runtime. You can schedule one-time or repeating jobs, and you can monitor, edit, stop, or cancel jobs. See Jobs.
Asset storage
Each project, space, or catalog has a dedicated, secure file storage that contains:
- Data assets that you upload to the project, space, or catalog as files.
- Data assets from files that you copy from a catalog.
- Files that you save to the project, space, or catalog with a tool.
- Files for operational assets, such as notebooks.
- Saved models.
The initial storage limit for assets is 100 GB across all projects, spaces, and catalogs.
If you need to increase the size of your storage, complete the following steps. See Resizing Persistent Volumes using Kubernetes for more details.
- Confirm that the storage provisioner allows volumes to be resized.
- Edit the persistent volume claim and update the size value.
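The two steps above can be sketched with standard `kubectl` commands. The following Python helper only builds the commands and does not contact a cluster. The storage class name, claim name, namespace, and target size are hypothetical placeholders that you would replace with the values from your own cluster. The sketch uses `kubectl patch` as a non-interactive alternative to editing the claim by hand.

```python
import json
import shlex


def expansion_check_command(storage_class):
    """Command that prints the storage class's allowVolumeExpansion field."""
    return ["kubectl", "get", "storageclass", storage_class,
            "-o", "jsonpath={.allowVolumeExpansion}"]


def resize_patch(new_size):
    """Patch body that raises the storage request of a persistent volume claim."""
    return {"spec": {"resources": {"requests": {"storage": new_size}}}}


def resize_command(pvc_name, namespace, new_size):
    """Command that applies the resize patch to the named claim."""
    return ["kubectl", "patch", "pvc", pvc_name, "-n", namespace,
            "-p", json.dumps(resize_patch(new_size))]


if __name__ == "__main__":
    # All names and sizes below are placeholders, not product defaults.
    print(shlex.join(expansion_check_command("example-storage-class")))
    print(shlex.join(resize_command("example-asset-pvc", "example-namespace", "200Gi")))
```

If the first command does not print `true`, the storage class does not permit expansion and the patch is expected to be rejected.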
When you delete a project, space, or catalog, its associated files are also deleted.
Additional services
Cloud Pak for Data administrators can install more services to add tools or compute environments to Watson Studio.
Integrations with external tools
Integrations provide a method to interact with tools that are external to the project.
You can integrate with a Git repository to export the project, to work with documents and notebooks in JupyterLab, or to back up the project for source code management purposes.
Project documentation and notifications
While you create a project, you can add a short description to document the purpose or goal of the project. You can edit the description later, on the project's Settings page.
You can mark the project as sensitive. When users open a project that is marked as sensitive, a notification is displayed stating that no data assets can be downloaded or exported from the project.
You can choose to log all project activities. This option tracks detailed project activity and creates a full activities log, which you can download and view.
You can change these settings at any time for the project by changing the state of the toggle button on the Settings page.
All collaborators in a project are notified when a collaborator changes an asset.
Catalog integration
A catalog is a central repository for assets where you can easily find and share data and other assets. Before you can access a catalog, a catalog administrator must add you as a catalog collaborator. A catalog has the same type of roles as a project. With any catalog role, you can copy assets from the catalog into a project to use them. With the Editor or Admin role in the catalog, you can create assets in a project and then publish them into the catalog.
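The role rules in this section can be summarized as a small permission check. This is a sketch of the rules as stated above, not a product API. The function names are hypothetical, and the "Viewer" role name is an assumption based on the statement that catalogs have the same types of roles as projects.

```python
# Role names: "Admin" and "Editor" come from the text above; "Viewer" is assumed.
CATALOG_ROLES = {"Admin", "Editor", "Viewer"}
PUBLISHING_ROLES = {"Admin", "Editor"}


def can_copy_to_project(role):
    """Any catalog role can copy assets from the catalog into a project."""
    return role in CATALOG_ROLES


def can_publish_to_catalog(role):
    """Only the Editor or Admin role can publish project assets into the catalog."""
    return role in PUBLISHING_ROLES
```

A user who is not a catalog collaborator has no role at all, so both checks return `False` for a value such as `None`.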