Working in projects
A project is a collaborative workspace where you work with data and other assets to accomplish a particular goal.
Your project can include these types of resources:
- Collaborators are the people who you work with in your project.
- Data assets are what you work with. Data assets often consist of raw data that you refine.
- Tools and their associated assets are how you work with data.
- Environments are how you configure compute resources for running assets in tools.
- Jobs are how you manage and schedule the running of assets in tools.
- Project documentation and notifications are how you stay informed about what's happening in the project.
- Notification settings are how you set notifications to suit your needs.
- Asset storage is where project information and files are stored.
- Integrations are how you incorporate external tools.
You can customize projects to suit your goals. You can change the contents of your project and almost all of its properties at any time. However, you must make these choices when you create the project because you can't change them later:
- Whether to enable project export and backups to a Git repository.
- Whether to designate the JupyterLab IDE as the notebook editor tool.
You can view projects that you create and collaborate in by selecting Projects > View all projects in the navigation menu, or by viewing the Projects pane on the main page.
Collaboration in projects
As a project creator, you can add other collaborators and assign them roles that control which actions they can take. You automatically have the Admin role in the project, and if you give other collaborators the Admin role, they can add collaborators too. See Adding collaborators and Project collaborator roles.
Collaboration on assets
For all tools in projects except the JupyterLab IDE and RStudio with Git integration, assets are locked during editing to prevent conflicts between changes made by different collaborators.
All collaborators work with the same copy of each asset. Only one collaborator can edit an asset at a time. While a collaborator is editing an asset in a tool, that asset is locked. Other collaborators can view a locked asset, but not edit it. See Managing assets.
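The locking behavior described above can be pictured with a small model. The following is a minimal sketch and does not show how the product implements locking. The `Asset` class, its method names, and the rule that saving releases the lock are all assumptions made for illustration.

```python
class AssetLockedError(Exception):
    """Raised when a collaborator tries to edit an asset that someone else has locked."""


class Asset:
    """Hypothetical in-memory model of a shared project asset (illustration only)."""

    def __init__(self, name, content=""):
        self.name = name
        self.content = content
        self.locked_by = None  # collaborator who currently holds the edit lock

    def open_for_edit(self, collaborator):
        # Only one collaborator can edit at a time.
        if self.locked_by not in (None, collaborator):
            raise AssetLockedError(f"{self.name} is locked by {self.locked_by}")
        self.locked_by = collaborator

    def view(self):
        # Viewing never requires the lock, so a locked asset stays readable.
        return self.content

    def save_and_close(self, collaborator, new_content):
        # Assumption: saving and closing the asset releases the lock.
        if self.locked_by != collaborator:
            raise AssetLockedError(f"{collaborator} does not hold the lock on {self.name}")
        self.content = new_content
        self.locked_by = None


flow = Asset("example_flow", "version 1")
flow.open_for_edit("collaborator_a")
print(flow.view())  # other collaborators can still read the locked asset
flow.save_and_close("collaborator_a", "version 2")
flow.open_for_edit("collaborator_b")  # the lock is free again
```

In this sketch, a second call to `open_for_edit` by a different collaborator while the lock is held raises `AssetLockedError`, which mirrors the rule that other collaborators can view a locked asset but not edit it.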
Collaboration in the JupyterLab IDE
The JupyterLab IDE uses the version control features of a Git repository instead of locking. When you create a project, you have options to synchronize the project with a Git repository and enable collaborators to use JupyterLab. When you select the JupyterLab option, project collaborators can edit notebooks only in JupyterLab, and the standard Jupyter notebook editor is disabled.
The project shows the contents of the branch that you specified when you created the project. Each collaborator must clone the repository to work on notebooks, scripts, or other files independently and simultaneously. To view or schedule jobs for updated assets in the project, collaborators must push their changes to the project branch, and then pull the updated assets into the project. Collaborators can use the Git functionality in JupyterLab to work with different branches and handle any merge conflicts. They can push their changes to the project branch either directly from JupyterLab or through Git, for example, by creating a pull request.
See JupyterLab.
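The clone, edit, and push cycle described in this section might look like the following when you use Git from a command line. This is a sketch under stated assumptions: the repository URL, branch names, and the `project-clone` directory name are hypothetical placeholders, and the function only builds the command lines without running them.

```python
def git_edit_cycle_commands(repo_url, project_branch, work_branch, message):
    """Return the Git commands for one edit cycle as argument lists.

    Every command after the clone is meant to run inside the cloned directory.
    """
    return [
        # Clone the repository and check out the branch that the project shows.
        ["git", "clone", "--branch", project_branch, repo_url, "project-clone"],
        # Work on a separate branch so that collaborators can edit simultaneously.
        ["git", "checkout", "-b", work_branch],
        # Stage and commit the edited notebooks, scripts, or other files.
        ["git", "add", "--all"],
        ["git", "commit", "-m", message],
        # Publish the branch. A pull request can then merge it into the project branch.
        ["git", "push", "origin", work_branch],
    ]


commands = git_edit_cycle_commands(
    "https://example.com/team/project.git",  # hypothetical repository URL
    "main",                                  # hypothetical project branch
    "update-notebook",                       # hypothetical working branch
    "Update analysis notebook",
)
for command in commands:
    print(" ".join(command))
```

After the changes are merged into the project branch, pull the updated assets into the project so that they can be viewed or scheduled as jobs.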
Data assets
You can add these types of data assets to projects:
- Data assets from local files
- Connections to cloud and on-premises data sources
- Connected data assets from an existing connection asset that provide read-only access to a table or file in an external data source
- Folder data assets to view the files within a folder in a file system
Tools and their associated assets
When you run a tool, you create an asset that contains the information for a specific goal. For example, when you run the Data Refinery tool, you create a Data Refinery flow asset that defines the set of ordered operations to run on a specific data asset. Each tool has one or more types of associated assets that run in the tool.
For a mapping of assets to the tools that you use to create them, see Asset types and properties.
The tools that you can use in a project depend on the services that you have.
Environments
Environments control your compute resources. An environment template specifies hardware and software resources to instantiate the environment runtimes that run your assets in tools.
Some tools have an automatically selected environment template. However, for other tools, you can choose between multiple environments. When you create an asset in a tool, you assign an environment to it. You can change the environment for an asset when you run it.
Watson Studio includes a set of default environment templates that vary by coding language, tool, and compute engine type. You can also create custom environment templates or add services that provide environment templates. For example, your administrator can install the Analytics Engine powered by Apache Spark service to provide Spark environments.
See Environments.
Jobs
A job is a single run of an asset in a tool with a specified environment runtime. You can schedule a job to run once or on a repeating schedule, and you can monitor, edit, stop, or cancel jobs. See Jobs.
Asset storage
Each project has a dedicated, secure file storage that contains:
- Data assets that you upload to the project as files.
- Data assets from files that you copy from another workspace.
- Files that you save to the project with a tool.
- Files for assets that run in tools, such as notebooks.
- Saved models.
The initial storage limit for assets is 100 GB across all workspaces.
To store large amounts of data, you can use databases, storage volumes, or object stores instead and connect to your data sources.
If you need to increase the size of your storage, complete the following steps. See Resizing Persistent Volumes using Kubernetes for more details.
- Confirm that the storage provisioner allows volumes to be resized.
- Edit the persistent volume claim and update the size value.
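The two steps above can be sketched in terms of the Kubernetes objects involved. In Kubernetes, a StorageClass permits expansion when its `allowVolumeExpansion` field is `true`, and a persistent volume claim requests its size in `spec.resources.requests.storage`. The storage class manifest, the provisioner name, and the `200Gi` target size below are hypothetical values chosen for illustration, and the helper functions are not part of any product API.

```python
import json


def storage_class_allows_expansion(storage_class):
    """Return True if a StorageClass manifest (as a dict) permits volume expansion."""
    return storage_class.get("allowVolumeExpansion") is True


def build_pvc_resize_patch(new_size):
    """Build a patch body that raises the requested storage of a persistent volume claim."""
    return {"spec": {"resources": {"requests": {"storage": new_size}}}}


# Hypothetical StorageClass manifest, for example as loaded from your cluster.
storage_class = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "example-storage-class"},
    "provisioner": "example.com/provisioner",
    "allowVolumeExpansion": True,
}

if storage_class_allows_expansion(storage_class):
    # Step 2: update the size value of the persistent volume claim.
    patch = build_pvc_resize_patch("200Gi")
    print(json.dumps(patch))
else:
    print("This storage class does not allow volume expansion.")
```

A cluster administrator could apply the printed JSON to the claim, for example with `kubectl patch pvc`, or make the same change interactively with `kubectl edit pvc`. The name of the claim depends on your installation.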
When you delete a project, the files that are associated with the project are also deleted.
Additional services
Cloud Pak for Data administrators can install more services to add tools or compute environments.
Integrations with external tools
Integrations provide a method to interact with tools that are external to the project.
You can integrate with a Git repository to export the project, to work with documents and notebooks in JupyterLab, or to back up the project for source code management purposes.
Project documentation and notifications
While you create a project, you can add a short description to document the purpose or goal of the project. You can edit the description later, on the project's Settings page.
You can mark the project as sensitive. When users open a project that is marked as sensitive, a notification is displayed stating that no data assets can be downloaded or exported from the project.
You can choose to log all project activities. Logging all project activities tracks detailed project activity and creates a full activities log, which you can download and view.
You can change these settings for the project at any time by using the toggles on the Settings page.
You can view recent asset activity in the Assets pane on the Overview page, and filter the assets by selecting By you or By all using the dropdown. By you lists assets that you edited, ordered by most recent. By all lists assets that are edited by others and also by you, ordered by most recent.
All collaborators in a project are notified when a collaborator changes an asset.
Notification settings
To see your notification settings, click the notification bell icon and then click the settings icon.
You can change your notification settings in the following ways:
- Choose whether to receive push notifications, which appear briefly on screen. If you select Do not disturb, you continue to see notifications on the home page and the number of notifications on the bell.
- Choose whether to receive notifications by email.
- Choose the projects or spaces for which you receive notifications.