Projects with default Git integration
In a project with default Git integration, you always have your own view of the project based on the contents of your local Git clone. All project assets that are listed in the project reflect the current state of your Git clone.
Since you work in your local Git clone, the same Git repository can be associated with different projects across a single Cloud Pak for Data instance and across multiple Cloud Pak for Data instances.
There is no restriction on the directory structure for code in the Git repository, nor where and how changes are made.
Collaboration
If you want to work with others on the same contents of files in a particular Git repository, you can add those users as collaborators to your project. Those users do not have to create their own projects based on the same Git repository. They can work and test in their own clones of the repository and then merge their changes when their code is ready. By adding collaborators to your project, you can easily track who is working on the same project without having to go to the Git user interface to see who is committing changes.
To enable sharing when working on files, users must be added to the project as collaborators and must have their own access token for the associated repository.
- Add users as collaborators to the project and assign them either Admin or Editor role. You can only invite users who have an existing IBM Cloud Pak for Data account. See Adding collaborators.
- Give all collaborators the appropriate access permissions to the Git repository.
- Collaborators are asked to create and submit their own personal access token when they pull the Git branch for their local clone. See Creating personal access tokens for a Git repository.
Tools and assets that you can use in projects with default Git integration
Tool | Support in default Git projects | Support for project import |
---|---|---|
AutoAI (Watson Machine Learning) | ✓ Note: AutoAI does not support pushing to the remote repository. | |
Data Refinery | ✓ | ✓ |
Decision Optimization | ✓ | |
JupyterLab | ✓ Note: Use JupyterLab to create and manage notebooks. | |
RStudio | ✓ Note: Use RStudio to create and manage notebooks. | |
SPSS Modeler | ✓ | ✓ |
Watson Pipelines | | |

Asset | Support in default Git projects | Support for project import |
---|---|---|
Connections | ✓ See Connecting to data sources | ✓ |
Connected data | ✓ | |
Connected folder assets | ||
Decision Optimization experiments | ✓ | |
Deep Learning experiments | ✓ | |
Data assets | ✓ | ✓ |
Data Refinery flows | ✓ | |
Federated Learning experiments | ✓ | |
Jobs | ✓ | ✓ |
Modeler flows | ✓ | ✓ |
Models from file | ✓ | |
Visualizations | ✓ | ✓ |
You can’t perform any of the following actions in projects with default Git integration:
- Deploy to space
- Export project
- Import assets into non-empty project
If you work in a project with default Git integration, the Git repository might contain assets that were added from another project that uses the same Git repository, and you might not be able to work with all of the assets that are pulled from the repository. For more information, see Troubleshooting Git integration issues in Watson Studio.
By selecting Local Git data, you can create data assets from any file in your local clone. For example, if you run a notebook that generates a .csv file, you can use this option to make the file a data asset that you can then refine by using Data Refinery.
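As a minimal sketch of the notebook side of this example, the following Python function writes rows to a .csv file inside the clone so that the file can later be picked under Local Git data. The function name `write_results_csv`, the default file name `results.csv`, and the `clone_dir` argument are hypothetical and are not part of the product.

```python
import csv
from pathlib import Path


def write_results_csv(clone_dir, rows, filename="results.csv"):
    """Write rows (a non-empty list of dicts with the same keys) to a CSV file.

    clone_dir and filename are illustrative assumptions: use any location
    inside your local Git clone.
    """
    target = Path(clone_dir) / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return target
```

After the file exists in the clone, you can select it under Local Git data to create the data asset.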
When you save a Data Refinery flow, for example, a .flow file that contains the flow itself is saved, and the project also creates an asset that points to that flow so that you can maintain metadata for the asset. Similarly, if you upload a data file, the file is uploaded to the project data folder and a data asset is created for that file.
Project assets and their metadata are stored in the following well-defined locations inside the Git repository:
- assettypes: contains a set of JSON files that define the types and other characteristics of the assets. The set of files, if any, that exists in this folder depends on the set of services that have been installed.
- assets: contains any files relevant to the asset as well as a metadata file with the user-specified information (like a description). There is a folder for each type of asset, with a .METADATA folder that contains the JSON files with the metadata.

For example, for a data asset and a saved model, you would see:

    assets/.METADATA
    assets/.METADATA/wml_model.mymodel1.json
    assets/.METADATA/data_asset.cars.json
    assets/data_asset/cars.csv
    assets/wml_model/mymodel1/7ca4e02d-fe0b-4832-921e-448bf05f435e
    assets/wml_model/mymodel1/3bbb4b08-2d84-4099-8d90-7e9f4fb496f5

You can edit files, for example the metadata JSON files, to update the description of an asset. However, be cautious when you edit these files: the metadata that is required for each type of asset is different and not documented, so invalid changes can result in unexpected behavior. Never manually delete files in these directories. Instead, delete assets only by using the project user interface.
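To inspect the metadata files without changing them, you can list the contents of assets/.METADATA from your clone. The following read-only sketch assumes that metadata files follow the `<asset type>.<asset name>.json` naming pattern seen in the example listing; that pattern is inferred from the example and is not a documented contract. The function name `list_asset_metadata` is hypothetical.

```python
from pathlib import Path


def list_asset_metadata(clone_dir):
    """Return (asset_type, asset_name, path) tuples for files in assets/.METADATA.

    Read-only: nothing in the clone is modified or deleted.
    """
    metadata_dir = Path(clone_dir) / "assets" / ".METADATA"
    entries = []
    if not metadata_dir.is_dir():
        return entries
    for path in sorted(metadata_dir.glob("*.json")):
        # Assumed naming pattern: <asset_type>.<asset_name>.json
        asset_type, _, asset_name = path.stem.partition(".")
        entries.append((asset_type, asset_name, path))
    return entries
```

For example, `list_asset_metadata(".")` run from the root of a clone returns one tuple per metadata file, which can help you find the right file before you edit a description.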
Note that there is no auto discovery for newly added assets. For example, if you add valid model files to ./wml_model (and don't use the project user interface), the models will not be registered as assets in the project.

When you push updates to the external Git repository, always include all files under the directories assettypes and assets, including assets/.METADATA. These files are needed to manage project assets consistently for all collaborators in all the Git branches.
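One way to check that nothing under assettypes or assets is left out before you push is to look at the output of `git status --porcelain` in your clone. The following sketch reports paths in those directories that still have uncommitted changes. The helper names `pending_asset_paths` and `check_clone` are hypothetical, and the check does not detect files that are excluded by .gitignore.

```python
import subprocess

# Directories that should always be pushed in full.
REQUIRED_PREFIXES = ("assettypes/", "assets/")


def pending_asset_paths(porcelain_output):
    """Return paths under assettypes/ or assets/ found in `git status --porcelain` output."""
    pending = []
    for line in porcelain_output.splitlines():
        if len(line) < 4:
            continue
        # Porcelain v1 lines are two status characters, a space, then the path.
        path = line[3:]
        # Renames are reported as "old -> new"; keep the new path.
        if " -> " in path:
            path = path.split(" -> ", 1)[1]
        path = path.strip('"')
        if path.startswith(REQUIRED_PREFIXES):
            pending.append(path)
    return pending


def check_clone(clone_dir):
    """Run git in clone_dir and return asset paths with uncommitted changes."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=clone_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return pending_asset_paths(result.stdout)
```

If `check_clone` returns an empty list, there are no uncommitted changes in the two directories; otherwise, review the listed paths before you commit and push.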
Notebooks and scripts
Notebooks and scripts are not project assets in a default Git project and have no associated metadata maintained by Watson Studio. Instead notebooks and scripts are arbitrary code files. There is also no asset versioning inside a default Git project. Version control is done through the versioning inherent in the Git repository.
You develop and test notebooks and scripts in JupyterLab and RStudio. There is no restriction on the Git directory structure you use, nor on the Git operations you perform.
Additionally, you have full control of the contents of the .gitignore file in your clone for files you don't want to persist in the Git repository. A default .gitignore file is included at the time you create the project that ignores core files and job run information (metadata files and logs, like assets/.METADATA/job_run.* and assets/job_run files). If you want to ignore other files, add those files to the default .gitignore file rather than using your own .gitignore file.
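As a sketch of extending the default .gitignore file instead of replacing it, the following function appends patterns to the existing file and skips any pattern that is already listed. The function name `add_ignore_patterns` and the patterns shown in the usage note are hypothetical examples.

```python
from pathlib import Path


def add_ignore_patterns(clone_dir, patterns):
    """Append patterns to the clone's .gitignore and return the ones that were added.

    Existing lines are kept as they are; patterns already present are skipped.
    """
    gitignore = Path(clone_dir) / ".gitignore"
    existing_text = gitignore.read_text(encoding="utf-8") if gitignore.exists() else ""
    existing = {line.strip() for line in existing_text.splitlines()}

    new_patterns = []
    for pattern in patterns:
        if pattern not in existing and pattern not in new_patterns:
            new_patterns.append(pattern)
    if not new_patterns:
        return []

    # Start on a fresh line if the file does not end with a newline.
    prefix = "" if existing_text == "" or existing_text.endswith("\n") else "\n"
    with gitignore.open("a", encoding="utf-8") as handle:
        handle.write(prefix + "\n".join(new_patterns) + "\n")
    return new_patterns
```

For example, `add_ignore_patterns(".", ["*.tmp", "scratch/"])` run from the root of the clone adds both patterns only if they are not already in the file.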
Note that Python functions are currently not supported in projects with default Git integration.
Jobs for scripts or notebooks
You can create a job from the Jobs page of your project by selecting New job and browsing the script or notebook that you want to use as the entry point for the job.
When the job starts to run, the full contents of your Git clone are available (mounted), which means that the notebook or script that you selected as the entry point can call any other scripts or notebooks in your clone, which in turn can call other files in the project. See Creating code-based jobs.
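Because the whole clone is available at run time, an entry-point script can locate other scripts relative to its own location. The following sketch uses the standard `runpy` module to run a second script; the function name `run_sibling_script` and the path in the usage note are hypothetical, and the sketch assumes that the directory layout of the clone is the same where the job runs.

```python
import runpy
from pathlib import Path


def run_sibling_script(entry_point_file, relative_path):
    """Run another script from the clone, located relative to the entry point's folder.

    Returns the globals of the executed script as a dictionary.
    """
    script = (Path(entry_point_file).resolve().parent / relative_path).resolve()
    if not script.is_file():
        raise FileNotFoundError(f"No such script in the clone: {script}")
    # run_path executes the file and returns its resulting module globals.
    return runpy.run_path(str(script), run_name="__main__")
```

In an entry-point script, a call such as `run_sibling_script(__file__, "helpers/prepare_data.py")` would run a helper script that is stored next to it in the clone.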