Projects with default Git integration

In a project with default Git integration, you always have your own view of the project based on the contents of your local Git clone. All project assets that are listed in the project reflect the current state of your Git clone.

Since you work in your local Git clone, the same Git repository can be associated with different projects across a single Cloud Pak for Data instance and across multiple Cloud Pak for Data instances.

There is no restriction on the directory structure for code in the Git repository, nor where and how changes are made.

Collaboration

If you want to work with others on the same contents of files in a particular Git repository, you can add those users as collaborators to your project. Those users do not have to create their own projects based on the same Git repository. They can work and test in their own clones of the repository and then merge their changes when their code is ready. By adding collaborators to your project, you can easily track who is working on the same project without having to go to the Git user interface to see who is committing changes.

To enable sharing when working on files, users must be added to the project as collaborators and must have their own access token for the associated repository.

  1. Add users as collaborators to the project and assign them either the Admin or the Editor role. You can invite only users who have an existing IBM Cloud Pak for Data account. See Adding collaborators.
  2. Give all collaborators the appropriate access permissions to the Git repository.
  3. Collaborators are asked to create and submit their own personal access token when they pull the Git branch for their local clone. See Creating personal access tokens for a Git repository.

Tools and assets that you can use in projects with default Git integration

Note: JupyterLab and RStudio assets are visible only in the IDE, not the assets table.
Tools support
Tool Support in default Git projects Support for project import
AutoAI (Watson Machine Learning) Note: AutoAI does not support pushing to the remote repository.
Data Refinery
Decision Optimization
JupyterLab Note: Use JupyterLab to create and manage notebooks.
RStudio Note: Use RStudio to create and manage notebooks.
SPSS Modeler
Watson Pipelines
Assets support
Asset Support in default Git projects Support for project import
Connections ✓ See Connecting to data sources
Connected data
Connected folder assets
Decision Optimization experiments
Deep Learning experiments
Data assets
Data Refinery flows
Federated Learning experiments
Jobs
Modeler flows
Models from file
Visualizations

You can’t perform any of the following actions in projects with default Git integration:

  • Deploy to space
  • Export project
  • Import assets into non-empty project

If you work in a project with default Git integration, the Git repository might contain assets that were added from another project that uses the same Git repository, and you might not be able to work with every asset that is pulled from the repository. For more information, see Troubleshooting Git integration issues in Watson Studio.

By selecting Local Git data, you can create data assets from any file in your local clone. For example, if you run a notebook that generates a .csv file, you can use this option to turn that file into a data asset, which you can then refine by using Data Refinery.
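
As an illustration of the notebook case, the following minimal sketch writes a .csv file into the local clone so that it can later be picked through Local Git data. The function name, the output path, and the sample columns are hypothetical and are not part of the product.

```python
import csv
from pathlib import Path


def write_results_csv(clone_root, relative_path, header, rows):
    """Write rows to a .csv file inside the local clone and return its path."""
    target = Path(clone_root) / relative_path
    # Create intermediate folders; the clone has no required directory layout
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return target
```

For example, a notebook cell could call `write_results_csv(".", "output/cars_summary.csv", ["make", "mpg"], [["audi", 27]])`, assuming the notebook's working directory is the clone root.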

For example, when you save a Data Refinery flow, a .flow file that contains the flow itself is saved, and the project also creates an asset that points to that flow so that you can maintain metadata for it. Likewise, when you upload a data file, the file is uploaded to the project data folder and a data asset is created for that file.

Project assets and their metadata are stored in the following well-defined locations inside the Git repository:

  • assettypes: contains a set of JSON files that define the types and other characteristics of the assets. The set of files, if any, that exists in this folder depends on the set of services that have been installed.

  • assets: contains any files relevant to the asset as well as a metadata file with the user-specified information (like a description). There is a folder for each type of asset, with a .METADATA folder that contains the JSON files with the metadata.

    For example, for a data asset and a saved model, you would see:

    assets/.METADATA
    assets/.METADATA/wml_model.mymodel1.json
    assets/.METADATA/data_asset.cars.json
    assets/data_asset/cars.csv
    assets/wml_model/mymodel1/7ca4e02d-fe0b-4832-921e-448bf05f435e
    assets/wml_model/mymodel1/3bbb4b08-2d84-4099-8d90-7e9f4fb496f5
    

    You can edit files, for example the metadata JSON files, to update the description of an asset. However, be cautious when you edit these files: the metadata that is required for each type of asset is different and not documented, so invalid changes could result in unexpected behaviour. Never delete files in these directories manually. Instead, delete assets only by using the project user interface.

    Note that there is no auto discovery for newly added assets. For example, if you add valid model files to ./wml_model (and don't use the project user interface), the models will not be registered as assets in the project.

    When you push updates to the external Git repository, always include all files under the assettypes and assets directories, including assets/.METADATA. These files are needed to manage project assets consistently for all collaborators in all the Git branches.
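
To see which assets are registered in a clone, you could inspect the assets/.METADATA folder directly. The following is a minimal, read-only sketch, not a product API. It assumes that metadata files are named `<asset_type>.<asset_name>.json`, a pattern inferred only from the example listing above, and it also reports whether each file still parses as JSON, which can help after a manual edit. The function name is hypothetical.

```python
import json
from pathlib import Path


def list_asset_metadata(clone_root):
    """Return (asset_type, asset_name, is_valid_json) for each metadata file."""
    metadata_dir = Path(clone_root) / "assets" / ".METADATA"
    entries = []
    for path in sorted(metadata_dir.glob("*.json")):
        # Assumed naming pattern: <asset_type>.<asset_name>.json
        asset_type, _, asset_name = path.stem.partition(".")
        try:
            json.loads(path.read_text(encoding="utf-8"))
            valid = True
        except json.JSONDecodeError:
            valid = False
        entries.append((asset_type, asset_name, valid))
    return entries
```

The sketch only reads files. It does not check the undocumented, type-specific metadata fields, so a file that parses as JSON is not necessarily valid for the project.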

Notebooks and scripts

Notebooks and scripts are not project assets in a default Git project and have no associated metadata maintained by Watson Studio. Instead, notebooks and scripts are arbitrary code files. There is also no asset versioning inside a default Git project. Version control is done through the versioning inherent in the Git repository.

You develop and test notebooks and scripts in JupyterLab and RStudio. There is no restriction on the Git directory structure you use, nor on the Git operations you perform.

Additionally, you have full control of the contents of the .gitignore file in your clone for files that you don't want to persist in the Git repository. A default .gitignore file is included when you create the project. It ignores core files and job run information (metadata files and logs, like assets/.METADATA/job_run.* and assets/job_run files). If you want to ignore other files, add them to the default .gitignore file instead of replacing it with your own .gitignore file.
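
One way to extend the default .gitignore file without replacing it is to append only the patterns that are not already listed. The following is a minimal sketch under that assumption. The function name and the example patterns are hypothetical, and the existing content of the file is left untouched.

```python
from pathlib import Path


def add_ignore_patterns(gitignore_path, patterns):
    """Append patterns that are missing from a .gitignore file; return those added."""
    path = Path(gitignore_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    seen = {line.strip() for line in text.splitlines()}
    added = []
    for pattern in patterns:
        if pattern not in seen:
            seen.add(pattern)
            added.append(pattern)
    if added:
        # Keep the existing entries and make sure they end with a newline
        if text and not text.endswith("\n"):
            text += "\n"
        path.write_text(text + "\n".join(added) + "\n", encoding="utf-8")
    return added
```

For example, `add_ignore_patterns(".gitignore", ["*.tmp", "scratch/"])` would add both hypothetical patterns on the first call and nothing on a second call.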

Note that Python functions are currently not supported in projects with default Git integration.

Jobs for scripts or notebooks

You can create a job from the Jobs page of your project by selecting New job and browsing to the script or notebook that you want to use as the entry point for the job.

When the job starts to run, the full contents of your Git clone are available (mounted). The notebook or script that you selected as the entry point can therefore call any other scripts or notebooks in your clone, which in turn can call other files in the project. See Creating code-based jobs.
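
As an illustration, an entry-point script could run another script from the clone with the standard-library `runpy` module. This is a minimal sketch, not the product's mechanism. The function name and the example path are hypothetical, and the default of using the current working directory as the clone root is an assumption that may not hold in a job run.

```python
import runpy
from pathlib import Path


def run_clone_script(relative_path, clone_root=None):
    """Run another script from the clone and return its top-level variables."""
    # Assumption: when no root is given, the working directory is the clone root
    root = Path(clone_root) if clone_root is not None else Path.cwd()
    script = root / relative_path
    if not script.is_file():
        raise FileNotFoundError(f"No such script in clone: {script}")
    # run_path executes the file and returns its resulting module globals
    return runpy.run_path(str(script), run_name="__main__")
```

For example, `run_clone_script("helpers/prepare_data.py")` would execute a hypothetical helper script and return the variables it defined, which the entry point can then use.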
