Customizing with pip

You can add custom packages from the Python Package Index (PyPI) even when the cluster has no access to the public network.

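You can use these methods to add custom packages:

  • Using a proxy server or internal package index with pip
  • Installing pip packages from IBM Cloud Pak for Data storage
  • Installing pip packages from project storage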

Prerequisites

  • You must be a Cloud Pak for Data cluster administrator to create a pip configuration file pip.conf.
  • You need Admin or Editor permissions on the project to create an environment template and add a software configuration.
  • For changes to pip.conf to take effect, restart the running runtimes after you copy the file to the cluster.

Before you begin

Set the CPD_URL and TOKEN environment variables and make sure that the cc-home storage volume exists.

For instructions, see Setting up a storage volume to store customizations for common core services.

You can check if the cc-home storage volume is set up correctly by running this code in a Jupyter Python notebook in Watson Studio:

import os
from ibm_watson_studio_lib import access_project_or_space
wslib = access_project_or_space()
token = wslib.auth.get_current_token()
cpd_url = os.environ["RUNTIME_ENV_APSX_URL"]
%set_env CPD_URL={cpd_url}
%set_env TOKEN={token}
!curl -k ${CPD_URL}/zen-volumes/cc-home/v1/volumes/directories/%2F_global_%2Fconfig%2Fconda -H "Authorization: Bearer ${TOKEN}"

The return message must include "status":"200". If you get an error, check the setup instructions again.
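The path segment in the URL above is the volume path /_global_/config/conda with every slash percent-encoded as %2F. The following is a minimal sketch of how such a URL can be built in Python. The helper name volume_url and the host cpd.example.com are illustrative assumptions, not part of the product API.

```python
from urllib.parse import quote


def volume_url(cpd_url: str, kind: str, path: str) -> str:
    """Build a zen-volumes URL for a path on the cc-home volume.

    kind is "directories" or "files", matching the two URL forms
    used in the curl commands on this page.
    """
    # safe="" makes quote() encode "/" as %2F as well.
    encoded = quote(path, safe="")
    return f"{cpd_url.rstrip('/')}/zen-volumes/cc-home/v1/volumes/{kind}/{encoded}"


# cpd.example.com is a placeholder host.
print(volume_url("https://cpd.example.com", "directories", "/_global_/config/conda"))
```

The path portion of the printed URL matches the one in the curl command above. The same helper with kind set to "files" produces the form used to download or upload pip.conf.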

You can set the configuration that is used by pip in the /cc-home/_global_/config/conda/pip.conf global file. These settings apply to all Watson Studio runtimes and all users. The /cc-home/_global_/config/conda/pip.conf file is read-only. The following instructions show how to modify the file by using REST API commands.

After making the modifications, run python3 -m pip config debug to check if the modifications in pip.conf are applied correctly.

An update to pip.conf is picked up when a runtime starts. To apply changes in the global pip.conf file to a runtime that is already running, restart that runtime.
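To inspect the active settings from a notebook cell instead of reading the debug output by eye, you can run pip config list and turn its output into a dictionary. This is a sketch only: the helper names are hypothetical, and it assumes that pip config list prints one section.key='value' entry per line.

```python
import subprocess
import sys


def pip_config_list() -> str:
    """Return the output of `pip config list` for the current interpreter."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "config", "list"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


def parse_config_list(text: str) -> dict:
    """Parse lines of the assumed form section.key='value' into a dict."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            # Drop surrounding whitespace and the quotes around the value.
            settings[key.strip()] = value.strip().strip("'\"")
    return settings


if __name__ == "__main__":
    for key, value in parse_config_list(pip_config_list()).items():
        print(key, "->", value)
```

After a runtime restart, the entries from the global pip.conf file, for example global.index-url, should appear in the result.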

Using a proxy server or internal package index with pip

To use pip behind a proxy server or with an internal package index, create a cluster-wide pip configuration file named pip.conf that specifies the proxy server or the package index.

  1. Run the following commands in a notebook to test if the connection is working.

    For a proxy server, run this command:

    !python -m pip install langdetect --proxy https://www.example.com:<port number>
    

    For an internal index, run this command:

    !pip install <some_package> --index-url=http://www.example.com/root/pypi/+simple/ --trusted-host=www.example.com
    

    If the connection is not working, resolve any networking or firewall issues before proceeding with the next step.

  2. Retrieve the existing pip.conf file, if there is one, by running:

    curl -vk ${CPD_URL}/zen-volumes/cc-home/v1/volumes/files/%2F_global_%2Fconfig%2Fconda%2Fpip.conf -H "Authorization: ZenApiKey ${MY_TOKEN}" -o pip.conf
    
    Note:

    To run this command, you must generate the token and export it as the ${MY_TOKEN} environment variable. For details, see Generating an API authorization token.

  3. Configure a proxy server or an internal repository server.

    To configure a proxy server, set pip.conf as follows:

    [global]
    proxy=https://<user>:<password>@<proxy name>:<port>
    

    To always use an internal repository server, set pip.conf as follows:

    [global]
    index-url=https://www.example.com/root/pypi/+simple/
    trusted-host=www.example.com
    
  4. Copy the pip.conf file to the shared file system:

    curl -k -X PUT \
    "${CPD_URL}/zen-volumes/cc-home/v1/volumes/files/%2F_global_%2Fconfig%2Fconda" \
    -H "Authorization: ZenApiKey ${MY_TOKEN}" \
    -H "content-type: multipart/form-data" \
    -F upFile=@pip.conf
    
Note: For the changes to take effect, restart all the running runtimes.
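The pip.conf content from step 3 can also be generated with a script, which helps keep index-url and trusted-host consistent. The following sketch uses Python's configparser. The function name build_pip_conf is hypothetical. The script derives trusted-host as the host name (plus port, if any) of the index URL, which is the form that pip documents for this option.

```python
import configparser
import io
from urllib.parse import urlsplit


def build_pip_conf(index_url=None, proxy=None) -> str:
    """Return pip.conf text with a [global] section."""
    settings = {}
    if proxy:
        settings["proxy"] = proxy
    if index_url:
        settings["index-url"] = index_url
        parts = urlsplit(index_url)
        # trusted-host takes a host or host:port pair, without a scheme.
        host = parts.hostname or ""
        if parts.port:
            host = f"{host}:{parts.port}"
        settings["trusted-host"] = host
    # interpolation=None keeps "%" characters (for example in passwords) literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser["global"] = settings
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    text = build_pip_conf(index_url="https://www.example.com/root/pypi/+simple/")
    with open("pip.conf", "w", encoding="utf-8") as handle:
        handle.write(text)
    print(text)
```

The resulting pip.conf file can then be uploaded with the curl command from step 4.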

Installing pip packages from IBM Cloud Pak for Data storage

You can add a custom Python distribution package to IBM Cloud Pak for Data storage and then access the package directly from within a notebook. Alternatively, you can add a configuration to your environment template that contains the file path to the package, so that the package is picked up when the runtime is built.

  1. Create a Python project with a setup.py build script.

  2. Generate a distribution package.

  3. Upload the zipped archive file to /cc-home/_global_/config/conda.

    curl -k -X PUT \
    "${CPD_URL}/zen-volumes/cc-home/v1/volumes/files/%2F_global_%2Fconfig%2Fconda" \
    -H "Authorization: ZenApiKey ${MY_TOKEN}" \
    -H "content-type: multipart/form-data" \
    -F upFile=@<package_name>
    
    Note:

    To run this command, you must generate the token and export it as the ${MY_TOKEN} environment variable. For details, see Generating an API authorization token.

  4. Create an environment template in the project and add a customization for pip:

    # Add conda channels below defaults, indented by two spaces and a hyphen.
    channels:
      - nodefaults
    
    # Add conda packages here, indented by two spaces and a hyphen.
    dependencies:
    
    # Add pip packages here, indented by four spaces and a hyphen.
    # Replace the sample package name with your package name.
      - pip:
        - file:///cc-home/_global_/config/conda/Archive.zip
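
Steps 1 and 2 above can be sketched as follows. This is a minimal illustration under stated assumptions: the package name mypackage and the helper make_archive are hypothetical, and the script simply zips a source tree that has setup.py at its top level, which pip can generally install from. Your own build process might produce a source distribution or a wheel instead.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path

# Minimal setup.py; "mypackage" is a placeholder name.
SETUP_PY = '''\
from setuptools import setup, find_packages

setup(
    name="mypackage",
    version="0.1.0",
    packages=find_packages(),
)
'''


def make_archive(workdir: Path) -> Path:
    """Create a tiny project under workdir and zip it as Archive.zip."""
    project = workdir / "project"
    package = project / "mypackage"
    package.mkdir(parents=True)
    (project / "setup.py").write_text(SETUP_PY)
    (package / "__init__.py").write_text('__version__ = "0.1.0"\n')
    # shutil.make_archive appends ".zip" to the base name.
    archive = shutil.make_archive(str(workdir / "Archive"), "zip", root_dir=project)
    return Path(archive)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        archive_path = make_archive(Path(tmp))
        with zipfile.ZipFile(archive_path) as zf:
            print(zf.namelist())
```

The archive name Archive.zip matches the sample file name in the environment customization above.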
    

Installing pip packages from project storage

You can add a custom Python distribution package to your project in Watson Studio and then add a configuration to your environment template with the file path to the data assets folder in the project.

  1. Create a Python project with a setup.py build script.

  2. Generate a distribution package.

  3. Upload the zipped distribution file to your project as a data asset.

  4. Create an environment template in your project and add a customization:

    # Add conda channels below defaults, indented by two spaces and a hyphen.
    channels:
      - nodefaults
    
    # Add conda packages here, indented by two spaces and a hyphen.
    dependencies:
    
    # Add pip packages here, indented by four spaces and a hyphen.
    # Replace the sample package name with your package name.
      - pip:
        - file:///project_data/data_asset/Archive.zip
    

If your environment is air-gapped and you do not plan to use any conda packages from external sources, create a .condarc file in the same folder with the following content:

offline: True

This ensures that conda filters out all channels that do not use the file:// protocol.

You can also include pip configuration options in a conda environment as pip dependencies. An example is:

dependencies:
  - pip:
    - --index-url https://www.example.com/artifactory/token
    - --trusted-host www.example.com
    - langdetect

You can:

  • Have a global pip.conf file
  • Specify pip options in an environment customization
  • Use both methods
