Setting up storage and uploading a model that is located in your environment

Follow these steps to upload a foundation model that is located in your own environment to PVC storage.

Prerequisites:

Some of the environment variables that are used in code snippets in this document are global variables that are set at the Cloud Pak for Data installation stage. For more information, see Setting up installation environment variables.

To set up storage and upload a model that is located in your environment:

  1. Set up basic environment variables:

    export MODEL_PATH="<Path to the directory where your model is located>"
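
    For example, to match the falcon-7b model that is shown in the sample output in the next step:

    export MODEL_PATH=/root/falcon-7b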
    
  2. Check the model size. If you have Git Large File Storage (git lfs) installed in your environment, navigate to the folder that contains your model and list the sizes of the tracked model files:

    cd ${MODEL_PATH}
    git lfs ls-files -s
    

    Example output:

    root@wmlubntu1:~/falcon-7b# git lfs ls-files -s
    1c4b989693 - pytorch_model-00001-of-00002.bin (10 GB)
    11822397cd - pytorch_model-00002-of-00002.bin (4.5 GB)
    

    If you don't have git lfs installed in your environment, obtain this information from the model builder or from the repository where the model was originally downloaded. For models that are hosted on Hugging Face, all file sizes are visible when you click the model name and then select the Files and versions tab.
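
    If the model files are already on local disk, you can also check the total size directly with du (the git lfs listing shows only the large weight files, which dominate the total):

    du -sh ${MODEL_PATH}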

  3. Calculate the total size of the model and add a 100% buffer to the result. For example, if the model size is 14.5 GB, the size of the PVC that you must create is 29 GB. Set the buffered size, as a whole number of gigabytes, as an environment variable. Specify the number only, because the Gi unit is appended in the PVC definition:

    export MODEL_SIZE="<calculated model size in GB>"
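
    If you prefer to compute the value, here is a minimal sketch that assumes GNU du and a local copy of the model files; du -BG rounds the size up to whole gigabytes and awk doubles the result:

    export MODEL_SIZE=$(du -sBG ${MODEL_PATH} | awk '{print $1 * 2}')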
    
  4. Check the storage class for your PVC. The PVC definition in the next step uses the STG_CLASS_FILE installation variable, which must reference a storage class that supports the ReadWriteMany access mode.
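
    For example, you can list the storage classes that are available on the cluster and confirm the value of the installation variable:

    oc get sc
    echo ${STG_CLASS_FILE}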

  5. Create the PersistentVolumeClaim (PVC) for your custom foundation model.

    First, set the name of the new PVC as an environment variable:

    export PVC_NAME="<The name of the new PVC>"
    

    Next, run this code:

    cat <<EOF | oc apply -f -
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: ${PVC_NAME}
      namespace: ${PROJECT_CPD_INST_OPERANDS}
    spec:
      storageClassName: ${STG_CLASS_FILE}
      accessModes:
      - ReadWriteMany
      resources:
        requests:
          storage: ${MODEL_SIZE}Gi
    EOF
    

    To learn more about creating a new PVC or using an existing PVC with storage volumes, see Managing storage volumes.

  6. After creating the PVC, wait for two minutes and then run this command to verify that the PVC is bound:

    oc get pvc ${PVC_NAME} -n ${PROJECT_CPD_INST_OPERANDS} -o jsonpath='{.status.phase}'
    

    Expected result: Bound
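
    If the result is Pending instead, the PVC events usually reveal the cause, such as a mistyped storage class:

    oc describe pvc ${PVC_NAME} -n ${PROJECT_CPD_INST_OPERANDS}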

  7. Create a custom standalone job to copy the model content from the source location to the PVC path.

    Here is an example of copying a custom model to the PVC. The example assumes that the model is in an IBM Cloud Object Storage (COS) bucket.

    First, create the COS secret:

    oc create secret generic aws-credentials \
    -n ${PROJECT_CPD_INST_OPERANDS} \
    --from-literal=AWS_ACCESS_KEY_ID=<your access key id> \
    --from-literal=AWS_SECRET_ACCESS_KEY=<your secret access key>
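
    You can confirm that the secret was created before you continue:

    oc get secret aws-credentials -n ${PROJECT_CPD_INST_OPERANDS}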
    

    Then, set up the necessary environment variables:

    export ENDPOINT="<Endpoint for the s3 bucket>"
    export BUCKET_NAME="<Name of the bucket where the model is located>"
    export MODEL_PATH_IN_BUCKET="<Model path in bucket>"
    export COPY_JOB_NAME="<Name of the job that copies the model from COS>"
    

    Next, create and run a job that copies the model from the IBM COS bucket to your PVC:

    cat <<EOF | oc apply -f -
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: ${COPY_JOB_NAME}
      namespace: ${PROJECT_CPD_INST_OPERANDS}
    spec:
      template:
        spec:
          containers:
          - name: aws-cli
            image: amazon/aws-cli:latest
            command: ["sh","-c"]
            args:
              - aws --endpoint-url ${ENDPOINT} s3 cp s3://${BUCKET_NAME}/${MODEL_PATH_IN_BUCKET} /model --recursive
            env:
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  name: aws-credentials
                  key: AWS_ACCESS_KEY_ID
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: aws-credentials
                  key: AWS_SECRET_ACCESS_KEY
            - name: BUCKET_NAME
              value: ${BUCKET_NAME}
            - name: ENDPOINT
              value: ${ENDPOINT}
            - name: MODEL_PATH
              value: ${MODEL_PATH_IN_BUCKET}
            volumeMounts:
            - name: pvc-mount
              mountPath: /model
          restartPolicy: Never
          volumes:
          - name: pvc-mount
            persistentVolumeClaim:
              claimName: ${PVC_NAME}
    EOF
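
    While the job runs, you can follow the copy progress in the job logs:

    oc logs -f job/${COPY_JOB_NAME} -n ${PROJECT_CPD_INST_OPERANDS}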
    
  8. Verify that the job was created:

    oc get job ${COPY_JOB_NAME} -n ${PROJECT_CPD_INST_OPERANDS}
    

    Expected output:

    NAME                   COMPLETIONS   DURATION   AGE
    <job name>   1/1           xx         xx
    
  9. Check the job status:

    oc get job ${COPY_JOB_NAME} -n ${PROJECT_CPD_INST_OPERANDS} -o jsonpath='{.status.conditions[?(@.type=="Complete")].status}'
    

    Expected output: True
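
    If the command returns an empty result, the job has not completed yet. You can check for a failure condition in the same way:

    oc get job ${COPY_JOB_NAME} -n ${PROJECT_CPD_INST_OPERANDS} -o jsonpath='{.status.conditions[?(@.type=="Failed")].status}'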

  10. Optional: Create and run a job that converts your model to the safetensors and fast-tokenizer formats. You can skip this step if you are sure that your model already meets these criteria. The fastest way to check is to open your model repository and verify that it contains tokenizer.json and .safetensors files.
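
    For example, if you still have a local copy of the model, a quick check is to list the two indicator files; both must be present for the conversion to be skippable:

    ls ${MODEL_PATH}/tokenizer.json ${MODEL_PATH}/*.safetensors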

    First, set the name and tag of the Text Generation Inference Server (TGIS) image as environment variables:

    export TGIS_IMAGE_NAME="quay.io/modh/text-generation-inference"
    export TGIS_IMAGE_TAG="rhoai-2.8-58cac74"
    

    Then, set up an environment variable for your job name:

    export CONV_JOB_NAME="<Name of the job>"
    

    Next, create and run the converter job. Note that the dollar signs in \${MODEL_PATH} are escaped so that the variable is expanded inside the container, where it is set to /model, rather than by your local shell:

    cat <<EOF | oc apply -f -
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: ${CONV_JOB_NAME}
      namespace: ${PROJECT_CPD_INST_OPERANDS}
    spec:
      template:
        spec:
          containers:
          - name: models-safetensor
            image: ${TGIS_IMAGE_NAME}:${TGIS_IMAGE_TAG}
            env:
            - name: MODEL_PATH
              value: /model
            command: ["/bin/sh", "-c"]
            args:
            - |
              text-generation-server convert-to-safetensors \${MODEL_PATH}
              text-generation-server convert-to-fast-tokenizer \${MODEL_PATH}
            volumeMounts:
            - mountPath: /model
              name: byom-model
          restartPolicy: Never
          volumes:
          - name: byom-model
            persistentVolumeClaim:
              claimName: ${PVC_NAME}
    EOF
    

    Verify that the job was created:

    oc get job ${CONV_JOB_NAME} -n ${PROJECT_CPD_INST_OPERANDS}
    

    Expected output:

    NAME                   COMPLETIONS   DURATION   AGE
    <job name>   1/1           xx         xx
    

    Finally, check the job status:

    oc get job ${CONV_JOB_NAME} -n ${PROJECT_CPD_INST_OPERANDS} -o jsonpath='{.status.conditions[?(@.type=="Complete")].status}'
    

    Expected output: True
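
    As with the copy job, you can inspect the converter logs if the job does not complete:

    oc logs job/${CONV_JOB_NAME} -n ${PROJECT_CPD_INST_OPERANDS}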

Parent topic: Setting up storage and uploading the model