IBM Support

How to deploy gpt_oss_120b on watsonx.ai as a Custom Foundation Model (CFM)?

How To


Summary

You are attempting to deploy the gpt_oss_120b foundation model as a Custom Foundation Model (CFM) in watsonx.ai. When following the documented steps, you find that the API endpoint is not available to you.

Objective

The purpose of this technote is to document the proper method of deploying this model in watsonx.ai so that the API endpoint is available to you.

Environment

Product: watsonx.ai

Product Version: 2.2

Software Hub / CP4D version : 5.2.2

Environment: Air-gap

Steps

IBM Documentation References:

 

1. Download the vLLM image and push it to the private registry

podman pull docker.io/vllm/vllm-openai:latest
podman tag docker.io/vllm/vllm-openai:latest <registry url>/vllm/vllm-openai:latest
podman push <registry url>/vllm/vllm-openai:latest

 

2. Create the PVC and copy the model files to it

Important: Model files must be directly under /mnt/models in the deploy environment.

Copy the model files into the deploy pod one by one (run from the local model directory; the destination path must match the PVC mount point inside your deploy pod):

for f in *; do
  oc cp "$f" model-deploy-pod:/model/"$f"
  echo "copied $f"
done
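Before copying, it can help to confirm the shard set is complete. Per the layout below, the checkpoint ships as 15 shards, model-00000-of-00014 through model-00014-of-00014 (zero-indexed). A small sketch that generates the expected names so you can diff them against `ls model-*.safetensors` in the local model directory:

```shell
# Generate the expected shard file names (model-00000-of-00014 ... model-00014-of-00014)
# into expected-shards.txt and print the count, which should be 15.
for i in $(seq -w 0 14); do
  printf 'model-000%s-of-00014.safetensors\n' "$i"
done | tee expected-shards.txt | wc -l
```

Any shard missing from the diff should be re-copied before moving on; vLLM cannot load an incomplete checkpoint.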

Expected layout under /mnt/models:

chat_template.jinja               model-00004-of-00014.safetensors  model-00014-of-00014.safetensors
config.json                       model-00005-of-00014.safetensors  model.safetensors.index.json
encodings                         model-00006-of-00014.safetensors  original
generation_config.json            model-00007-of-00014.safetensors  README.md
LICENSE                           model-00008-of-00014.safetensors  special_tokens_map.json
metal                             model-00009-of-00014.safetensors  tokenizer_config.json
model-00000-of-00014.safetensors  model-00010-of-00014.safetensors  tokenizer.json
model-00001-of-00014.safetensors  model-00011-of-00014.safetensors  USAGE_POLICY
model-00002-of-00014.safetensors  model-00012-of-00014.safetensors
model-00003-of-00014.safetensors  model-00013-of-00014.safetensors

Then open the deployment UI, select the appropriate software specification, and complete the deployment there, or use the curl commands in the later steps.

 

3. Add sub_path to the watsonxaiifm CR

If the PVC is populated with the model files under a subdirectory, for example:

/mnt/models/gpt-oss-120b/

then add sub_path to the custom_foundation_models entry in the watsonxaiifm CR:

custom_foundation_models:
  - location:
      pvc_name: gpt-oss-120b-deploy-pvc
      sub_path: gpt-oss-120b
    model_id: gpt-oss-120b-deploy
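One way to apply this change on the cluster is a merge patch against the watsonxaiifm CR. A sketch, assuming the entry lives under spec and the CR instance is named watsonxaiifm-cr (verify both with `oc get watsonxaiifm -o yaml` on your install):

```shell
# Write the overlay to a patch file; pvc_name, sub_path, and model_id
# follow the CR snippet above.
cat > cfm-patch.yaml <<'EOF'
spec:
  custom_foundation_models:
    - location:
        pvc_name: gpt-oss-120b-deploy-pvc
        sub_path: gpt-oss-120b
      model_id: gpt-oss-120b-deploy
EOF
# Apply it (CR name and namespace are assumptions; check your install first):
# oc patch watsonxaiifm watsonxaiifm-cr -n <namespace> --type merge --patch-file cfm-patch.yaml
grep -c 'sub_path: gpt-oss-120b' cfm-patch.yaml
```

A merge patch leaves the rest of the CR untouched, which is safer in an air-gapped environment than editing the full CR by hand.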

 

4. Custom runtime — build Docker image

Dockerfile example:

FROM <registry url>/vllm/vllm-openai:latest

ENV PORT="3000" \
    MODEL_NAME="/mnt/models" \
    TRANSFORMERS_OFFLINE="1" \
    CUDA_VISIBLE_DEVICES="0" \
    VLLM_RPC_TIMEOUT="200000" \
    DTYPE="bfloat16" \
    MAX_NEW_TOKENS="2047" \
    MAX_SEQUENCE_LENGTH="40960" \
    HOME="/home/vllm" \
    HF_HOME="/home/vllm/.cache/huggingface" \
    HF_HUB_CACHE="/home/vllm/.cache/huggingface/hub" \
    HF_HUB_OFFLINE="1" \
    TIKTOKEN_RS_CACHE_DIR="/mnt/models/" \
    SERVED_MODEL_NAME="gpt-oss-120b"

RUN mkdir -p /home/vllm
USER root:root
WORKDIR /home/vllm
ENTRYPOINT ["bash", "-c", "\
vllm serve /mnt/models/gpt-oss-120b \
  --port 3000 \
  --dtype bfloat16 \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.95 \
  --trust-remote-code \
  --tensor-parallel-size 1 \
  --served-model-name $SERVED_MODEL_NAME \
"]

Build and push the image (replace the registry URL and tag as needed, e.g. gpt-oss-120b-v10):

podman build -t <registry url>/cfm-vllm-latest:gpt-oss-120b-v10 --format docker -f Dockerfile-gpt .
podman push <registry url>/cfm-vllm-latest:gpt-oss-120b-v10

 

5. Software specification (for /v2/software_specifications)

Get token:

TOKEN=$(oc extract secret/wdp-service-id -n ai --keys=service-id-credentials --to=-)

Set parameters (one version; example for gpt-oss-120b-v10):

SW_SPEC_NAME="cfm-vllm-latest-gpt-oss-120b-v10"
DISPLAY_NAME="cfm-vllm-latest for gpt-oss-120b-v10"
DESCRIPTION="Software specification for the gpt-oss-120b-v10 custom vLLM runtime"
SW_SPEC_JSON=$(pwd)/${SW_SPEC_NAME}.json

Create SW spec JSON:

echo '{
  "name": "'"$SW_SPEC_NAME"'",
  "description": "'"$DESCRIPTION"'",
  "type": "base",
  "built_in": false,
  "package_extensions": [],
  "display_name": "'"$DISPLAY_NAME"'",
  "software_configuration": {
    "included_packages": [],
    "platform": { "name": "python", "version": "3.10" }
  }
}' | jq . > $SW_SPEC_JSON
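If the echo-based quoting above proves fragile, the same JSON can be built entirely with jq, which handles all escaping itself. A sketch using the variable names from this step (the DESCRIPTION value here is illustrative):

```shell
SW_SPEC_NAME="cfm-vllm-latest-gpt-oss-120b-v10"
DISPLAY_NAME="cfm-vllm-latest for gpt-oss-120b-v10"
DESCRIPTION="Software specification for the gpt-oss-120b-v10 custom vLLM runtime"
# jq -n builds the document from scratch; --arg injects shell values safely.
jq -n --arg name "$SW_SPEC_NAME" --arg desc "$DESCRIPTION" --arg disp "$DISPLAY_NAME" '{
  name: $name,
  description: $desc,
  type: "base",
  built_in: false,
  package_extensions: [],
  display_name: $disp,
  software_configuration: {
    included_packages: [],
    platform: { name: "python", version: "3.10" }
  }
}' > "${SW_SPEC_NAME}.json"
jq -r '.name' "${SW_SPEC_NAME}.json"   # sanity check: prints the spec name
```

This avoids the failure mode where a quote or backslash in DESCRIPTION silently produces invalid JSON.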

CPD base URL (choose one):

export CPD_URL=https://<watsonx url>
# or: export CPD_URL=https://cpd-ai.apps.ocp4.sdis.com

Create software specification:

curl -ksS -X POST $CPD_URL/v2/software_specifications \
  -H "Authorization: Basic ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d @"$SW_SPEC_JSON"

List by name:

curl -ksS -X GET "$CPD_URL/v2/software_specifications?name=$SW_SPEC_NAME" \
  -H "Authorization: Basic ${TOKEN}" | jq .

 

6. Runtime definition — map image and software spec

List runtime definitions:

curl -k -X GET -H "Authorization: Basic ${TOKEN}" "${CPD_URL}/v2/runtime_definitions" \
  | jq -r '.resources[] | "\(.entity.name) -- \(.metadata.guid)"'

Use template watsonx-cfm-custom-image-rd.

Download template:

myRuntimeDefinition="watsonx-cfm-custom-image-rd"
curl -ksS -X GET -H "Authorization: Basic ${TOKEN}" \
  "${CPD_URL}/v2/runtime_definitions?include=launch_configuration" \
  | jq '.resources[] | select(.entity.name=="'${myRuntimeDefinition}'") | .entity' \
  > ${myRuntimeDefinition}-server.json

Set custom image:

CUSTOM_IMAGE=<registry url>/cfm-vllm-latest:gpt-oss-120b-v10

Generate new runtime definition JSON:

jq --arg rdname "$SW_SPEC_NAME" \
   --arg description "gpt-oss-120b-v10" \
   --arg runtime_type "wml" \
   --arg sw_spec_name "$SW_SPEC_NAME" \
   --arg image "$CUSTOM_IMAGE" \
   '.name = $rdname |
    .display_name = $rdname |
    .description = $description |
    .runtime_type = $runtime_type |
    .launch_configuration.software_specification_name = $sw_spec_name |
    .launch_configuration.image = $image' \
   ${myRuntimeDefinition}-server.json > $SW_SPEC_NAME-server.json
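To see what the jq transform does in isolation, it can be exercised against a minimal stand-in template (the real template comes from the GET above; the registry URL here is a placeholder):

```shell
# Minimal stand-in for the downloaded runtime-definition template.
cat > rd-template.json <<'EOF'
{
  "name": "watsonx-cfm-custom-image-rd",
  "launch_configuration": { "image": "placeholder" }
}
EOF
SW_SPEC_NAME="cfm-vllm-latest-gpt-oss-120b-v10"
CUSTOM_IMAGE="registry.example.com/cfm-vllm-latest:gpt-oss-120b-v10"   # placeholder registry
# Same transform as in the step above, applied to the stand-in.
jq --arg rdname "$SW_SPEC_NAME" \
   --arg description "gpt-oss-120b-v10" \
   --arg runtime_type "wml" \
   --arg sw_spec_name "$SW_SPEC_NAME" \
   --arg image "$CUSTOM_IMAGE" \
   '.name = $rdname |
    .display_name = $rdname |
    .description = $description |
    .runtime_type = $runtime_type |
    .launch_configuration.software_specification_name = $sw_spec_name |
    .launch_configuration.image = $image' \
   rd-template.json > "${SW_SPEC_NAME}-server.json"
jq -r '.launch_configuration.image' "${SW_SPEC_NAME}-server.json"
```

The key point is that the software specification name and the custom image end up inside launch_configuration; if either is missing from the generated file, the runtime will not pick up your image.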

Create runtime definition:

curl -ksS -X POST "$CPD_URL/v2/runtime_definitions" \
  -H 'Content-Type: application/json' \
  -H "Authorization: Basic ${TOKEN}" \
  -d @"$SW_SPEC_NAME-server.json" | jq .

Verify: Use GET on $CPD_URL/v2/runtime_definitions and filter by $SW_SPEC_NAME.

Update existing runtime (use the definition's guid):

curl -ksS -X PUT "$CPD_URL/v2/runtime_definitions/<GUID>" \
  -H 'Content-Type: application/json' \
  -H "Authorization: Basic ${TOKEN}" \
  -d @"$SW_SPEC_NAME-server.json" | jq .

 

7. Import the model as a custom foundation model asset in the UI

 

8. Deploy gpt-oss-120b by curl

Important: update space_id and the asset id, and add the chat_template parameter, as follows.

curl -X POST "https://<watsonx url>/ml/v4/deployments?version=2024-01-29" \
  -H "Authorization: Bearer ${MY_TOKEN}" \
  -H "content-type: application/json" \
  --data '{
  "asset": { "id": "950f0740-38bf-449e-a3e1-180afb9635ea" },
  "online": {
    "parameters": {
      "serving_name": "gpt-oss-120b",
      "foundation_model": { "chat_template": "chat_template.jinja" }
    }
  },
  "hardware_spec": { "name": "WX-S", "num_nodes": 1 },
  "description": "gpt-oss-120b deployment using custom foundation model",
  "name": "gpt-oss-120b",
  "space_id": "1c43fc5c-015b-4e64-9555-d6be6d0c589a"
}'
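The create call's response carries the deployment id needed for the status check in the next step; in the v4 deployments API it sits under metadata.id. A sketch of extracting it from a saved response (the canned response below is illustrative, reusing the deployment id from step 9):

```shell
# Illustrative saved response; a real one comes from redirecting the POST above to a file.
cat > deploy-response.json <<'EOF'
{ "metadata": { "id": "7e5d412e-4828-49c7-99c1-45c100e1bf51" } }
EOF
DEPLOYMENT_ID=$(jq -r '.metadata.id' deploy-response.json)
echo "$DEPLOYMENT_ID"
```

Capturing the id this way avoids copying it by hand from the raw JSON when scripting the status check.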

 

9. Check the deployment status by curl

curl -X GET "${CPD_URL}/ml/v4/deployments/7e5d412e-4828-49c7-99c1-45c100e1bf51?version=2024-01-29&space_id=1c43fc5c-015b-4e64-9555-d6be6d0c589a" \
  -H "Authorization: Bearer ${MY_TOKEN}" \
  -H "content-type: application/json"

Additional Information

Please note the following points:

  • sub_path is required only if the model content is not at the root of the PVC. For example, if the model content is at <PVC>/<model files>, sub_path is not required; however, if it is under a subdirectory path such as <PVC>/gptoss/<model_files>, then add sub_path: gptoss.
  • Also make sure the model content includes a chat_template file; if this file is not present during deployment, the chat flag will not be enabled in the deployment metadata.

 

Document Location

Worldwide


Document Information

Modified date:
10 March 2026

UID

ibm17262878