How To
Summary
You are attempting to deploy the gpt-oss-120b foundation model as a custom foundation model (CFM) in watsonx.ai. When you follow the documented steps, you find that the API endpoint is not available to you.
Objective
The purpose of this Technote is to document the proper method of deploying this model in watsonx.ai and to ensure that the API endpoint is available to you.
Environment
Product: watsonx.ai
Product Version: 2.2
Software Hub / CP4D version : 5.2.2
Environment: Air-gap
Steps
IBM Documentation References:
- Uploading changed configuration (IBM Software Hub 5.2.x)
- Downloading runtime definition (IBM Software Hub 5.2.x)
1. vLLM image download and push to private registry
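Because the environment is air-gapped, the image is typically pulled on an internet-connected machine, saved to an archive, transferred, and then loaded, tagged, and pushed to the private registry. A sketch of that flow (the upstream image name, archive path, and registry URL are placeholders to adjust):

```shell
# Sketch: move the vLLM image into the air-gapped private registry.
# Run pull/save on a connected machine; run load/tag/push on a machine
# that can reach the private registry.
mirror_vllm_image() {
  local src="docker.io/vllm/vllm-openai:latest"
  local dest="$1"                      # e.g. <registry url>/vllm/vllm-openai:latest
  local archive="${2:-vllm-openai.tar}"

  podman pull "$src"                   # connected machine
  podman save -o "$archive" "$src"     # carry this archive into the air gap
  podman load -i "$archive"            # air-gapped machine
  podman tag "$src" "$dest"
  podman push "$dest"
}
```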
podman push <registry url>/vllm/vllm-openai:latest
2. Create PVC and copy model to PVC
Important: Model files must be directly under /mnt/models in the deploy environment.
Copy model files into the deploy pod (one-by-one):
for f in *; do
  oc cp "$f" "model-deploy-pod:/model/$f"
  echo "copied $f"
done
Expected layout under /mnt/models:
chat_template.jinja model-00004-of-00014.safetensors model-00014-of-00014.safetensors
config.json model-00005-of-00014.safetensors model.safetensors.index.json
encodings model-00006-of-00014.safetensors original
generation_config.json model-00007-of-00014.safetensors README.md
LICENSE model-00008-of-00014.safetensors special_tokens_map.json
metal model-00009-of-00014.safetensors tokenizer_config.json
model-00000-of-00014.safetensors model-00010-of-00014.safetensors tokenizer.json
model-00001-of-00014.safetensors model-00011-of-00014.safetensors USAGE_POLICY
model-00002-of-00014.safetensors model-00012-of-00014.safetensors
model-00003-of-00014.safetensors model-00013-of-00014.safetensors
Then open the UI (e.g. "Show" / deployment UI), select the deployed software spec, and complete the deployment there or via the curl commands below.
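Per the Important note above, the key model files must sit directly under /mnt/models (or the configured sub_path), not one level deeper. A small sanity check, using file names from the listing above, that can be run against the PVC mount or a local copy of the model:

```shell
# Verify that the key model files sit directly under the given directory
# (not nested one level deeper), per the expected layout above.
check_model_layout() {
  local dir="$1"
  local f missing=0
  for f in config.json tokenizer.json tokenizer_config.json \
           model.safetensors.index.json chat_template.jinja; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] && echo "layout OK: $dir"
  return "$missing"
}
```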
3. Add sub_path in the watsonxaiifm CR as follows
If the PVC is configured with the following path,
/mnt/models/gpt-oss-120b/
it must be handled by adding sub_path in the watsonxaiifm CR (e.g. in CR/config):
custom_foundation_models:
- location:
pvc_name: gpt-oss-120b-deploy-pvc
sub_path: gpt-oss-120b
model_id: gpt-oss-120b-deploy
4. Custom runtime — build Docker image
Dockerfile example:
FROM <registry url>/vllm/vllm-openai:latest
ENV PORT="3000" \
MODEL_NAME="/mnt/models" \
TRANSFORMERS_OFFLINE="1" \
CUDA_VISIBLE_DEVICES="0" \
VLLM_RPC_TIMEOUT="200000" \
DTYPE="bfloat16" \
MAX_NEW_TOKENS="2047" \
MAX_SEQUENCE_LENGTH="40960" \
HOME="/home/vllm" \
HF_HOME="/home/vllm/.cache/huggingface" \
HF_HUB_CACHE="/home/vllm/.cache/huggingface/hub" \
HF_HUB_OFFLINE="1" \
TIKTOKEN_RS_CACHE_DIR="/mnt/models/" \
SERVED_MODEL_NAME="gpt-oss-120b"
RUN mkdir -p /home/vllm
USER root:root
WORKDIR /home/vllm
ENTRYPOINT ["bash", "-c", "\
vllm serve /mnt/models/gpt-oss-120b \
--port 3000 \
--dtype bfloat16 \
--max-model-len 32768 \
--gpu-memory-utilization 0.95 \
--trust-remote-code \
--tensor-parallel-size 1 \
--served-model-name $SERVED_MODEL_NAME \
"]Build and push (replace tag/version as needed, e.g. gpt-oss-120b-v10 or qwen3-coder):
podman build -t <registry url>/cfm-vllm-latest:gpt-oss-120b-v10 --format docker -f Dockerfile-gpt .
podman push <registry url>/cfm-vllm-latest:gpt-oss-120b-v10
5. Software specification (for /v2/software_specifications)
Get token:
TOKEN=$(oc extract secret/wdp-service-id -n ai --keys=service-id-credentials --to=-)
Set parameters (one version; example for gpt-oss-120b-v10):
SW_SPEC_NAME="cfm-vllm-latest-gpt-oss-120b-v10"
DISPLAY_NAME="cfm-vllm-latest for gpt-oss-120b-v10"
DESCRIPTION="Custom software specification for gpt-oss-120b-v10"
SW_SPEC_JSON=$(pwd)/${SW_SPEC_NAME}.json
Create SW spec JSON:
echo '{
"name": "'"$SW_SPEC_NAME"'",
"description": "'"$DESCRIPTION"'",
"type": "base",
"built_in": false,
"package_extensions": [],
"display_name": "'"$DISPLAY_NAME"'",
"software_configuration": {
"included_packages": [],
"platform": { "name": "python", "version": "3.10" }
}
}' | jq . > $SW_SPEC_JSON
CPD base URL (choose one):
export CPD_URL=https://<watsonx url>
# or: export CPD_URL=https://cpd-ai.apps.ocp4.sdis.com
Create software specification:
curl -ksS -X POST $CPD_URL/v2/software_specifications \
-H "Authorization: Basic ${TOKEN}" \
-H "Content-Type: application/json" \
-d @"$SW_SPEC_JSON"List by name:
curl -ksS -X GET "$CPD_URL/v2/software_specifications?name=$SW_SPEC_NAME" \
-H "Authorization: Basic ${TOKEN}" | jq .6. Runtime definition — map image and software spec
List runtime definitions:
curl -k -X GET -H "Authorization: Basic ${TOKEN}" "${CPD_URL}/v2/runtime_definitions" \
| jq -r '.resources[] | "\(.entity.name) -- \(.metadata.guid)"'
Use template watsonx-cfm-custom-image-rd.
Download template:
myRuntimeDefinition="watsonx-cfm-custom-image-rd"
curl -ksS -X GET -H "Authorization: Basic ${TOKEN}" \
"${CPD_URL}/v2/runtime_definitions?include=launch_configuration" \
| jq '.resources[] | select(.entity.name=="'${myRuntimeDefinition}'") | .entity' \
> ${myRuntimeDefinition}-server.json
Set custom image:
CUSTOM_IMAGE=<registry url>/cfm-vllm-latest:gpt-oss-120b-v10
Generate new runtime definition JSON:
jq --arg rdname "$SW_SPEC_NAME" \
--arg description "gpt-oss-120b-v10" \
--arg runtime_type "wml" \
--arg sw_spec_name "$SW_SPEC_NAME" \
--arg image "$CUSTOM_IMAGE" \
'.name = $rdname |
.display_name = $rdname |
.description = $description |
.runtime_type = $runtime_type |
.launch_configuration.software_specification_name = $sw_spec_name |
.launch_configuration.image = $image' \
${myRuntimeDefinition}-server.json > $SW_SPEC_NAME-server.json
Create runtime definition:
curl -ksS -X POST "$CPD_URL/v2/runtime_definitions" \
-H 'Content-Type: application/json' \
-H "Authorization: Basic ${TOKEN}" \
-d @"$SW_SPEC_NAME-server.json" | jq .Verify: Use GET on $CPD_URL/v2/runtime_definitions and filter by $SW_SPEC_NAME.
Update existing runtime (use the definition's guid):
curl -ksS -X PUT "$CPD_URL/v2/runtime_definitions/<GUID>" \
-H 'Content-Type: application/json' \
-H "Authorization: Basic ${TOKEN}" \
-d @"$SW_SPEC_NAME-server.json" | jq .7. Import assets for Custom foundation model in UI
8. Deploy gpt-oss-120b by curl
Important: update the space_id and asset id, and add parameters for the chat_template as follows.
curl -X POST "https://<watsonx url>/ml/v4/deployments?version=2024-01-29" \
-H "Authorization: Bearer ${MY_TOKEN}" \
-H "content-type: application/json" \
--data '{
"asset": { "id": "950f0740-38bf-449e-a3e1-180afb9635ea" },
"online": {
"parameters": {
"serving_name": "gpt-oss-120b",
"foundation_model": { "chat_template": "chat_template.jinja" }
}
},
"hardware_spec": { "name": "WX-S", "num_nodes": 1 },
"description": "gpt-oss-120b deployment using custom foundation model",
"name": "gpt-oss-120b",
"space_id": "1c43fc5c-015b-4e64-9555-d6be6d0c589a"
}'
9. Check the deployment by curl
curl -X GET "${CPD_URL}/ml/v4/deployments/7e5d412e-4828-49c7-99c1-45c100e1bf51?version=2024-01-29&space_id=1c43fc5c-015b-4e64-9555-d6be6d0c589a" \
  -H "Authorization: Bearer ${MY_TOKEN}" \
  -H "content-type: application/json"
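The GET response can be polled until the deployment reaches a terminal state. A sketch, assuming the state is reported at .entity.status.state in the response payload (verify this path against your actual response):

```shell
# Extract the deployment state from a GET /ml/v4/deployments/<id> response.
# The field path .entity.status.state is an assumption; adjust if needed.
deployment_state() {
  jq -r '.entity.status.state // "unknown"'
}
# Usage (poll until "ready" or "failed"):
#   while s=$(curl -ksS -H "Authorization: Bearer ${MY_TOKEN}" "$URL" | deployment_state); do
#     echo "state: $s"
#     { [ "$s" = ready ] || [ "$s" = failed ]; } && break
#     sleep 30
#   done
```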
Additional Information
Please note the following points:
- sub_path is required only if the model content is not under the PVC root. For example, if the model content is at <PVC>/<model files>, no sub_path is required; if it is under a subdirectory such as <PVC>/gptoss/<model_files>, add sub_path: gptoss.
- Also make sure the model content includes a chat_template; if this file is not present during deployment, the chat flag will not be enabled in the deployment metadata.
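The first point can be checked mechanically: if config.json sits at the PVC root, no sub_path is needed; if it sits one directory down, that directory name is the sub_path. A sketch that looks only one level deep:

```shell
# Print the sub_path to set in the watsonxaiifm CR, or nothing if the
# model files are already at the PVC root. Checks one directory level only.
detect_sub_path() {
  local root="$1" d
  if [ -f "$root/config.json" ]; then
    return 0                      # files at root: no sub_path needed
  fi
  for d in "$root"/*/; do
    if [ -f "${d}config.json" ]; then
      basename "$d"               # e.g. prints "gptoss" -> sub_path: gptoss
      return 0
    fi
  done
  return 1                        # no model content found one level down
}
```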
Document Location
Worldwide
Document Information
Modified date:
10 March 2026
UID
ibm17262878