Registering custom foundation models for global deployment
Prerequisites
- If you want to enable the text chat functionality, the custom foundation model that you want to deploy must include the chat template as part of its model configuration file, tokenizer_config.json. For example, the tokenizer_config.json file for the Llama-3.1-8B-Instruct model includes the chat template in its chat_template field.
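To confirm this prerequisite before you upload the model, you can inspect tokenizer_config.json in the downloaded model directory. The following Python sketch assumes the Hugging Face convention of storing the template under a chat_template key, either as a single Jinja string or as a list of named templates. The function name has_chat_template is hypothetical.

```python
import json
from pathlib import Path


def has_chat_template(model_dir: str) -> bool:
    """Return True if <model_dir>/tokenizer_config.json defines a non-empty chat template."""
    config_path = Path(model_dir) / "tokenizer_config.json"
    if not config_path.is_file():
        return False
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    template = config.get("chat_template")
    # A single template is stored as one Jinja string.
    if isinstance(template, str):
        return bool(template.strip())
    # Named templates are stored as a list of {"name": ..., "template": ...} objects.
    if isinstance(template, list):
        return any(isinstance(t, dict) and t.get("template") for t in template)
    return False
```

Run it against the directory that you plan to upload to the PVC. A False result means that the text_chat function cannot be enabled for that model as-is.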
Supported model architectures
You can deploy custom foundation models that run on the vLLM runtime at a global level.
The following model architectures are supported for deployment at a global level.
| Model family | Foundation model examples | Supported quantization method | Tensor parallelism (multiple GPUs supported) | Deployment configurations |
|---|---|---|---|---|
| bloom | bigscience/bloom-3b, bigscience/bloom-560m | N/A | Yes | Small, Medium and Large |
| exaone | lgai-exaone/exaone-3.0-7.8B-Instruct | N/A | No | Small |
| falcon | tiiuae/falcon-7b | N/A | Yes | Small, Medium and Large |
| gemma | google/gemma-2b | N/A | Yes | Small, Medium and Large |
| gemma2 | google/gemma-2-9b | N/A | Yes | Small, Medium and Large |
| gpt_bigcode | bigcode/starcoder, bigcode/gpt_bigcode-santacoder | gptq | Yes | Small, Medium and Large |
| gpt_neox | rinna/japanese-gpt-neox-small, EleutherAI/pythia-12b, databricks/dolly-v2-12b | N/A | Yes | Small, Medium and Large |
| gptj | EleutherAI/gpt-j-6b | N/A | No | Small |
| gpt2 | gpt2, gpt2-xl | N/A | Yes | Small, Medium and Large |
| granite | ibm-granite/granite-3.0-8b-instruct, ibm-granite/granite-3b-code-instruct-2k, granite-8b-code-instruct, granite-7b-lab | N/A | No | Small |
| jais | core42/jais-13b | N/A | Yes | Small, Medium and Large |
| llama | meta-llama/Meta-Llama-3-8B, meta-llama/Meta-Llama-3.1-8B-Instruct, llama-2-13b-chat-hf, TheBloke/Llama-2-7B-Chat-AWQ, ISTA-DASLab/Llama-2-7b-AQLM-2Bit-1x16-hf | gptq | Yes | Small, Medium and Large |
| mistral | mistralai/Mistral-7B-v0.3, neuralmagic/OpenHermes-2.5-Mistral-7B-marlin | N/A | No | Small |
| mixtral | TheBloke/Mixtral-8x7B-v0.1-GPTQ, mistralai/Mixtral-8x7B-Instruct-v0.1 | gptq | No | Small |
| mpt | mosaicml/mpt-7b, mosaicml/mpt-7b-storywriter, mosaicml/mpt-30b | N/A | No | Small |
| nemotron | nvidia/Minitron-8B-Base | N/A | Yes | Small, Medium and Large |
| olmo | allenai/OLMo-1B-hf, allenai/OLMo-7B-hf | N/A | Yes | Small, Medium and Large |
| persimmon | adept/persimmon-8b-base, adept/persimmon-8b-chat | N/A | Yes | Small, Medium and Large |
| phi | microsoft/phi-2, microsoft/phi-1_5 | N/A | Yes | Small, Medium and Large |
| phi3 | microsoft/Phi-3-mini-4k-instruct | N/A | Yes | Small, Medium and Large |
| qwen | DeepSeek-R1 (distilled variant) | N/A | Yes | Small, Medium and Large |
| qwen2 | Qwen/Qwen2-7B-Instruct-AWQ | AWQ | Yes | Small, Medium and Large |
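The family names in this table appear to match the model_type value in a Hugging Face model's config.json file (for example, llama or qwen2). Assuming that correspondence holds for your model, the following hypothetical Python helper reads model_type and reports whether the family is listed and whether it supports tensor parallelism across multiple GPUs. It covers only the text generation families in the preceding table.

```python
import json
from pathlib import Path

# Text generation families from the table, mapped to whether tensor
# parallelism (multiple GPUs) is supported. Assumes family == config.json model_type.
SUPPORTED_FAMILIES = {
    "bloom": True, "exaone": False, "falcon": True, "gemma": True,
    "gemma2": True, "gpt_bigcode": True, "gpt_neox": True, "gptj": False,
    "gpt2": True, "granite": False, "jais": True, "llama": True,
    "mistral": False, "mixtral": False, "mpt": False, "nemotron": True,
    "olmo": True, "persimmon": True, "phi": True, "phi3": True,
    "qwen": True, "qwen2": True,
}


def check_family(model_dir: str) -> tuple[str, bool]:
    """Return (family, multi_gpu_supported) for the model in model_dir.

    Raises ValueError if config.json names a family that is not in the table.
    """
    config_text = (Path(model_dir) / "config.json").read_text(encoding="utf-8")
    family = json.loads(config_text).get("model_type", "")
    if family not in SUPPORTED_FAMILIES:
        raise ValueError(f"model_type {family!r} is not a supported architecture")
    return family, SUPPORTED_FAMILIES[family]
```

If the second value is False, plan for a single-GPU deployment (NUM_GPUS set to 1).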
The following automatic speech recognition (ASR) model architectures are also supported for deployment at a global level.

| Model family | Foundation model examples | Supported quantization method | Tensor parallelism (multiple GPUs supported) | Deployment configurations |
|---|---|---|---|---|
| whisper | openai/whisper-small | N/A | No | Small, Medium and Large |
| whisper | openai/whisper-large-v3-turbo | N/A | No | Small, Medium and Large |
The following embedding and reranking model architectures are also supported for deployment at a global level.

| Model family | Foundation model examples | Supported semantic retrieval | Deployment configurations |
|---|---|---|---|
| BGE | BAAI/bge-reranker-v2-m3 | reranking | Small, Medium and Large |
| E5 | intfloat/multilingual-e5-large | embedding and reranking | Small, Medium and Large |
| granite | ibm/granite-embedding-107m-multilingual, ibm/granite-embedding-278m-multilingual | embedding | Small, Medium and Large |
| Jina Reranker | jinaai/jina-reranker-v2-base-multilingual | reranking | Small, Medium and Large |
| MiniLM | cross-encoder/ms-marco-minilm-l-12-v2 | reranking | Small, Medium and Large |
| MiniLM | sentence-transformers/all-minilm-l6-v2 | embedding and reranking | Small, Medium and Large |
| Qwen | Qwen/Qwen3-Embedding-0.6B | embedding and reranking | Small, Medium and Large |
| slate | ibm/slate-125m-english-rtrvr, ibm/slate-125m-english-rtrvr-v2, ibm/slate-30m-english-rtrvr, ibm/slate-30m-english-rtrvr-v2 | embedding and reranking | Small, Medium and Large |
Procedure
- Set up the PVC storage and upload the custom foundation model. For more information, see Setting up storage and uploading the model.
  Make a note of the pvc_name of the persistent volume claim where you store the downloaded model source files.
- Create a ConfigMap file that configures the custom foundation model to run on the vLLM runtime. ConfigMap files are used by the Red Hat® OpenShift® AI layer of the service to serve configuration information to independent containers that run in pods or to other system components, such as controllers. For more information, see Creating a ConfigMap file.
  To create the ConfigMap, run the following command, and then poll until the predictor pod is in the Running state with 1/1 containers ready:
  oc create -f <config_map_file>
  A ready predictor pod is listed as in the following example:
  hermes-2-pro-mistral-7b-predictor-654986d764-mrpt5   1/1   Running   0   25m
  Important:
  - You must specify the model ID in the ConfigMap in lowercase. The model ID cannot be specified in uppercase or camel case. The model ID can contain only letters, numbers, and underscores.
  - To deploy the model globally, you must set serving_runtime to vllm-serving-runtime in the ConfigMap file.
  - To avoid naming conflicts with foundation models that are shipped by IBM, use a unique name. Do not use the same model_id as a foundation model that is shipped by IBM.
  - You must set the global_custom_foundation_model parameter to true in the wx_inference_proxy section of the ConfigMap.
  - To enable MLOps engineers or prompt engineers to chat with the globally deployed custom foundation model, add the text_chat function to the functions list in the same ConfigMap file.
- To register the custom foundation model, apply the ConfigMap file by using the following command. The service operator picks up the configuration information and applies it to your cluster.
  oc apply -f configmap.yml
- Check the status of the service by using the following command. When Completed is returned, the custom foundation models are ready for use.
  oc get watsonxaiifm -n ${PROJECT_CPD_INST_OPERANDS}
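If you automate the polling step, you need to decide from pod listings whether the predictor is ready. The following Python sketch assumes that you capture the text output of oc get pods -n <namespace> (for example, with subprocess.run(..., capture_output=True, text=True)). It also assumes that predictor pod names start with a common prefix, as in the hermes-2-pro-mistral-7b-predictor example. The function name predictor_ready is hypothetical.

```python
def predictor_ready(pods_output: str, predictor_prefix: str) -> bool:
    """Return True if at least one pod named <predictor_prefix>... is listed and
    every such pod is Running with all of its containers ready (for example, 1/1)."""
    matched = False
    for line in pods_output.splitlines():
        fields = line.split()
        # Skip blank lines and the "NAME READY STATUS RESTARTS AGE" header row.
        if len(fields) < 3 or fields[0] == "NAME":
            continue
        name, ready, status = fields[0], fields[1], fields[2]
        if not name.startswith(predictor_prefix):
            continue
        matched = True
        ready_count, _, total = ready.partition("/")
        if status != "Running" or not total or ready_count != total:
            return False
    return matched
```

Call it in a loop with a sleep between oc get pods invocations, and stop when it returns True.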
Creating a ConfigMap file
| ConfigMap field | Description |
|---|---|
| metadata.name | Model name with hyphens as delimiters. For example, if the model name is tiiuae/falcon-7b, specify tiiuae-falcon-7b. |
| data.model.<full_model_name> | Model name with underscores as delimiters. For example, if the model name is tiiuae/falcon-7b, specify tiiuae_falcon_7b. |
| data.model.<full_model_name>.pvc_name | Persistent volume claim where the model source files are stored. Use the pvc_name that you noted in an earlier step. For example, tiiuae-falcon-7b-pvc. |
| data.model.<full_model_name>.pvc_size | Size of the persistent volume claim where the model source files are stored. For example, 60Gi. |
| data.model.<full_model_name>.image | If the model that you want to use is not yet supported by the standard inference servers, you can override the standard settings and use your own custom inference runtime image. Images that are listed in the OpenShift registry were tested and are confirmed to work. You can also try other images, but they are not officially supported by IBM. |
| data.model.<full_model_name>.env.DTYPE_STR | Data type that the model uses for its weights and computations. For example, float16. For more information about supported values, see Global parameters for custom foundation models. |
| data.model.<full_model_name>.env.MAX_NEW_TOKENS | Maximum number of tokens that the model can generate for a text inference request. For example, 2047. For more information about supported values, see Global parameters for custom foundation models. |
| data.model.<full_model_name>.env.ENABLE_AUTO_TOOL_CHOICE | Enables vLLM automatic tool choice so that the model can generate its own tool calls when appropriate. |
| data.model.<full_model_name>.env.TOOL_CALL_PARSER | Tool parser to use. For a list of tool parsers that are suitable for various foundation models, see https://docs.vllm.ai/en/stable/features/tool_calling.html#automatic-function-calling. |
| data.model.<full_model_name>.annotations.productVersion | The service operator version. For example, 9.1.0. To get this value, use the following command: |
| data.model.<full_model_name>.annotations.cloudpakInstanceId | The IBM® Software Hub instance ID. For example, b0871d64-ceae-47e9-b186-6e336deaf1f1. To get this value, use the following command: |
| data.model.<full_model_name>.annotations.model-id | Model ID. Must match the model ID that you specify under wx_inference_proxy. For example, model-id: "meta-llama/llama-3-1-8b". |
| data.model.<full_model_name>.labels_syom.icpdsupport/module | Model name with hyphens as delimiters. For example, if the model name is tiiuae/falcon-7b, specify tiiuae-falcon-7b. |
| data.model.<full_model_name>.labels_syom.app | Model name with hyphens as delimiters and prefixed with text-. For example, if the model name is tiiuae/falcon-7b, specify text-tiiuae-falcon-7b. |
| data.model.<full_model_name>.labels_syom.syom_model | Model name with single hyphens as delimiters, except for the first delimiter, which uses two hyphens. For example, tiiuae--falcon-7b. |
| data.model.<full_model_name>.wx_inference_proxy.<model_id> | Model ID (<full/model_name>). For example, tiiuae/falcon-7b. |
| data.model.<full_model_name>.wx_inference_proxy.<model_id>.label | Model name without the provider prefix. For example, falcon-7b. |
| data.model.<full_model_name>.wx_inference_proxy.<model_id>.provider | Model provider. For example, tiiuae. |
| data.model.<full_model_name>.wx_inference_proxy.<model_id>.short_description | Short description of the model in fewer than 100 characters. |
| data.model.<full_model_name>.wx_inference_proxy.<model_id>.long_description | Long description of the model. |
| data.model.<full_model_name>.wx_inference_proxy.<model_id>.number_params | Number of model parameters. For example, 7b. |
| data.model.<full_model_name>.wx_inference_proxy.<model_id>.lifecycle.available.since_version | The first service operator version in which the model was added. For example, 9.1.0. |
| data.model.<full_model_name>.wx_inference_proxy.<model_id>.functions | Model functions. Available options: text_generation, text_chat, embedding, rerank, and audio_transcriptions. You must verify the supported functions before editing the ConfigMap. See the official model card for details. |
| data.model.<full_model_name>.wx_inference_proxy.<model_id>.tags | Any applicable tags. Must include the vllm_runtime tag. |
|  | The maximum amount of resources that a custom foundation model can use. |
|  | The minimum (guaranteed) amount of each resource that a custom foundation model gets. |
| data.model.<full_model_name>.env.CUDA_VISIBLE_DEVICES | A comma-separated list of GPU indices that the LLM runtime must use, for example 0 for one GPU or 0,1 for two GPUs. The number of indices must equal the nvidia.com/gpu count that is configured under both limits and requests in your Kubernetes spec. |
| data.model.<full_model_name>.env.NUM_GPUS | The total number of GPUs that the LLM runtime initializes. This number must be equal to the count of indices in CUDA_VISIBLE_DEVICES and to the nvidia.com/gpu count that is specified in both limits and requests of your pod's resource configuration. |
| data.model.<full_model_name>.env.LANGUAGE | Supported language for this model. For example, en. |
| data.model.<full_model_name>.env.MAX_LOG_LEN | Maximum number of prompt characters or prompt token IDs that are printed in the log. |
| data.model.<full_model_name>.env.VLLM_CACHE_ROOT | Root directory for cache files. |
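Several fields in this table are different spellings of the same model name. The following Python sketch derives them from a model ID by applying the rules in the table to the tiiuae/falcon-7b style of ID. The function and field names are hypothetical. The sample ConfigMaps in this topic shorten some names, so treat the output as a starting point rather than a required value.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigMapNames:
    metadata_name: str   # metadata.name
    model_key: str       # data.model.<full_model_name>
    module_label: str    # labels_syom.icpdsupport/module
    app_label: str       # labels_syom.app
    syom_model: str      # labels_syom.syom_model
    proxy_label: str     # wx_inference_proxy.<model_id>.label
    provider: str        # wx_inference_proxy.<model_id>.provider


def derive_names(model_id: str) -> ConfigMapNames:
    """Derive the ConfigMap naming variants for a <provider>/<model> ID."""
    if "/" not in model_id:
        raise ValueError("expected a model ID of the form <provider>/<model>")
    provider, model = model_id.lower().split("/", 1)
    # Replace every run of non-alphanumeric characters with the delimiter.
    hyphenated = re.sub(r"[^a-z0-9]+", "-", model_id.lower()).strip("-")
    underscored = re.sub(r"[^a-z0-9]+", "_", model_id.lower()).strip("_")
    return ConfigMapNames(
        metadata_name=hyphenated,
        model_key=underscored,
        module_label=hyphenated,
        app_label=f"text-{hyphenated}",
        # Only the first delimiter becomes a double hyphen.
        syom_model=hyphenated.replace("-", "--", 1),
        proxy_label=model,
        provider=provider,
    )
```

For tiiuae/falcon-7b, this yields tiiuae-falcon-7b, tiiuae_falcon_7b, text-tiiuae-falcon-7b, tiiuae--falcon-7b, falcon-7b, and tiiuae, which match the examples in the table.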
Sample ConfigMap files
Create a ConfigMap file for the custom foundation model by copying one of the
following sample ConfigMaps and then replacing the variables in the template with the appropriate
values for your foundation model.
Sample ConfigMap for the meta-llama/Meta-Llama-3.1-8B-Instruct custom foundation model:
apiVersion: v1
kind: ConfigMap
metadata:
  name: meta-llama-3-1-8b-instruct
  namespace: cpd-instance
  labels:
    syom: watsonxaiifm_extra_models_config
  finalizers:
  - watsonxaiifm.cpd.ibm.com/finalizer
data:
  model: |
    meta_llama_3_1_8b_instruct:
      pvc_name: meta-llama-3-1-8b-instruct-pvc
      pvc_size: 62Gi
      command: []
      isvc_yaml_name: isvc.yaml.j2
      dir_name: model
      force_apply: no
      serving_runtime: vllm-serving-runtime
      storage_uri: pvc://meta-llama-3-1-8b-instruct-pvc/
      env:
      - name: MODEL_NAME
        value: /mnt/models
      - name: CUDA_VISIBLE_DEVICES
        value: "0"
      - name: TRANSFORMERS_CACHE
        value: /mnt/models/
      - name: HUGGINGFACE_HUB_CACHE
        value: /mnt/models/
      - name: DTYPE_STR
        value: "float16"
      - name: MAX_SEQUENCE_LENGTH
        value: "2048"
      - name: MAX_BATCH_SIZE
        value: "256"
      - name: MAX_CONCURRENT_REQUESTS
        value: "1024"
      - name: MAX_NEW_TOKENS
        value: "2048"
      - name: FLASH_ATTENTION
        value: "true"
      - name: DEPLOYMENT_FRAMEWORK
        value: "tgis_native"
      - name: HF_MODULES_CACHE
        value: /tmp/huggingface/modules
      - name: SERVED_MODEL_NAME # must match wx_inference_proxy.<model_name>
        value: meta-llama/meta-llama-3-1-8b-instruct
      - name: NUM_GPUS
        value: "1"
      - name: PORT
        value: "3000"
      - name: VLLM_CACHE_ROOT # Additional environment parameters for 5.3.0
        value: /tmp/vllm_cache
      annotations:
        cloudpakId: 5e4c7dd451f14946bc298e18851f3746
        cloudpakName: IBM watsonx.ai
        productChargedContainers: All
        productCloudpakRatio: "1:1"
        productID: 3a6d4448ec8342279494bc22e36bc318
        productMetric: VIRTUAL_PROCESSOR_CORE
        productName: IBM Watsonx.ai
        productVersion: 12.0.0
        cloudpakInstanceId: cd686e30-7b77-4256-a9be-c25e97f5f838
        model-id: "meta-llama/meta-llama-3-1-8b-instruct" # Additional annotation for 5.3.0 - must match wx_inference_proxy.<model_name>
      labels_syom:
        app.kubernetes.io/managed-by: ibm-cpd-watsonx-ai-ifm-operator
        app.kubernetes.io/instance: watsonxaiifm
        app.kubernetes.io/name: watsonxaiifm
        icpdsupport/addOnId: watsonx_ai_ifm
        icpdsupport/app: api
        release: watsonxaiifm
        icpdsupport/module: meta-llama-3-1-8b-instruct
        app: text-meta-llama-3-1-8b-instruct
        component: fmaas-inference-server
        bam-placement: colocate
        syom_model: meta--llama-3-1-8b-instruct
      args: []
      wx_inference_proxy:
        meta-llama/meta-llama-3-1-8b-instruct:
          global_custom_foundation_model: true
          enabled:
          - "true"
          label: "meta-llama-3-1-8b-instruct"
          provider: "meta-llama"
          source: "Hugging Face"
          functions:
          - text_generation
          - text_chat
          tags:
          - vllm_runtime # Additional tags for 5.3.0
          short_description: "A large language model from Meta's LLaMA 3 series, fine-tuned to follow instructions and perform a wide range of natural language understanding and generation tasks."
          long_description: "A powerful 8-billion parameter model from Meta's LLaMA 3 series, specifically fine-tuned to enhance instruction-following capabilities, making it effective for a wide range of NLP tasks."
          task_ids:
          - question_answering
          - generation
          - summarization
          - classification
          - extraction
          tasks_info:
            question_answering:
              task_ratings: { quality: 0, cost: 0 }
            generation:
              task_ratings: { quality: 0, cost: 0 }
            summarization:
              task_ratings: { quality: 0, cost: 0 }
            classification:
              task_ratings: { quality: 0, cost: 0 }
            extraction:
              task_ratings: { quality: 0, cost: 0 }
          min_shot_size: 1
          tier: "class_2"
          number_params: "8b"
          lifecycle:
            available:
              since_version: "9.1.0"
    meta_llama_3_1_8b_instruct_resources:
      limits:
        cpu: "2"
        memory: 128Gi
        nvidia.com/gpu: "1"
        ephemeral-storage: 1Gi
      requests:
        cpu: "1"
        memory: 4Gi
        nvidia.com/gpu: "1"
        ephemeral-storage: 10Mi
    meta_llama_3_1_8b_instruct_replicas: 1

Sample ConfigMap for automatic speech recognition (ASR) models:
apiVersion: v1
data:
  model: |
    whisper_tiny:
      pvc_name: <pvc name>
      svc_name: whisper-tiny
      pvc_size: 100Gi
      dir_name: .
      force_apply: "no"
      isvc_yaml_name: isvc.yaml.j2
      image: registry.redhat.io/rhoai/odh-vllm-cuda-rhel9@sha256:751e2359439161babb9ad8e93e16251888a8c07aed895ffa55e4dfaf2a45f89d
      serving_runtime: vllm-serving-runtime
      storage_uri: pvc://<pvc name>/ # keep the trailing slash after the PVC name
      annotations:
        model-id: "openai/whisper-tiny" # Additional annotation for 5.3.0 - must match wx_inference_proxy.<model_name>
      labels_syom:
        app.kubernetes.io/managed-by: ibm-cpd-watsonx-ai-ifm-operator
        app.kubernetes.io/instance: watsonxaiifm
        app.kubernetes.io/name: watsonxaiifm
        icpdsupport/addOnId: watsonx_ai_ifm
        icpdsupport/app: api
        release: watsonxaiifm
        icpdsupport/module: whisper-tiny
        #app: text-llava-next-video-7b-hf
        component: fmaas-inference-server
        bam-placement: colocate
        syom_model: openai--whisper-tiny # double dash after model provider name
      env:
      - name: VLLM_CACHE_ROOT
        value: /tmp/vllm_cache
      - name: MODEL_NAME
        value: /mnt/models
      - name: SERVED_MODEL_NAME # must match wx_inference_proxy.<model_name>
        value: openai/whisper-tiny
      - name: LANGUAGE
        value: "en"
      - name: NUM_GPUS
        value: "1"
      - name: CUDA_VISIBLE_DEVICES
        value: "0"
      - name: HUGGINGFACE_HUB_CACHE
        value: /mnt/models/
      - name: HF_MODULES_CACHE
        value: /tmp/huggingface/modules
      - name: PORT
        value: "3000"
      - name: MAX_LOG_LEN
        value: "100"
      volumeMounts:
      - name: home
        mountPath: /home/vllm
      - name: tmp
        mountPath: /tmp
      - name: shm
        mountPath: /dev/shm
      volumes:
      - name: home
        emptyDir: {}
      - name: tmp
        emptyDir: {}
      - name: shm
        emptyDir:
          medium: Memory
          sizeLimit: 4Gi
      wx_inference_proxy:
        openai/whisper-tiny:
          enabled:
          - "true"
          label: openai/whisper-tiny
          provider: IBM
          source: IBM
          functions:
          - audio_transcriptions
          short_description: Whisper is a Transformer based encoder-decoder model, also referred to as a sequence-to-sequence model. It was trained on 680k hours of labelled speech data annotated using large-scale weak supervision.
          long_description: Whisper is a Transformer based encoder-decoder model, also referred to as a sequence-to-sequence model. It was trained on 680k hours of labelled speech data annotated using large-scale weak supervision.
          tags:
          - vllm_runtime
          - consumer_public
          min_shot_size: 1
          input_tier: class_14
          output_tier: class_15
          number_params: 805k
          lifecycle:
            available:
              since_version: 11.0.0
          versions:
          - version: 1.0.0
            since_version: 11.0.0
    whisper_tiny_resources:
      limits:
        cpu: "3"
        memory: 96Gi
        nvidia.com/gpu: "1"
        ephemeral-storage: 1Gi
      requests:
        cpu: "2"
        memory: 85Gi
        nvidia.com/gpu: "1"
        ephemeral-storage: 10Mi
    whisper_tiny_replicas: 1
kind: ConfigMap
metadata:
  finalizers:
  - watsonxaiifm.cpd.ibm.com/finalizer
  labels:
    syom: watsonxaiifm_extra_models_config
  name: whisper-tiny
For the functions field, you must verify the supported functions before editing the ConfigMap. See the official model card for details.

You must set the VLLM_USE_V1 environment variable to 0 in the ConfigMap file. Otherwise, inference requests fail.
Sample ConfigMap for the jinaai/jina-reranker-v2-base-multilingual reranking model:
apiVersion: v1
data:
  model: |
    jina_reranker_v2_base_multilingual:
      pvc_name: jina-reranker-v2-base-multilingual-pvc
      pvc_size: 62Gi
      isvc_yaml_name: isvc.yaml.j2
      embedding_model: "true"
      dir_name: .
      force_apply: no
      serving_runtime: vllm-serving-runtime
      storage_uri: pvc://jina-reranker-v2-base-multilingual-pvc/
      env:
      - name: MODEL_NAME
        value: /mnt/models
      - name: CUDA_VISIBLE_DEVICES
        value: "0"
      - name: TRANSFORMERS_CACHE
        value: /mnt/models/
      - name: HUGGINGFACE_HUB_CACHE
        value: /mnt/models/
      - name: DTYPE_STR
        value: "float16"
      - name: MAX_SEQUENCE_LENGTH
        value: "2048"
      - name: MAX_NEW_TOKENS
        value: "2048"
      - name: HF_MODULES_CACHE
        value: /tmp/huggingface/modules
      - name: SERVED_MODEL_NAME # must match wx_inference_proxy.<model_name>
        value: jinaai/jina-reranker-v2-base-multilingual
      - name: NUM_GPUS
        value: "1"
      - name: PORT
        value: "3000"
      - name: VLLM_CACHE_ROOT
        value: /home/vllm
      - name: VLLM_USE_V1
        value: "0"
      volumeMounts:
      - name: home
        mountPath: /home/vllm
      - name: tmp
        mountPath: /tmp
      - name: shm
        mountPath: /dev/shm
      volumes:
      - name: home
        emptyDir: {}
      - name: tmp
        emptyDir: {}
      - name: shm
        emptyDir:
          medium: Memory
          sizeLimit: 4Gi
      annotations:
        cloudpakId: 5e4c7dd451f14946bc298e18851f3746
        cloudpakName: IBM watsonx.ai
        productChargedContainers: All
        productCloudpakRatio: "1:1"
        productID: 3a6d4448ec8342279494bc22e36bc318
        productMetric: VIRTUAL_PROCESSOR_CORE
        productName: IBM Watsonx.ai
        productVersion: 12.0.0
        cloudpakInstanceId: cd346dc7-29fd-46db-a874-7e33cb8cf1ac
        model-id: jinaai/jina-reranker-v2-base-multilingual # Additional annotation for 5.3.0 - must match wx_inference_proxy.<model_name>
      labels_syom:
        app.kubernetes.io/managed-by: ibm-cpd-watsonx-ai-ifm-operator
        app.kubernetes.io/instance: watsonxaiifm
        app.kubernetes.io/name: watsonxaiifm
        icpdsupport/addOnId: watsonx_ai_ifm
        icpdsupport/app: api
        release: watsonxaiifm
        icpdsupport/module: jina-reranker-v2-base-multilingual
        app: text-jina-reranker-v2-base-multilingual
        component: fmaas-inference-server
        bam-placement: colocate
        syom_model: jina--reranker-v2-base-multilingual
      command:
      - "/bin/sh"
      args:
      - "-c"
      - "vllm serve /mnt/models --served-model-name jinaai/jina-reranker-v2-base-multilingual --port 3000 --trust-remote-code"
      wx_inference_proxy:
        jinaai/jina-reranker-v2-base-multilingual:
          global_custom_foundation_model: true
          enabled:
          - "true"
          label: "jina-reranker-v2-base-multilingual"
          provider: "jinaai"
          source: "Hugging Face"
          tags:
          - vllm_runtime
          functions:
          - rerank
          short_description: "A multilingual transformer-based reranker that scores query-document relevance with high accuracy"
          long_description: "The Jina Reranker v2 is a cross-encoder model fine-tuned for multilingual text reranking, capable of handling up to 1024 tokens with a sliding window for longer texts. It delivers state-of-the-art performance across retrieval, code, and text-to-SQL reranking tasks."
          tier: "class_c1"
          number_params: 278m
          lifecycle:
            available:
              since_version: "10.1.0"
    jina_reranker_v2_base_multilingual_resources:
      limits:
        cpu: "2"
        memory: 128Gi
        nvidia.com/gpu: "1"
        ephemeral-storage: 1Gi
      requests:
        cpu: "1"
        memory: 4Gi
        nvidia.com/gpu: "1"
        ephemeral-storage: 10Mi
    jina_reranker_v2_base_multilingual_replicas: 1
kind: ConfigMap
metadata:
  finalizers:
  - watsonxaiifm.cpd.ibm.com/finalizer
  labels:
    syom: watsonxaiifm_extra_models_config
  name: jina-reranker-v2-base-multilingual
  namespace: cpd-instance
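The sample ConfigMaps repeat several values that must agree. The comments mark SERVED_MODEL_NAME and the model-id annotation as values that must match the wx_inference_proxy key. The GPU settings must also match each other, as described in the ConfigMap field table. The following hypothetical Python check encodes those rules, together with the serving_runtime, global_custom_foundation_model, and vllm_runtime requirements from the Procedure section. It assumes that you have already parsed the data.model string into dictionaries, for example with PyYAML's yaml.safe_load.

```python
from typing import Any


def check_model_entry(entry: dict[str, Any], resources: dict[str, Any]) -> list[str]:
    """Check one parsed model entry against the rules stated in this topic.

    entry is the mapping under <full_model_name>; resources is the mapping
    under <full_model_name>_resources. Returns a list of problems (empty if none found).
    """
    problems: list[str] = []
    env = {item["name"]: str(item["value"]) for item in entry.get("env", [])}
    model_id = entry.get("annotations", {}).get("model-id")
    proxy = entry.get("wx_inference_proxy", {})

    if entry.get("serving_runtime") != "vllm-serving-runtime":
        problems.append("serving_runtime must be vllm-serving-runtime")
    if env.get("SERVED_MODEL_NAME") != model_id:
        problems.append("SERVED_MODEL_NAME must match the model-id annotation")
    if list(proxy) != [model_id]:
        problems.append("wx_inference_proxy must have exactly one key, equal to model-id")
    else:
        settings = proxy[model_id]
        if settings.get("global_custom_foundation_model") is not True:
            problems.append("global_custom_foundation_model must be true")
        if "vllm_runtime" not in settings.get("tags", []):
            problems.append("tags must include vllm_runtime")

    # CUDA_VISIBLE_DEVICES, NUM_GPUS, and the nvidia.com/gpu limit and request must agree.
    devices = [d for d in env.get("CUDA_VISIBLE_DEVICES", "").split(",") if d.strip()]
    num_gpus = int(env.get("NUM_GPUS", "0"))
    gpu_limit = int(resources.get("limits", {}).get("nvidia.com/gpu", 0))
    gpu_request = int(resources.get("requests", {}).get("nvidia.com/gpu", 0))
    if not (len(devices) == num_gpus == gpu_limit == gpu_request):
        problems.append("CUDA_VISIBLE_DEVICES, NUM_GPUS, and nvidia.com/gpu limits/requests must agree")
    return problems
```

Run the check before you apply the ConfigMap so that mismatches surface before the operator tries to deploy the model.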
Scaling global deployments
To scale your global deployment, you must edit the ConfigMap file for your deployment:
- Open the ConfigMap file in your default text editor:
  oc edit configmap <configmap name> -n <your namespace>
  For example:
  oc edit configmap meta-llama-3-1-8b-instruct -n cpd-instance
- Edit the <model_name>_replicas value. For example, change meta_llama_3_1_8b_instruct_replicas: 1 to meta_llama_3_1_8b_instruct_replicas: 2.
- Trigger reconciliation:
  oc patch watsonxaiifm watsonxaiifm-cr -n <your namespace> --type merge -p '{"spec":{"syom_update_at":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'"}}'
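If you script these steps instead of editing the ConfigMap interactively, the two changes can be sketched in Python as follows. set_replicas rewrites the <model_name>_replicas line in the text of the data.model key. You can read that text with oc get configmap <configmap name> -n <your namespace> -o jsonpath='{.data.model}'. reconcile_patch builds the same merge-patch body that the oc patch command sends, with a UTC timestamp in the same format as date -u +%Y-%m-%dT%H:%M:%SZ. Both helper names are hypothetical.

```python
import json
import re
from datetime import datetime, timezone
from typing import Optional


def set_replicas(model_block: str, model_key: str, replicas: int) -> str:
    """Return model_block with '<model_key>_replicas: N' changed to the new count."""
    pattern = re.compile(
        rf"^([ \t]*{re.escape(model_key)}_replicas:[ \t]*)\d+[ \t]*$", re.MULTILINE
    )
    updated, count = pattern.subn(rf"\g<1>{replicas}", model_block)
    if count != 1:
        raise ValueError(f"expected exactly one {model_key}_replicas line, found {count}")
    return updated


def reconcile_patch(now: Optional[datetime] = None) -> str:
    """Build the JSON merge patch that sets spec.syom_update_at to a UTC timestamp."""
    now = now or datetime.now(timezone.utc)
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return json.dumps({"spec": {"syom_update_at": stamp}})
```

Changing the timestamp is what prompts the operator to reconcile, so generate a fresh value each time you scale.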
What to do next
To test a globally deployed custom foundation model, submit an inference request to the model from a project or deployment space. For more information, see Deploying custom foundation models.
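The following Python sketch builds, but does not send, such an inference request with the standard library. It assumes the watsonx.ai text generation REST endpoint, POST /ml/v1/text/generation with a version query parameter, and a bearer token for your IBM Software Hub instance. Check the watsonx.ai API reference for your release before you rely on the path, the version date, or the body fields. The function name is hypothetical.

```python
import json
import urllib.request
from typing import Optional


def build_generation_request(
    base_url: str,
    token: str,
    model_id: str,
    prompt: str,
    *,
    project_id: Optional[str] = None,
    space_id: Optional[str] = None,
    max_new_tokens: int = 200,
) -> urllib.request.Request:
    """Build a text generation request for a project or a deployment space (assumed API shape)."""
    if (project_id is None) == (space_id is None):
        raise ValueError("pass exactly one of project_id or space_id")
    body = {
        "model_id": model_id,  # the wx_inference_proxy key, for example meta-llama/meta-llama-3-1-8b-instruct
        "input": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    if project_id is not None:
        body["project_id"] = project_id
    else:
        body["space_id"] = space_id
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/ml/v1/text/generation?version=2023-05-29",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )
```

To send the request, pass it to urllib.request.urlopen. If your cluster uses a self-signed certificate, you might also need an SSL context that trusts it.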