Supported use cases

Table 1 lists the use cases and models that are supported with the FP16 data type.

Table 1. Supported use cases
| Use case | Model name | Batch size | Maximum input context size | Maximum output context size | Number of cards per container |
|---|---|---|---|---|---|
| Entity extraction | Granite3.3-8b-instruct | 16 | 3K | 3K | 1 |
| RAG embedding | Granite-Embedding-125m-English, Granite-Embedding-278m-multilingual | Up to 256 | 512 | Vector of size 768 | 1 |
| RAG embedding | Granite-Embedding-30m-English, Granite-Embedding-107m-multilingual | Up to 256 | 512 | Vector of size 384 | 1 |
| RAG inferencing | Granite3.3-8b-instruct | 32 | 32K (batch × context must be less than 128K) | 32K (batch × context must be less than 128K) | 4 |
| Reranker | bge-reranker-v2-m3 | Up to 4 | 8K | 8K | 1 |
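
The RAG inferencing row combines three limits: a batch size of 32, a context size of 32K, and a batch × context product that must stay below 128K. The following Python sketch shows one way to check a planned configuration against those limits. It is an illustration only, not part of the product: the function and constant names are hypothetical, and it assumes that "K" means 1024 tokens and that "less than" is a strict inequality.

```python
# Assumption: "K" in Table 1 means 1024 tokens.
K = 1024

# Limits taken from the RAG inferencing row of Table 1.
MAX_BATCH = 32
MAX_CONTEXT = 32 * K
MAX_BATCH_TIMES_CONTEXT = 128 * K


def fits_rag_inferencing(batch_size: int, context_tokens: int) -> bool:
    """Return True if the batch size and context size satisfy all three limits."""
    if not 1 <= batch_size <= MAX_BATCH:
        return False
    if not 1 <= context_tokens <= MAX_CONTEXT:
        return False
    # Assumption: the product must be strictly less than 128K.
    return batch_size * context_tokens < MAX_BATCH_TIMES_CONTEXT


def max_context_for_batch(batch_size: int) -> int:
    """Return the largest context size (in tokens) allowed for a batch size."""
    if not 1 <= batch_size <= MAX_BATCH:
        raise ValueError(f"batch size must be between 1 and {MAX_BATCH}")
    # Strict inequality: batch * context < limit,
    # so context <= (limit - 1) // batch.
    return min(MAX_CONTEXT, (MAX_BATCH_TIMES_CONTEXT - 1) // batch_size)


if __name__ == "__main__":
    for batch in (1, 4, 8, 32):
        print(batch, max_context_for_batch(batch))
```

Under these assumptions, the full 32K context is available only for batch sizes of 3 or fewer, and the usable context shrinks as the batch size grows. Because Table 1 states the same limit for both the input and the output context size, the check would be applied to each of them.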