Supported use cases
Table 1 lists the use cases and models that are supported with the FP16 data type.
| Use case | Model name | Batch size | Maximum input context size (tokens) | Maximum output size | Number of cards per container |
|---|---|---|---|---|---|
| Entity extraction | Granite3.3-8b-instruct | 16 | 3K | 3K | 1 |
| RAG embedding | Granite-Embedding-125m-English, Granite-Embedding-278m-multilingual | Up to 256 | 512 | Vector of size 768 | 1 |
| RAG embedding | Granite-Embedding-30m-English, Granite-Embedding-107m-multilingual | Up to 256 | 512 | Vector of size 384 | 1 |
| RAG inferencing | Granite3.3-8b-instruct | 32 | 32K (batch size × context size must be at most 128K) | 32K (batch size × context size must be at most 128K) | 4 |
| Reranker | bge-reranker-v2-m3 | Up to 4 | 8K | 8K | 1 |
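The limits in Table 1 interact for RAG inferencing. You can use a batch size of 32 or a context of 32K, but not both at once, because their product must stay within 128K. The following is a minimal, hypothetical sketch of a pre-flight check that encodes the table's limits. It makes several assumptions that the table does not state. "K" is taken to mean 1,024 tokens. Batch sizes are treated as upper bounds. The 128K budget is treated as inclusive, because the original wording is ambiguous about this. "Context" is read as the per-request context length. The use-case keys and the `fits` function are illustrative names, not part of any product API.

```python
# Hypothetical pre-flight check against the limits in Table 1.
# Assumptions (not stated in the table): K = 1,024 tokens, batch sizes are
# upper bounds, and the RAG inferencing budget (batch x context) is inclusive.
from dataclasses import dataclass
from typing import Optional

K = 1024  # assumed multiplier for "3K", "8K", "32K", "128K"


@dataclass(frozen=True)
class Limits:
    max_batch: int
    max_input_tokens: int
    max_batch_tokens: Optional[int] = None  # batch x context budget, if any


# Illustrative keys; numbers copied from Table 1.
LIMITS = {
    "entity-extraction": Limits(max_batch=16, max_input_tokens=3 * K),
    "rag-embedding": Limits(max_batch=256, max_input_tokens=512),
    "rag-inferencing": Limits(max_batch=32, max_input_tokens=32 * K,
                              max_batch_tokens=128 * K),
    "reranker": Limits(max_batch=4, max_input_tokens=8 * K),
}


def fits(use_case: str, batch: int, context_tokens: int) -> bool:
    """Return True if the batch size and per-request context fit the use case's limits."""
    lim = LIMITS[use_case]
    if not 1 <= batch <= lim.max_batch:
        return False
    if not 1 <= context_tokens <= lim.max_input_tokens:
        return False
    # Joint budget: total tokens across the batch must not exceed the cap.
    if lim.max_batch_tokens is not None and batch * context_tokens > lim.max_batch_tokens:
        return False
    return True
```

For example, `fits("rag-inferencing", 32, 32 * K)` returns `False` because 32 × 32K = 1,024K exceeds the 128K budget. In contrast, 32 requests of 4K each (`fits("rag-inferencing", 32, 4 * K)`) or 4 requests of 32K each exactly meet the budget and return `True`.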