Granite 4.1

Dense language model family in 3B, 8B, and 30B sizes with instruction-tuned variants, delivering improved tool calling, instruction following, coding, and mathematical reasoning.

Overview

Granite 4.1 is a family of dense language models in three sizes: 3B, 8B, and 30B parameters. Each size comes in base and instruction-tuned variants, with optional FP8 quantization for efficient deployment. Granite 4.1 improves significantly on Granite 4.0 in tool calling, instruction following, coding, and mathematical reasoning.
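
To give a feel for what FP8 quantization means for deployment, the following is a rough, weight-only estimate of storage per size. It is a sketch under stated assumptions, not a published figure: the parameter counts are the nominal values implied by the size names, non-quantized weights are assumed to be stored in a 16-bit format such as bf16, and the KV cache, activations, and runtime overhead are ignored. The names `NOMINAL_PARAMS` and `weight_memory_gb` are illustrative.

```python
# Rough weight-only memory estimate (assumption-based, not an official figure).
# Excludes KV cache, activations, and framework overhead.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}  # storage width per parameter

# Nominal counts read off the size names; exact parameter counts may differ.
NOMINAL_PARAMS = {"3b": 3e9, "8b": 8e9, "30b": 30e9}


def weight_memory_gb(size: str, dtype: str) -> float:
    """Approximate weight storage in GB (10^9 bytes) for a size and dtype."""
    return NOMINAL_PARAMS[size] * BYTES_PER_PARAM[dtype] / 1e9


for size in NOMINAL_PARAMS:
    for dtype in BYTES_PER_PARAM:
        print(f"{size:>3} {dtype}: ~{weight_memory_gb(size, dtype):.0f} GB")
```

Under these assumptions, FP8 halves the weight footprint relative to 16-bit storage, which is the main reason it helps on memory-constrained hardware.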

Model Variants

  • granite-4.1-3b-base & granite-4.1-3b-instruct: Compact model optimized for edge deployment and resource-constrained environments
  • granite-4.1-8b-base & granite-4.1-8b-instruct: Balanced model for general-purpose enterprise applications
  • granite-4.1-30b-base & granite-4.1-30b-instruct: High-capacity model for complex reasoning and specialized tasks

All models are released under the Apache 2.0 license with cryptographic signatures, ISO certification, and full transparency disclosures.

Key Capabilities

Tool Calling: Granite 4.1 can interpret tool definitions and emit structured tool calls, so applications can connect it to external software tools and APIs. This lets enterprises build AI-driven workflows and automate complex, multi-step tasks.

Instruction Following: Granite 4.1 exhibits improved comprehension and adherence to user instructions, ensuring reliable and accurate task completion for enterprise automation.

Code Generation & Explanation: Granite 4.1 generates code snippets and explains complex codebases across multiple programming languages with higher accuracy, accelerating software development workflows.

Mathematical Reasoning: Granite 4.1 tackles complex mathematical problems from basic arithmetic to advanced calculus and linear algebra, enabling automated calculation and decision-making.

Getting Started

First, install the required libraries:

pip install torch torchvision torchaudio
pip install accelerate
pip install transformers
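
Before loading a model, it can help to confirm that the packages are visible to the interpreter you intend to use. The snippet below is a minimal sketch using only the standard library; the helper name `installed_versions` is illustrative and not part of any Granite tooling.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Report the packages installed in the step above.
for name, ver in installed_versions(["torch", "accelerate", "transformers"]).items():
    print(f"{name}: {ver or 'not installed'}")
```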

Generation

This is a simple example of how to use the Granite-4.1-30B model:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
model_path = "ibm-granite/granite-4.1-30b"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# on CPU, set device = "cpu" above and drop device_map
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()

# change input text as desired
chat = [
    {"role": "user", "content": "Please list one IBM Research laboratory located in the United States. You should only output its name and location."},
]
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

# tokenize the text
input_tokens = tokenizer(chat, return_tensors="pt").to(device)

# generate output tokens
output = model.generate(**input_tokens, max_new_tokens=100)

# decode output tokens into text
output = tokenizer.batch_decode(output)

# print output
print(output[0])

Expected output:

<|start_of_role|>user<|end_of_role|>Please list one IBM Research laboratory located in the United States. You should only output its name and location.<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>IBM Research - Almaden, San Jose, California<|end_of_text|>
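
The decoded string contains the whole conversation, including the prompt and the role markers. If only the reply is needed, one option is to pull the final assistant turn out of the text. The sketch below relies only on the markers visible in the output above (`<|start_of_role|>`, `<|end_of_role|>`, `<|end_of_text|>`); the helper name `last_assistant_reply` is illustrative, and the end marker is treated as optional on the assumption that generation may stop at `max_new_tokens` before emitting it.

```python
import re

# One turn: role marker, content, then the end marker or end of string.
TURN = re.compile(
    r"<\|start_of_role\|>(?P<role>.*?)<\|end_of_role\|>"
    r"(?P<content>.*?)(?:<\|end_of_text\|>|\Z)",
    re.DOTALL,
)


def last_assistant_reply(decoded: str):
    """Return the content of the final assistant turn, or None if absent."""
    replies = [
        m.group("content").strip()
        for m in TURN.finditer(decoded)
        if m.group("role") == "assistant"
    ]
    return replies[-1] if replies else None


decoded = (
    "<|start_of_role|>user<|end_of_role|>Please list one IBM Research laboratory "
    "located in the United States. You should only output its name and location."
    "<|end_of_text|>\n"
    "<|start_of_role|>assistant<|end_of_role|>IBM Research - Almaden, San Jose, "
    "California<|end_of_text|>"
)
print(last_assistant_reply(decoded))
```

An alternative that avoids string parsing is to slice the generated token IDs at the prompt length and decode only the remainder with `skip_special_tokens=True`.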

Tool Calling

Granite-4.1-30B comes with enhanced tool calling capabilities for integrating with external functions and APIs. Define a list of tools using OpenAI's function definition schema, then pass it to the chat template through the tools argument:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
model_path = "ibm-granite/granite-4.1-30b"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# on CPU, set device = "cpu" above and drop device_map
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a specified city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "Name of the city"
                    }
                },
                "required": ["city"]
            }
        }
    }
]

# change input text as desired
chat = [
    {"role": "user", "content": "What's the weather like in Boston right now?"},
]
chat = tokenizer.apply_chat_template(
    chat,
    tokenize=False,
    tools=tools,
    add_generation_prompt=True,
)

# tokenize the text
input_tokens = tokenizer(chat, return_tensors="pt").to(device)

# generate output tokens
output = model.generate(**input_tokens, max_new_tokens=100)

# decode output tokens into text
output = tokenizer.batch_decode(output)

# print output
print(output[0])

Expected output:

<|start_of_role|>system<|end_of_role|>You are a helpful assistant with access to the following tools. You may call one or more tools to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{"type": "function", "function": {"name": "get_current_weather", "description": "Get the current weather for a specified city.", "parameters": {"type": "object", "properties": {"city": {"type": "string", "description": "Name of the city"}}, "required": ["city"]}}}
</tools>

For each tool call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request.<|end_of_text|>
<|start_of_role|>user<|end_of_role|>What's the weather like in Boston right now?<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|><tool_call>
{"name": "get_current_weather", "arguments": {"city": "Boston"}}
</tool_call><|end_of_text|>
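
The model only emits the tool call; the application still has to extract it and run the matching function. The following is a minimal sketch of that step, assuming the `<tool_call>` format shown in the output above. The helpers `parse_tool_calls` and `run_tool_calls`, the `REGISTRY` mapping, and the stub `get_current_weather` implementation are all hypothetical and exist only for illustration.

```python
import json
import re

TOOL_CALL = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text: str):
    """Return every well-formed {"name", "arguments"} call found in text."""
    calls = []
    for payload in TOOL_CALL.findall(text):
        try:
            call = json.loads(payload)
        except json.JSONDecodeError:
            # The system prompt contains a placeholder template inside
            # <tool_call> tags that is not valid JSON, so it is skipped here.
            continue
        if isinstance(call, dict) and "name" in call:
            calls.append({"name": call["name"], "arguments": call.get("arguments", {})})
    return calls


def get_current_weather(city: str) -> dict:
    """Hypothetical stub; a real version would query a weather service."""
    return {"city": city, "note": "stub result"}


# Maps tool names from the schema to local Python callables.
REGISTRY = {"get_current_weather": get_current_weather}


def run_tool_calls(text: str):
    """Execute each parsed call whose name is registered."""
    results = []
    for call in parse_tool_calls(text):
        func = REGISTRY.get(call["name"])
        if func is None:
            results.append({"name": call["name"], "error": "unknown tool"})
        else:
            results.append({"name": call["name"], "result": func(**call["arguments"])})
    return results


sample = (
    "return a json object within <tool_call></tool_call> XML tags:\n"
    '<tool_call>\n{"name": <function-name>, "arguments": <args-json-object>}\n'
    "</tool_call>.<|end_of_text|>\n"
    "<|start_of_role|>assistant<|end_of_role|><tool_call>\n"
    '{"name": "get_current_weather", "arguments": {"city": "Boston"}}\n'
    "</tool_call><|end_of_text|>"
)
print(run_tool_calls(sample))
```

Looking tools up in a registry rather than evaluating model output directly keeps execution limited to functions the application has explicitly exposed.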