Granite 4.1 is a family of dense language models available in three sizes: 3B, 8B, and 30B parameters. Each size comes in both base and instruction-tuned variants, with optional FP8 quantization for efficient deployment. Granite 4.1 delivers significant improvements over Granite 4.0 in tool calling, instruction following, code generation, and mathematical reasoning.
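All three sizes share the same Hugging Face loading path, so a different variant can be swapped in by changing the repo id. A minimal sketch for the 8B model, where the id ibm-granite/granite-4.1-8b is an assumption inferred from the 30B checkpoint's naming pattern rather than a confirmed identifier:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id, inferred from "ibm-granite/granite-4.1-30b";
# verify it exists before use.
model_path = "ibm-granite/granite-4.1-8b"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
model.eval()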
- Tool Calling: Granite 4.1 demonstrates a strong ability to understand and execute tool-based instructions, enabling seamless integration with various software tools and APIs. This capability allows enterprises to create powerful AI-driven workflows and automate complex tasks.
- Instruction Following: Granite 4.1 exhibits improved comprehension of and adherence to user instructions, ensuring reliable and accurate task completion for enterprise automation.
- Code Generation & Explanation: Granite 4.1 generates code snippets and explains complex codebases across multiple programming languages with higher accuracy, accelerating software development workflows.
- Mathematical Reasoning: Granite 4.1 tackles complex mathematical problems from basic arithmetic to advanced calculus and linear algebra, enabling automated calculation and decision-making.
This is a simple example of how to use the Granite-4.1-30B model:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
model_path = "ibm-granite/granite-4.1-30b"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()

# change input text as desired
chat = [
    {"role": "user", "content": "Please list one IBM Research laboratory located in the United States. You should only output its name and location."},
]
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

# tokenize the text
input_tokens = tokenizer(chat, return_tensors="pt").to(device)
# generate output tokens
output = model.generate(**input_tokens, max_new_tokens=100)
# decode output tokens into text
output = tokenizer.batch_decode(output)
# print output
print(output[0])
Expected output:
<|start_of_role|>user<|end_of_role|>Please list one IBM Research laboratory located in the United States. You should only output its name and location.<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>IBM Research - Almaden, San Jose, California<|end_of_text|>
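The call above uses greedy decoding, the transformers default when no sampling flags are set. For more varied generations, sampling can be enabled on the same objects; a minimal sketch, with illustrative parameter values rather than tuned recommendations for Granite 4.1:

# Sampling instead of greedy decoding. The parameter values below are
# illustrative assumptions, not recommendations from this model card.
output = model.generate(
    **input_tokens,
    max_new_tokens=100,
    do_sample=True,    # enable stochastic sampling
    temperature=0.7,   # soften the next-token distribution
    top_p=0.9,         # nucleus sampling cutoff
)
print(tokenizer.batch_decode(output)[0])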
Granite-4.1-30B comes with enhanced tool calling capabilities, enabling seamless integration with external functions and APIs. Define a list of tools using OpenAI’s function definition schema:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
model_path = "ibm-granite/granite-4.1-30b"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a specified city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "Name of the city"
                    }
                },
                "required": ["city"]
            }
        }
    }
]

# change input text as desired
chat = [
    {"role": "user", "content": "What's the weather like in Boston right now?"},
]
chat = tokenizer.apply_chat_template(
    chat,
    tokenize=False,
    tools=tools,
    add_generation_prompt=True,
)

# tokenize the text
input_tokens = tokenizer(chat, return_tensors="pt").to(device)
# generate output tokens
output = model.generate(**input_tokens, max_new_tokens=100)
# decode output tokens into text
output = tokenizer.batch_decode(output)
# print output
print(output[0])
Expected output:
<|start_of_role|>system<|end_of_role|>You are a helpful assistant with access to the following tools. You may call one or more tools to assist with the user query.
You are provided with function signatures within <tools></tools> XML tags:
<tools>{"type": "function", "function": {"name": "get_current_weather", "description": "Get the current weather for a specified city.", "parameters": {"type": "object", "properties": {"city": {"type": "string", "description": "Name of the city"}}, "required": ["city"]}}}</tools>
For each tool call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>{"name": <function-name>, "arguments": <args-json-object>}</tool_call>. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request.<|end_of_text|>
<|start_of_role|>user<|end_of_role|>What's the weather like in Boston right now?<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|><tool_call>{"name": "get_current_weather", "arguments": {"city": "Boston"}}</tool_call><|end_of_text|>
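To close the loop after a tool call, the emitted JSON can be parsed, the function executed, and the result passed back to the model for a final natural-language answer. A minimal sketch reusing model, tokenizer, tools, and output from the example above; get_current_weather here is a stand-in stub, and the assistant/tool message shapes follow the general Hugging Face chat-template convention for tools, which should be verified against this tokenizer's template:

import json
import re

# Extract the model's tool call; take the last match because the system
# prompt itself contains an example <tool_call> block.
calls = re.findall(r"<tool_call>(\{.*?\})</tool_call>", output[0], re.DOTALL)
call = json.loads(calls[-1])  # {"name": "...", "arguments": {...}}

# Stand-in stub; replace with a real weather lookup.
def get_current_weather(city):
    return {"city": city, "temperature_c": 18, "condition": "cloudy"}

tool_result = get_current_weather(**call["arguments"])

# Rebuild the conversation with the tool call and its result. These
# message shapes follow the common Hugging Face chat-template convention
# for tools and are an assumption; check tokenizer.chat_template.
messages = [
    {"role": "user", "content": "What's the weather like in Boston right now?"},
    {"role": "assistant", "tool_calls": [{"type": "function", "function": call}]},
    {"role": "tool", "content": json.dumps(tool_result)},
]
prompt = tokenizer.apply_chat_template(
    messages, tools=tools, tokenize=False, add_generation_prompt=True
)
input_tokens = tokenizer(prompt, return_tensors="pt").to(device)
final_output = model.generate(**input_tokens, max_new_tokens=100)
print(tokenizer.batch_decode(final_output)[0])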