This article was featured in the Think newsletter.
IBM has unveiled Granite 4.0, a new generation of open-source language models that it claims can run faster, more cost-effectively and with greater security than conventional AI systems.
The release marks the company’s sharpest bet yet on efficiency over scale. Rather than chasing the trillion-parameter frontier models of rivals, IBM is presenting a hybrid architecture designed to reduce memory demands by over 70% while maintaining performance. It is also the first open-weight model family to earn ISO 42001 certification for responsible AI management, a move aimed at convincing enterprises that they can deploy open systems without sacrificing trust.
“Granite 4.0 represents IBM’s emphasis on efficient AI for enterprise,” said Kate Soule, who leads Technical Product Management for IBM’s AI models portfolio, in an interview with IBM Think. “We want to provide options where customers have a reliable building block that can scale to production workloads while helping defray the costs of running agents and broader deployments.”
Granite 4.0 arrives as the latest entrant in a crowded field. OpenAI, Meta and Mistral have each unveiled ever-larger models measured in the hundreds of billions of parameters. IBM is taking a different route. By emphasizing efficiency, the company is betting that enterprises will prioritize cost, governance and reliability over raw scale.
At the center of the release is a hybrid design that blends conventional transformer layers with Mamba-2, an emerging sequence model. Transformers have long powered modern AI, but their compute demands rise quadratically with the length of the input. That design makes them expensive and slow for long documents or multiple concurrent sessions. By contrast, Mamba scales linearly, reducing the memory burden. IBM’s approach combines the two.
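The difference between the two scaling behaviors can be sketched with a back-of-envelope calculation. The cost functions below are illustrative assumptions that capture only the asymptotic shapes described above (quadratic for self-attention, linear for a Mamba-style state-space scan), not measured figures for any Granite model:

```python
# Illustrative comparison of how per-sequence compute grows with input length.
# Self-attention compares every token against every other token (O(n^2));
# a state-space layer like Mamba processes tokens once with a fixed-size
# state (O(n)). Constants are omitted, so only the growth rates are meaningful.

def attention_cost(seq_len: int) -> int:
    # Quadratic: each of n tokens attends to all n tokens.
    return seq_len * seq_len

def mamba_cost(seq_len: int) -> int:
    # Linear: one pass over the sequence with constant-size recurrent state.
    return seq_len

for n in (1_000, 10_000, 100_000):
    ratio = attention_cost(n) // mamba_cost(n)
    print(f"{n:>7} tokens -> attention is ~{ratio:,}x the linear cost")
```

At a 100,000-token input, the quadratic term is roughly 100,000 times the linear one, which is why long documents and long agent sessions are where the hybrid design pays off.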
The result is a family of models that can reduce memory use by as much as 70 to 80% compared with transformer-only systems. Soule said the shift is aimed directly at the pain points enterprises face when moving beyond pilots. Many proofs of concept are built on massive models, but those systems often prove too costly to run at scale.
“You do the math and see both the cost of how many GPUs you would need to host and maintain, potentially on-prem, and what real-world users will tolerate from latency perspectives,” she said. “Customers quickly realize they need smaller options to deploy at scale.”
The Granite 4.0 portfolio is designed to cover a range of needs. At the top is Granite-4.0-H-Small, a hybrid mixture-of-experts model with 32 billion parameters, nine billion of them active at once. IBM positions it as a workhorse for enterprise workflows such as customer support automation and multi-tool agents.
Granite-4.0-H-Tiny, with a total of seven billion parameters and one billion active, and Granite-4.0-H-Micro, with three billion parameters, are designed for edge applications and the fast execution of specific tasks. IBM also released a dense transformer-only model of three billion parameters for developers who prefer more traditional architectures.
Soule pointed out that the smallest of these can already run on ordinary laptops. A three-billion-parameter model operating with a 128,000-token context length and 8-bit precision consumes only four gigabytes of memory, which is small enough to fit on a Raspberry Pi. IBM also plans to release Nano models with as few as 300 million parameters, targeting edge deployments even more directly.
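The four-gigabyte figure follows from simple arithmetic. The sketch below is a rough estimate under stated assumptions (one byte per weight at 8-bit precision, plus an assumed one-gigabyte allowance for cache, state and activations); actual usage depends on the runtime and context-handling details:

```python
# Back-of-envelope memory estimate for a 3B-parameter model at 8-bit precision.
# The overhead allowance is an assumption for illustration, not an IBM figure.

params = 3_000_000_000   # 3 billion parameters
bytes_per_param = 1      # 8-bit quantization: one byte per weight

weights_gb = params * bytes_per_param / 1e9   # memory just for the weights
overhead_gb = 1.0                             # assumed cache/activation budget

total_gb = weights_gb + overhead_gb
print(f"Weights: {weights_gb:.1f} GB, estimated total: {total_gb:.1f} GB")
```

Three gigabytes of weights plus roughly a gigabyte of working memory lands at about four gigabytes, consistent with the laptop and Raspberry Pi claim.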
The spectrum is designed to appeal to both developers who seek lightweight, customizable models and executives who require assurances of trust and governance.
IBM is attempting to distinguish Granite not only by efficiency but also by its focus on governance. Granite 4.0 is the first open model family to secure ISO 42001 certification, an international standard for AI management systems. The accreditation followed an extensive external audit of IBM’s training, testing and data governance practices.
Enterprises have long viewed open-source software with both interest and caution, drawn to its flexibility but wary of security gaps and accountability concerns. That tension is precisely what IBM is trying to address with its certification.
“We hope that by having an ISO-certified open-source model, enterprises feel they can still rely on open source and trust that the same guarantees IBM puts into its products are also reflected in these models,” Soule said.
IBM is pairing the certification with other safety measures. The company is cryptographically signing all Granite 4.0 checkpoints on Hugging Face to verify provenance. It has also launched a bug bounty program with HackerOne to encourage outside researchers to probe the models for weaknesses. The combination is designed to provide CIOs with greater confidence in adopting open models without exposing themselves to hidden risks.
IBM is not presenting Granite as a substitute for the largest frontier systems. Instead, the company is positioning its small models as complements. The vision is that Granite can handle the many steps in agentic workflows that involve instruction following, retrieval or tool calling, while leaving complex reasoning to larger models.
Soule acknowledged the limits openly. “There’s always going to be larger, complex reasoning tasks that need much bigger models,” she said. “Our focus is on the bread-and-butter enterprise skills rather than trying to one-up everyone on a global leaderboard.”
Granite 4.0 uses a hybrid design that combines two different approaches. Most of it is built with a newer method called Mamba (Mamba-2), which reads information one step at a time and naturally keeps track of order. A smaller part uses the more established transformer architecture, which is particularly adept at identifying patterns from just a few examples. By combining the two, the system is designed to be both efficient and flexible, drawing on the strengths of each.
Benchmarks suggest the approach delivers. Granite-4.0-H-Small outperforms all but one open model, Meta’s far larger Llama 4 Maverick, on the IFEval instruction-following test. On the Berkeley Function Calling Leaderboard, the model competes with much larger systems while running at a fraction of the cost.
The company emphasizes that these gains translate into tangible benefits in the real world. Long inputs, such as large code bases or extensive documentation, can be processed more efficiently, and concurrent sessions, such as customer service queries, can run without exhausting hardware limits.
Before its release, IBM provided select partners, including Ernst & Young and Lockheed Martin, with early access to Granite 4.0 for testing at scale. Feedback from these deployments informed refinements and will continue to shape updates. The models are being released under an Apache 2.0 license through IBM’s watsonx.ai platform.
Soule said the broad distribution reflects IBM’s goal of embedding Granite in the open ecosystem rather than confining it to proprietary channels. “For developers, you should be able to run these models on whatever devices you have today,” she said. “For CIOs, our largest Granite 4.0 model has excellent enterprise benchmark scores, comes ISO certified and is cryptographically signed. We’re doing a lot to make sure there’s something for everyone.”
IBM plans to broaden the Granite 4.0 family before the end of the year, Soule said. Instruction-tuned models are already available, while Nano models will extend the line to the smallest edge deployments.
The company is also collaborating with Qualcomm and Nexa AI to make Granite compatible with Hexagon NPUs, enabling more efficient inference on mobile and edge hardware.
With Granite 4.0, IBM is betting that smaller, certified models will prove more useful in day-to-day enterprise work than chasing record-breaking scale.