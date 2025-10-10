Others in the field are exploring the same path. NVIDIA’s Nemotron-H series combines Mamba layers with transformers and reports notable speedups. Research papers have found that hybrids and SSMs outperform transformers at very long contexts in both speed and efficiency—and that for vision and multimodal tasks, hybrids such as MambaVision and MaTVLM combine SSM and transformer components to extend the efficiency gains beyond text.

What sets Granite 4.0 apart is not just the hybrid itself but its packaging for enterprise. The models in the family are open-weight, meaning their parameters are available for anyone to inspect. Granite 4.0 is certified under the new ISO/IEC 42001:2023 standard for responsible AI management systems, cryptographically signed to guarantee provenance and covered by IBM’s indemnity when used on watsonx.ai. IBM has also launched a bug bounty program with HackerOne, offering rewards for vulnerabilities or alignment issues.

The architecture also changes the limits of context length. Granite 4.0 models were trained on sequences of over half a million tokens and validated at up to 128,000 tokens. Unlike transformers, which rely on positional encoding to keep track of order and can falter at lengths they never saw in training, SSMs inherently preserve sequence order because they read sequentially. With Granite 4.0, IBM removed positional encoding entirely, eliminating one of the structural limits that have long capped transformer models.
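To make the ordering point concrete, here is a minimal toy sketch in Python. It is not Granite's architecture: the scalar recurrence, the single-query attention pool and every name and constant in it are assumptions chosen for illustration. It shows only that a sequential recurrence gives different results when the same tokens arrive in a different order, while unmasked attention with no positional signal does not.

```python
import math

def ssm_scan(xs, a=0.5, b=1.0):
    """Toy scalar state-space recurrence: h_t = a * h_(t-1) + b * x_t."""
    h = 0.0
    for x in xs:  # tokens are consumed strictly in order
        h = a * h + b * x
    return h

def attention_pool(xs, query=1.0):
    """Unmasked softmax attention for one fixed query, no positional encoding."""
    scores = [query * x for x in xs]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, xs)) / total

seq = [1.0, 2.0, 3.0]
rev = list(reversed(seq))

# The recurrence ends in a different state for each ordering.
order_sensitive = ssm_scan(seq) != ssm_scan(rev)

# The attention pool sees the tokens as an unordered set.
order_blind = math.isclose(attention_pool(seq), attention_pool(rev))
```

In this toy, the recurrence tracks order as a side effect of how it reads its input, so nothing has to be added to tell it where each token sits. Real transformer decoders also apply a causal mask, which this sketch leaves out.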

“The architecture is not putting a hard constraint on how many tokens the model can consume at one time,” Soule said. “You could keep pushing that boundary further and further.”

For industries with sprawling data, that freedom matters. Financial firms analyzing years of trades, healthcare providers integrating patient records or law firms handling massive case files all face inputs that can outgrow transformer models. The hybrid approach creates architectural headroom for those tasks.

Performance numbers show Granite hybrids competing at the top of their weight class, with the three-billion-parameter hybrid outpacing IBM’s earlier eight-billion-parameter transformer. Granite-4.0-H-Small, with 32 billion total parameters and nine billion active, ranks near the top of instruction-following benchmarks, outperforming nearly all open models, including ones many times its size. It also performs strongly on function calling, where AI systems trigger external tools, and on retrieval-augmented generation, a method of pulling in outside data to improve answers.
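For readers unfamiliar with function calling, the pattern can be sketched in a few lines of Python. This is a generic illustration, not Granite's actual tool-calling format: the JSON shape, the `get_stock_price` tool, the `dispatch` helper and the placeholder price are all hypothetical.

```python
import json

def get_stock_price(ticker: str) -> float:
    """Hypothetical tool; a real one would query a market-data service."""
    prices = {"IBM": 100.0}  # placeholder value, not a real quote
    return prices[ticker]

# Registry of tools the application is willing to run on the model's behalf.
TOOLS = {"get_stock_price": get_stock_price}

def dispatch(model_output: str):
    """Parse a JSON tool call emitted by a model and run the matching tool."""
    call = json.loads(model_output)
    func = TOOLS[call["name"]]  # raises KeyError for an unknown tool
    return func(**call["arguments"])

# Assumed example of what a model might emit when it decides to use a tool.
model_output = '{"name": "get_stock_price", "arguments": {"ticker": "IBM"}}'
result = dispatch(model_output)
```

The model only produces the structured request. The surrounding application decides which tools exist and executes them, which is why benchmarks in this area measure how reliably a model emits well-formed calls.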

Soule said the aim is not to dominate every chart, but to deliver where it matters. “Where we’re focusing with Granite is on the core tasks and strengths we want the model to excel at: retrieval-augmented generation, tool calling, instruction following.”