Date: 2025-07-07

Mamba is perhaps the first deep learning architecture to rival the efficacy of transformer models on the task for which transformers originally won their fame: language modeling. Most notably, the Mamba architecture has demonstrated the capacity to match equivalently sized transformers on prominent LLM benchmarks while often being significantly more efficient in terms of latency and memory requirements.

The Mamba architecture was first introduced by Tri Dao and Albert Gu in the 2023 paper, “Mamba: Linear-Time Sequence Modeling with Selective State Spaces.” A year later, they followed up with a second paper that explored the connections between state space models (SSMs) and transformers and presented a refined, significantly faster version of the architecture, which they dubbed Mamba-2.

Though transformers have remained the dominant LLM architecture in the two years following the release of the original Mamba paper, Mamba has been incorporated into a growing number of open-source models. Some, such as Mistral AI’s Codestral Mamba, are pure Mamba models. Many more, including AI21 Labs’ Jamba series and IBM Granite 4.0, are hybrid models incorporating both attention (transformer) layers and SSM (Mamba) layers. Beyond their performance benefits, Mamba-based models promise to democratize AI access because they run smoothly on comparatively inexpensive hardware.