Tiny models, significant shift: Why Granite 4.0 Nano could change how we use AI


By Sascha Brodsky, Staff Writer, IBM

This article was featured in the Think newsletter.

You’re on a plane with no Wi-Fi, typing an email on your phone. As you write, a compact AI model on your device tidies your prose, suggests a summary and drafts a reply without ever connecting to the cloud.

IBM’s newly released Granite 4.0 Nano is built for that kind of scenario. The models are small enough to run on everyday hardware, yet powerful enough to handle tasks like automating workflows, summarizing reports and analyzing data. They reflect a broader industry shift toward smaller and more efficient systems that move artificial intelligence out of remote data centers and onto the devices people use every day.

“Some people want one model that can do everything,” said Gabe Goodhart, Chief Architect of AI Open Innovation at IBM, in an interview with IBM Think. “But for many use cases, you need something slimmer, faster and much cheaper to run.”

Granite Nano enters an increasingly crowded field of small models. Meta (Llama), Mistral, Alibaba (Qwen) and Google (Gemma) are also developing compact models designed to run on smaller hardware or private servers. IBM’s version focuses on safety and trust, with features suited to regulated industries: open-weight licensing, cryptographic verification and ISO 42001 certification.

Small models, big advantages

For years, the dominant narrative in AI has been one of scale. Each new generation of models promised more parameters, more data and better results. Yet as these systems grew to trillions of parameters, their costs, power demands and environmental impact became impossible to ignore. The current shift toward smaller, more focused systems represents a practical rebalancing, according to Eren Celebi, Principal Engineer and AI Innovation Leader at Ogilvy: intelligence does not depend solely on size.

“The human brain has about 89 billion neurons, but only a fraction are used for each task,” he said in an interview with IBM Think. “Today’s large language models are like monster trucks: impressive and powerful, but not always necessary to get from point A to B.”

The compact models trend, he explained, reflects a desire for control and autonomy. “There’s a movement toward on-device models that are safe, private and tuned for specific domains,” he said. “When fine-tuned for a particular task, they can outperform much larger systems.”

Granite 4.0 Nano follows that philosophy. Using IBM’s hybrid Mamba/transformer architecture, the models reduce memory use while retaining long-context reasoning. They can operate directly on laptops or industrial systems without sending data to external servers, enabling real-time insights with data privacy.

Celebi noted that many users already benefit from small models without realizing it, since even the largest systems rely on collections of specialized sub-models that activate only when needed. “When you use a service like ChatGPT, it doesn’t engage a single massive model,” he said. “It activates smaller expert pathways for different kinds of tasks.”

At Ogilvy, task-specific models have become central to brand safety. “For major clients, a brand’s visual identity is priceless,” Celebi said. “We fine-tune small image models using their assets so they can generate content that remains perfectly on-brand and secure. Because the models are small and open-weight, the client can actually own them.”

Celebi stressed that companies are increasingly seeking to build or train models that belong entirely to them, rather than renting intelligence through cloud APIs. “Every narrow task can benefit from a smaller model tuned precisely for it,” he said. “From a robot cleaning dishes to a digital assistant representing a person, the more focused the model, the better it performs.”

Goodhart said he has seen the same pattern inside enterprise AI. “The conversation has shifted from power to purpose,” he said. “Companies don’t need a model that knows everything. They need one that does something well, safely and efficiently.”


From data centers to devices

Running AI locally changes both how it works and how it feels to use, Celebi said. “Running a model on-device means that the model is yours,” he added. “It creates a more honest interaction. No intermediary is watching or capable of pulling the plug.”

He emphasized that for users who value both privacy and reliability, autonomy matters. “People are uneasy about cloud AI for reasons that go beyond security,” he said. “It’s about dependence. Working with an agent hosted on a corporate server can make you feel like you’re being monitored. On-device AI feels personal. It belongs to you.”

Granite Nano was built to make that independence practical, Goodhart said. The models can run on modest consumer hardware or industrial processors, processing data where it is created instead of sending it elsewhere. A logistics platform might use Granite Nano to schedule deliveries in real time. A hospital could use it to analyze patient records securely, behind its firewall. A manufacturer could embed Granite Nano models in its machinery to monitor equipment and detect early signs of failure.

These applications are made possible by the model’s technical design. Granite’s hybrid Mamba/transformer structure combines conventional transformer layers with Mamba-2 linear attention, allowing for long-context reasoning with minimal memory use. “Even though a model might have 40 layers, only four or five use full attention,” Goodhart said. “It behaves like a much smaller model from a scaling perspective.”
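Goodhart’s point about attention layers can be made concrete with some back-of-the-envelope arithmetic. In a transformer, every full-attention layer must cache a key and value vector for each token in the context, so memory grows with context length; Mamba-style layers instead carry a fixed-size state. The sketch below uses illustrative numbers only (the head counts, dimensions and layer split are assumptions, not Granite 4.0 Nano’s actual configuration) to compare cache growth for an all-attention stack versus a hybrid one where only a handful of layers keep full attention.

```python
# Back-of-the-envelope KV-cache comparison: all-attention vs. hybrid stack.
# All numbers are illustrative assumptions, not Granite 4.0 Nano's real config.

def kv_cache_mb(attention_layers: int, context_tokens: int,
                kv_heads: int = 8, head_dim: int = 64,
                bytes_per_value: int = 2) -> float:
    """Memory (MB) needed to cache keys and values for `context_tokens` tokens.

    Each full-attention layer stores one key and one value vector per token:
    2 * kv_heads * head_dim * bytes_per_value bytes per token per layer.
    """
    per_token_per_layer = 2 * kv_heads * head_dim * bytes_per_value
    return attention_layers * context_tokens * per_token_per_layer / 1e6

TOTAL_LAYERS = 40              # "a model might have 40 layers"
FULL_ATTENTION_IN_HYBRID = 4   # "only four or five use full attention"
CONTEXT = 128_000              # a long-context scenario

full = kv_cache_mb(TOTAL_LAYERS, CONTEXT)
hybrid = kv_cache_mb(FULL_ATTENTION_IN_HYBRID, CONTEXT)
# (Mamba-style layers add a small constant-size state per layer, ignored here.)

print(f"all-attention stack: {full:8.1f} MB of KV cache")
print(f"hybrid stack:        {hybrid:8.1f} MB of KV cache")
print(f"reduction factor:    {full / hybrid:.0f}x")
```

Under these assumed numbers the hybrid stack needs a tenth of the cache memory at long context, which is the sense in which it "behaves like a much smaller model from a scaling perspective."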

That efficiency, Goodhart said, extends beyond the enterprise. Granite Nano could handle predictive maintenance in cars, interpret voice commands in field equipment or power augmented reality glasses for technicians. Each example brings intelligence closer to the edge, where people and machines actually interact.

There’s another reason why companies like small models, Goodhart said: they also carry environmental advantages. Inference, the process of generating responses, consumes far more energy over a model’s lifetime than training does, and local models can reduce that footprint dramatically. “Every device we carry has computing power sitting idle,” Celebi said. “Why not put it to use running models that serve you directly?”

Artificial intelligence is gradually becoming more distributed. Instead of running only in large data centers, models like Granite Nano are starting to operate on local hardware across businesses and devices, Goodhart said. The shift reflects a broader push toward efficiency and accessibility, rather than size alone.

“It’s not that small models can solve every problem,” Goodhart said. “But they can handle a lot of what we do every day, quietly and efficiently. They make AI something you can hold in your hand or build directly into the world around you.”
