Date: 2025-10-13

The rapid pace of development in artificial intelligence means that different types of AI models keep cropping up, making it difficult to predict what to expect in the coming years. But David Cox, VP for AI models at IBM Research, has noticed an encouraging trend of small language models (SLMs) outshining their larger counterparts, with compute- and energy-intensive models compressed “by a factor of almost 10 every six to nine months,” he observes.

Such shrinkage makes SLMs faster and more efficient to run on compact hardware. “It’s going to be much more widespread because we can pack more into smaller packages,” Cox adds.
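
To put Cox’s figure in concrete terms, the short calculation below extrapolates a tenfold reduction every six or nine months over an 18-month window. The constant rate, the 18-month window and the function name `relative_size` are illustrative assumptions, not figures from IBM. Because the quote says “almost 10,” the results are best read as rough upper bounds on the shrinkage.

```python
def relative_size(months: float, period_months: float, factor: float = 10.0) -> float:
    """Fraction of the original size left after `months`, assuming a constant
    `factor`-fold reduction every `period_months` (an illustrative assumption)."""
    if period_months <= 0 or factor <= 0:
        raise ValueError("period_months and factor must be positive")
    return factor ** (-months / period_months)


# Compare the two ends of the quoted "six to nine months" range.
for period in (6, 9):
    remaining = relative_size(18, period)
    print(f"{period}-month period: {remaining:.4f} of the original after 18 months")
```

Under these assumptions, 18 months of steady compression would leave a model at between one hundredth and one thousandth of its starting size.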

Daniels echoes the sentiment, noting that “from an enterprise standpoint, you want to be able to utilize these models for your own domain or data. And tuning these models is a lot simpler and more cost-effective. You can experiment a lot more quickly with these small models.”

Another direction Daniels sees AI models taking involves modularity, with companies “able to use a model but call on a specific feature or capability as part of their use case without having to load a new model.” This dynamic switching is made possible by a technique known as an activated low-rank adapter (LoRA).

According to Cox, an activated LoRA allows a model to change its weights during inferencing. “We can lean its weights toward different tasks at runtime. It can become the best RAG system when it needs to be, or it can become the best function-calling agent when it needs to be. And it doesn’t matter what its other skills are because it can lean that way,” he says. “There’s going to be a lot of flexibility. The model will orchestrate its own inference, and that’s going to be really exciting.”
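
The general low-rank adapter idea behind this can be sketched in a few lines. The example below is a minimal illustration in plain Python, not IBM’s activated LoRA implementation: a frozen base weight matrix is shared, and each named adapter contributes a small low-rank update that is applied only when that adapter is selected for a call. The class name `AdaptedLinear`, the adapter names `rag` and `tools`, and all numbers are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix `m` by vector `v`."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class AdaptedLinear:
    """Frozen base weight W plus optional low-rank updates B @ A, chosen per call."""

    def __init__(self, weight: Matrix) -> None:
        self.weight = weight  # never modified after construction
        self.adapters: Dict[str, Tuple[Matrix, Matrix, float]] = {}

    def add_adapter(self, name: str, a: Matrix, b: Matrix, scale: float = 1.0) -> None:
        # a has shape (rank, d_in); b has shape (d_out, rank).
        self.adapters[name] = (a, b, scale)

    def forward(self, x: Vector, adapter: Optional[str] = None) -> Vector:
        y = matvec(self.weight, x)
        if adapter is None:
            return y
        a, b, scale = self.adapters[adapter]
        # Low-rank path: project down with A, back up with B, then add to the base output.
        delta = matvec(b, matvec(a, x))
        return [yi + scale * di for yi, di in zip(y, delta)]


# Hypothetical 2x2 layer with two rank-1 adapters.
layer = AdaptedLinear([[1.0, 0.0], [0.0, 1.0]])
layer.add_adapter("rag", a=[[1.0, 0.0]], b=[[0.0], [1.0]])
layer.add_adapter("tools", a=[[0.0, 1.0]], b=[[1.0], [0.0]], scale=0.5)

x = [2.0, 3.0]
print(layer.forward(x))           # base behavior
print(layer.forward(x, "rag"))    # same weights, leaned toward one task
print(layer.forward(x, "tools"))  # same weights, leaned toward another
```

The base matrix is loaded once and never changes. Switching tasks only changes which small pair of matrices is added in, which is why no new model has to be loaded.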

Smaller, more dynamic model types could signal that the AI industry is hitting what Daniels calls a “scaling wall” in terms of model computing power. “We’ve got to change the thinking about what comes next,” he says.

This next step might just be an approach known as generative computing. The idea is to treat models as computing functions, much like the programs that make up any other software. Instead of prompt engineering and application programming interface (API) calls, generative computing employs a runtime environment equipped with programming abstractions, such as structured requirements, safety guardrails and validation checks.

“We’re focusing on how to have a more programmatic or defined experience with the model so we can get more deterministic about what comes in and what comes out,” says Daniels. “And in terms of what comes out of the model, we can organize it so we have reproducible and accurate results that align with the actual task at hand.”
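
As a rough illustration of what such a programmatic wrapper could look like, the sketch below treats a model as an ordinary function whose output must pass explicit requirement checks before it is returned. It is not IBM’s runtime or any real library’s API: `Requirement`, `generate_checked` and `stub_model` are hypothetical names, and the stub returns canned strings so the example runs without a real model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Requirement:
    """A named, machine-checkable condition on the model's output."""
    description: str
    check: Callable[[str], bool]


# A model is any callable taking (prompt, attempt number) and returning text.
ModelFn = Callable[[str, int], str]


def generate_checked(
    model: ModelFn,
    prompt: str,
    requirements: Sequence[Requirement],
    max_attempts: int = 3,
) -> str:
    """Call the model until its output passes every requirement, or give up."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    failed: List[str] = []
    for attempt in range(max_attempts):
        output = model(prompt, attempt)
        failed = [r.description for r in requirements if not r.check(output)]
        if not failed:
            return output
    raise RuntimeError(f"no output met the requirements: {failed}")


def stub_model(prompt: str, attempt: int) -> str:
    """Stand-in for a real model: a chatty first answer, then a clean one."""
    canned = ["Sure! The answer is probably yes.", "YES"]
    return canned[min(attempt, len(canned) - 1)]


REQUIREMENTS = [
    Requirement("single word", lambda s: len(s.split()) == 1),
    Requirement("is YES or NO", lambda s: s in {"YES", "NO"}),
]

answer = generate_checked(stub_model, "Is the invoice overdue? Answer YES or NO.", REQUIREMENTS)
print(answer)
```

Here the first canned response fails both checks and is discarded, so the caller only ever sees an output that meets the stated requirements, or an error saying which ones were not met.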

One way to think about it, Cox notes, is to treat a model’s memory as a new data type and the model itself as a sort of processor for that data. “It’s like a new representation, a new format, and then it just becomes natural to do things like loading programs to change the behavior of the system.”

Cox adds that generative computing has “a ton of potential for making these models more predictable and more trustworthy—all by using them in a different way and integrating them as software in a different way. We’re hoping that’s going to be a trend that picks up. We’re seeing pieces of it across the ecosystem, but bringing it all together focuses the effort in a different way.”

No matter what’s in store for the future of AI models, Daniels believes the success of any existing or emerging model hinges on the answers to three questions: Is model performance strong? Is it cost-effective to use? Does the model fit the business case?

“If you could answer those three questions and you’ve got a good model, then you’ve got a case to move forward,” says Daniels.