AI models have come a long way. What began as rule-based, hand-programmed engines and expert systems has evolved into trained models capable of autonomous decision-making and prediction. AI applications, in turn, have moved beyond game-playing programs to generative AI (genAI) tools capable of automating complex workflows.
Let’s look at how AI models have changed and how they’re shaping up for the future.
Traditional AI models have been around for decades. Two fundamental paradigms are classification models and regression models: classification models predict discrete categories, while regression models predict continuous values. Both fall under supervised learning, a machine learning (ML) technique that relies on labeled data for model training.
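To make the distinction concrete, here is a minimal sketch using scikit-learn, with synthetic features and targets invented purely for illustration:

```python
# Classification vs. regression on synthetic labeled data (illustrative only).
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                  # features

# Classification: predict a discrete category (0 or 1)
y_class = (X[:, 0] + X[:, 1] > 0).astype(int)  # synthetic labels
clf = LogisticRegression().fit(X, y_class)
print(clf.predict(X[:5]))                      # discrete class labels

# Regression: predict a continuous value
y_reg = 2.0 * X[:, 0] - 0.5 * X[:, 2] + rng.normal(scale=0.1, size=200)
reg = LinearRegression().fit(X, y_reg)
print(reg.predict(X[:5]))                      # continuous estimates
```

Both models learn from labeled examples; the only difference is whether the target is a category or a number.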
Conventional ML model types owe their staying power to years of research. They have been honed over time, and their maturity and proven approaches suit sectors like finance and healthcare, where deterministic results are often favored over the probabilistic outputs of more complex models. For example, random forest classifiers help with fraud detection, distinguishing fraudulent from legitimate transactions. Meanwhile, convolutional neural networks aid in computer vision tasks like image classification, object detection and image segmentation. In medical imaging, for instance, they can pinpoint the exact location and boundaries of tumors and specify whether they’re benign or malignant.
Other industries also continue to adopt classical machine learning models. Marketing teams can apply decision tree classifiers for sentiment analysis on social media, manufacturing companies can use linear regression models for supply chain forecasting and scientists can implement support vector regression algorithms (an extension of support vector machines) for climate modeling.
Within the last decade or so, AI advancements have mostly focused on deep learning, a subset of ML that simulates the human brain’s intricate decision-making through deep, multilayered neural networks. This has ushered in a genAI era, with generative models trained using unsupervised learning strategies to make sense of unlabeled training data without the need for human intervention.
Generative model types started with variational autoencoders, which propelled further breakthroughs in anomaly detection and image recognition. Then came generative adversarial networks and diffusion models for image and video generation.
But the true game-changer was the transformer model. This neural network architecture excels at handling sequential datasets, making it an ideal fit for natural language processing (NLP). Transformers evolved from recurrent neural networks (RNNs) and now power the large language models (LLMs) behind popular virtual assistants and chatbots such as OpenAI’s ChatGPT (backed by generative pretrained transformers), AI tools like Midjourney for image generation, AI-powered coding assistants like GitHub Copilot and even the recent surge of AI agents.
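At the core of that architecture is the attention mechanism, which lets every position in a sequence weigh its relevance to every other position. Here is a minimal, illustrative sketch of scaled dot-product attention in NumPy, with toy dimensions chosen for readability (a real transformer adds learned projections, multiple heads and many stacked layers):

```python
# Scaled dot-product attention, the core operation of the transformer.
import numpy as np

def attention(Q, K, V):
    # Each output row is a weighted mix of the value rows; the weights
    # come from how strongly each query matches each key.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarity
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                             # toy sizes (assumed)
Q, K, V = (rng.normal(size=(seq_len, d_model)) for _ in range(3))
print(attention(Q, K, V).shape)                     # (4, 8)
```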
Innovation has been swift, with transformers shifting from supporting mainly NLP tasks to multiple modalities, such as vision transformers for vision language models (VLMs). Abraham Daniels, a Senior Technical Product Manager for IBM’s Granite suite of foundation models, refers to these efforts as a “consolidation of model capabilities” that can improve the user experience. “From a user standpoint, it’s about getting the right model for the job.”
Models continue to evolve. Reasoning models, for example, are fine-tuned to break down complex problems, applying reinforcement learning techniques that incentivize these LLMs to generate smaller, intermediate “reasoning steps” before arriving at a conclusion. Meanwhile, world models learn computational representations of the real world, including causal relationships, physical dynamics and spatial characteristics. Such models can help physical AI systems like robots and self-driving cars better perceive and navigate their environments in real time.
For Daniels, the Mixture of Experts (MoE) approach is the current mainstay. MoE divides a deep learning model into “experts”—subnetworks that specialize in processing particular input data—then selectively activates certain experts for a specific task. “MoE models are still the de facto model types that demonstrate the highest performance while still maintaining a level of efficiency on training and inferencing,” he says.
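A minimal sketch of that routing idea follows, with dimensions and expert count invented for illustration (real MoE layers use trained gating networks and neural subnetworks as experts):

```python
# Mixture of Experts routing: score all experts, run only the top-k.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 8, 4, 2

# Each "expert" here is just a linear map standing in for a subnetwork.
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]
gate_w = rng.normal(size=(d_model, n_experts))    # gating weights

def moe_forward(x):
    logits = x @ gate_w                           # score each expert
    chosen = np.argsort(logits)[-top_k:]          # keep the top-k experts
    weights = np.exp(logits[chosen])
    weights /= weights.sum()                      # softmax over chosen experts
    # Only the selected experts run, which is where the efficiency comes from.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

print(moe_forward(rng.normal(size=d_model)).shape)  # (8,)
```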
However, state space models, specifically Mamba models that can effectively tackle long sequences, are gaining ground and might soon dethrone MoE. At IBM Research, for instance, Daniels and his team are working on a hybrid model combining the best of Mamba and MoE. “We take a lot of the efficiencies from a Mixture of Experts-style model and the context length capabilities of a state space model.”
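The core of a state space model is a simple linear recurrence, which is why its cost grows only linearly with sequence length. A toy sketch of that recurrence appears below; Mamba adds input-dependent, “selective” parameters and hardware-aware scans on top of it, and all dimensions here are assumptions:

```python
# A discrete linear state space layer: h_t = A h_{t-1} + B x_t, y_t = C h_t.
import numpy as np

rng = np.random.default_rng(0)
d_state = 4

A = 0.9 * np.eye(d_state)           # state transition (stable toy choice)
B = rng.normal(size=(d_state, 1))   # input projection
C = rng.normal(size=(1, d_state))   # output projection

def ssm_scan(x):
    # One fixed-size state update per step: cost is linear in sequence
    # length, unlike attention's quadratic pairwise comparisons.
    h = np.zeros((d_state, 1))
    ys = []
    for x_t in x:
        h = A @ h + B * x_t         # recurrent state update
        ys.append((C @ h).item())   # scalar output at this step
    return ys

print(ssm_scan(rng.normal(size=6)))
```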
The rapid pace of development in artificial intelligence means that different types of AI models keep cropping up, making it difficult to predict what to expect in the coming years. But David Cox, VP for AI models at IBM Research, has noticed an encouraging trend of small language models (SLMs) outshining their larger counterparts, with compute- and energy-intensive models compressed “by a factor of almost 10 every six to nine months.”
Such shrinkage makes SLMs faster and more efficient to run on compact hardware. “It’s going to be much more widespread because we can pack more into smaller packages,” Cox adds.
Daniels echoes the sentiment, noting that “from an enterprise standpoint, you want to be able to utilize these models for your own domain or data. And tuning these models is a lot simpler and more cost-effective. You can experiment a lot more quickly with these small models.”
Another direction Daniels sees AI models taking involves modularity, with companies “able to use a model but call on a specific feature or capability as part of their use case without having to load a new model.” This dynamic switching capability is made possible by a technique known as an activated low-rank adapter (LoRA).
According to Cox, an activated LoRA allows a model to change its weights during inferencing. “We can lean its weights toward different tasks at runtime. It can become the best RAG system when it needs to be, or it can become the best function-calling agent when it needs to be. And it doesn’t matter what its other skills are because it can lean that way,” he says. “There’s going to be a lot of flexibility. The model will orchestrate its own inference, and that’s going to be really exciting.”
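The underlying mechanism is standard low-rank adaptation: a large frozen weight matrix plus a small, swappable rank-r update. The sketch below illustrates that idea only; the task names, rank and dictionary of adapters are invented for illustration and do not reflect IBM’s activated-LoRA implementation:

```python
# Low-rank adaptation: frozen base weight plus a small per-task update.
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2                             # model width, adapter rank (assumed)

W_base = rng.normal(size=(d, d))        # frozen pretrained weight

# One tiny adapter (A, B) per skill; swapping them "leans" the model.
adapters = {
    "rag":           (rng.normal(size=(d, r)), rng.normal(size=(r, d))),
    "function_call": (rng.normal(size=(d, r)), rng.normal(size=(r, d))),
}

def forward(x, task):
    A, B = adapters[task]
    # Effective weight = base + low-rank update; only A and B change by task.
    return x @ (W_base + A @ B)

x = rng.normal(size=d)
print(forward(x, "rag")[:3])            # same base model, RAG-leaning
print(forward(x, "function_call")[:3])  # same base model, tool-calling-leaning
```

Because A and B together hold only 2dr parameters instead of d², adapters are cheap to store and fast to swap at runtime.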
Smaller, more dynamic model types could signify the AI industry hitting what Daniels calls a “scaling wall” in terms of model computing power. “We’ve got to change the thinking about what comes next,” he says.
This next step might just be an approach known as generative computing. The idea is to treat models as computing functions, much like the programs that make up any other software. Instead of prompt engineering and application programming interface (API) calls, generative computing employs a runtime environment equipped with programming abstractions, such as structured requirements, safety guardrails and validation checks.
“We’re focusing on how to have a more programmatic or defined experience with the model so we can get more deterministic about what comes in and what comes out,” says Daniels. “And in terms of what comes out of the model, we can organize it so we have reproducible and accurate results that align with the actual task at hand.”
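As a rough illustration of that programmatic framing, consider wrapping a model call in a typed function with explicit requirements and a validation step, rather than a free-form prompt. Everything below, including the call_model stand-in, is hypothetical and not an actual generative-computing API:

```python
# Hypothetical sketch: the model as a function with validated outputs.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Requirement:
    description: str
    check: Callable[[str], bool]  # predicate applied to the model's output

def call_model(prompt: str) -> str:
    # Stand-in for a real model invocation (assumption, not a real API).
    return '{"sentiment": "positive"}'

def generate(prompt: str, requirements: list[Requirement], retries: int = 2) -> str:
    # Validate against every structured requirement and retry on failure,
    # so callers get contract-like behavior instead of raw text.
    for _ in range(retries + 1):
        out = call_model(prompt)
        if all(req.check(out) for req in requirements):
            return out
    raise ValueError("model output failed validation")

reqs = [Requirement("JSON with a sentiment field", lambda s: '"sentiment"' in s)]
print(generate("Classify: 'Great product!'", reqs))
```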
One way to think about it, Cox notes, is to treat a model’s memory as a new data type and the model itself as a sort of processor for that data. “It’s like a new representation, a new format, and then it just becomes natural to do things like loading programs to change the behavior of the system.”
Cox adds that generative computing has “a ton of potential for making these models more predictable and more trustworthy—all by using them in a different way and integrating them as software in a different way. We’re hoping that’s going to be a trend that picks up. We’re seeing pieces of it across the ecosystem, but bringing it all together focuses the effort in a different way.”
No matter what’s in store for the future of AI models, Daniels envisions that the success of any existing or emerging model hinges on the answers to these three questions: Is model performance strong? Is it cost-effective to use? Does the model fit the business case?
“If you could answer those three questions and you’ve got a good model, then you’ve got a case to move forward,” says Daniels.