What is an AI model?

An AI model is a program that has been trained on a set of data to recognize certain patterns or make certain decisions without further human intervention. Artificial intelligence models apply different algorithms to relevant data inputs to achieve the tasks, or output, they’ve been programmed for.

Simply put, an AI model is defined by its ability to autonomously make decisions or predictions, rather than simulate human intelligence. Among the first successful AI models were checkers- and chess-playing programs in the early 1950s: the models enabled the programs to make moves in direct response to the human opponent, rather than follow a pre-scripted series of moves.

Different types of AI models are better suited for specific tasks, or domains, for which their particular decision-making logic is most useful or relevant. Complex systems often employ multiple models simultaneously, using ensemble learning techniques like bagging, boosting or stacking.

As AI tools grow more complex and versatile, they require ever larger amounts of data and computing power to train and run. In response, systems designed to execute specific tasks in a single domain are giving way to foundation models, pre-trained on large, unlabeled datasets and capable of a wide array of applications. These versatile foundation models can then be fine-tuned for specific tasks.

Algorithms vs. models

Though the two terms are often used interchangeably in this context, they do not mean quite the same thing.

  • Algorithms are procedures, often described in mathematical language or pseudocode, to be applied to a dataset to achieve a certain function or purpose.
  • Models are the output of an algorithm that has been applied to a dataset.

In simple terms, an AI model is used to make predictions or decisions and an algorithm is the logic by which that AI model operates.
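To make the distinction concrete, here is a minimal sketch in Python. It assumes ordinary least squares as the algorithm; the names fit_line and predict and the sample data are hypothetical, chosen only for illustration.

```python
def fit_line(xs, ys):
    """The algorithm: ordinary least squares for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    # The returned parameters are the model: the output of the algorithm
    # after it has been applied to a dataset.
    return {"slope": slope, "intercept": intercept}


def predict(model, x):
    """Use the fitted model to make a prediction for a new input."""
    return model["slope"] * x + model["intercept"]


# Hypothetical dataset that follows y = 2x + 1 exactly
model = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(model, predict(model, 5))
```

Running the same algorithm on a different dataset would yield a different model, which is why the two terms are not interchangeable.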

AI models and machine learning

AI models can automate decision-making, but only models capable of machine learning (ML) are able to autonomously optimize their performance over time.

While all ML models are AI, not all AI involves ML. The most elementary AI models are a series of if-then-else statements, with rules programmed explicitly by a data scientist. Such models are alternatively called rules engines, expert systems, knowledge graphs or symbolic AI.
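As a rough illustration of such a rules-based model, the sketch below hard-codes a loan decision as if-then-else statements. The function name, inputs and thresholds are all hypothetical; nothing here is learned from data.

```python
def approve_loan(income, credit_score, has_defaulted):
    """Hand-written rules: every threshold is chosen by a person, not learned."""
    if has_defaulted:
        return "reject"
    elif credit_score >= 700 and income >= 40000:
        return "accept"
    elif credit_score >= 650 and income >= 80000:
        return "accept"
    else:
        return "manual review"


print(approve_loan(income=50000, credit_score=720, has_defaulted=False))
```

Changing the behavior of such a model means editing the rules by hand, in contrast to the ML models described next.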

Machine learning models use statistical AI rather than symbolic AI. Whereas rule-based AI models must be explicitly programmed, ML models are “trained” by applying their mathematical frameworks to a sample dataset whose data points serve as the basis for the model’s future real-world predictions.

ML model techniques can generally be separated into three broad categories: supervised learning, unsupervised learning and reinforcement learning.

  • Supervised learning: Also known as “classic” machine learning, supervised learning requires a human expert to label training data. A data scientist training an image recognition model to recognize dogs and cats must label sample images as “dog” or “cat”, as well as key features (like size, shape or fur) that inform those primary labels. The model can then, during training, use these labels to infer the visual characteristics typical of “dog” and “cat”.
  • Unsupervised learning: Unlike supervised learning techniques, unsupervised learning does not assume the external existence of “right” or “wrong” answers, and thus does not require labeling. These algorithms detect inherent patterns in datasets to cluster data points into groups and inform predictions. For example, e-commerce businesses like Amazon use unsupervised association models to power recommendation engines.
  • Reinforcement learning: In reinforcement learning, a model learns holistically by trial and error through the systematic rewarding of correct output (or penalization of incorrect output). Reinforcement models are used to inform social media suggestions, algorithmic stock trading, and even self-driving cars.
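The difference between the first two categories can be sketched on a toy dataset. The example below is an illustration under stated assumptions: the animal weights are made up, the supervised learner is a simple per-class average, and the unsupervised learner is a one-dimensional k-means with two clusters. Reinforcement learning is left out for brevity.

```python
def mean(values):
    return sum(values) / len(values)


def fit_supervised(points, labels):
    """Supervised: human-provided labels define the groups to be learned."""
    classes = sorted(set(labels))
    return {c: mean([p for p, l in zip(points, labels) if l == c]) for c in classes}


def fit_unsupervised(points, iterations=10):
    """Unsupervised: 1-D k-means (k=2) finds two groups without any labels."""
    centroids = [min(points), max(points)]  # simple starting guesses
    for _ in range(iterations):
        groups = [[], []]
        for p in points:
            nearest = 0 if abs(p - centroids[0]) <= abs(p - centroids[1]) else 1
            groups[nearest].append(p)
        # Move each centroid to the mean of the points assigned to it
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


# Hypothetical animal weights in kilograms
weights = [4.0, 5.0, 6.0, 30.0, 32.0, 34.0]
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]

class_centroids = fit_supervised(weights, labels)
cluster_centroids = fit_unsupervised(weights)
print(class_centroids, cluster_centroids)
```

Both approaches arrive at the same two group centers here, but only the supervised one knows that the groups are called “cat” and “dog”.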

Deep learning is a subset of machine learning whose structure of neural networks attempts to mimic that of the human brain; it can be applied to supervised, unsupervised and reinforcement learning alike. Multiple layers of interconnected nodes progressively ingest data, extract key features, identify relationships and refine decisions in a process called forward propagation. Another process called backpropagation calculates the error of the output and adjusts the system’s weights and biases accordingly. Most advanced AI applications, like the large language models (LLMs) powering modern chatbots, utilize deep learning, which requires tremendous computational resources.
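The two processes can be sketched with a single artificial neuron, the smallest possible case. This is a simplified illustration rather than a real deep network: it assumes a sigmoid activation and a cross-entropy loss, neither of which is specified above, and the training data and hyperparameters are made up.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(samples, epochs=2000, learning_rate=0.5):
    weight, bias = 0.0, 0.0
    for _ in range(epochs):
        for x, target in samples:
            # Forward propagation: compute the neuron's output for this input
            prediction = sigmoid(weight * x + bias)
            # Backpropagation: for sigmoid plus cross-entropy loss, the gradient
            # of the loss with respect to the pre-activation is (prediction - target)
            grad_z = prediction - target
            # Adjust the weight and bias against the gradient
            weight -= learning_rate * grad_z * x
            bias -= learning_rate * grad_z
    return weight, bias


# Hypothetical data: small inputs belong to class 0, large inputs to class 1
samples = [(0.0, 0.0), (1.0, 0.0), (3.0, 1.0), (4.0, 1.0)]
weight, bias = train_neuron(samples)
print(sigmoid(weight * 1.0 + bias), sigmoid(weight * 3.0 + bias))
```

In a multi-layer network the same gradient is passed backward through each layer in turn using the chain rule, which is where the name backpropagation comes from.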

Generative models vs. discriminative models

One way to differentiate machine learning models is by their fundamental methodology: most can be categorized as either generative or discriminative. The distinction lies in how they model the data in a given space.

Generative models
Generative algorithms, which usually entail unsupervised learning, model the distribution of data points, aiming to predict the joint probability P(x,y) of a given data point appearing in a particular space. A generative computer vision model might thereby identify correlations like “things that look like cars usually have four wheels” or “eyes are unlikely to appear above eyebrows.”

These predictions can inform the generation of outputs the model deems highly probable. For example, a generative model trained on text data can power spelling and autocomplete suggestions; at the most complex level, it can generate entirely new text. Essentially, when an LLM outputs text, it has computed a high probability of that sequence of words being assembled in response to the prompt it was given.

Other common use cases for generative models include image synthesis, music composition, style transfer and language translation.

Examples of generative models include:

  • Diffusion models: diffusion models gradually add Gaussian noise to training data until it’s unrecognizable, then learn a reversed “denoising” process that can synthesize output (usually images) from random seed noise.
  • Variational autoencoders (VAEs): VAEs consist of an encoder that compresses input data and a decoder that learns to reverse the process, mapping the compressed representation back to a likely data distribution.
  • Transformer models: Transformer models use mathematical techniques called “attention” or “self-attention” to identify how different elements in a series of data influence one another. The “GPT” in OpenAI’s ChatGPT stands for “Generative Pre-trained Transformer.”
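As a rough sketch of the self-attention idea, the function below implements scaled dot-product attention in plain Python. It is deliberately simplified: real transformers apply learned projection matrices to produce queries, keys and values, whereas this version assumes the input vectors serve as all three. The function names are hypothetical.

```python
import math


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = inputs."""
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        # How strongly does this element relate to every element in the series?
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # Each output is a weighted mix of all the input vectors
        outputs.append(
            [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
        )
    return outputs


print(self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))
```

Each output vector blends information from the whole series, weighted by how similar the other elements are to the element being processed.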

Discriminative models
Discriminative algorithms, which usually entail supervised learning, model the boundaries between classes of data (or “decision boundaries”), aiming to predict the conditional probability P(y|x) of a given data point (x) falling into a certain class (y). A discriminative computer vision model might learn the difference between “car” and “not car” by discerning a few key differences (like “if it doesn’t have wheels, it’s not a car”), allowing it to ignore many correlations that a generative model must account for. Discriminative models thus tend to require less computing power.

Discriminative models are, naturally, well suited to classification tasks like sentiment analysis, but they have many uses. For example, decision tree and random forest models break down complex decision-making processes into a series of nodes, ending in “leaf” nodes that each represent a potential classification decision.

Use cases
While discriminative or generative models may generally outperform one another for certain real-world use cases, many tasks could be achieved with either type of model. For example, discriminative models have many uses in natural language processing (NLP) and often outperform generative AI for tasks like machine translation (which entails the generation of translated text).

Similarly, generative models can be used for classification using Bayes’ theorem. Rather than determining which side of a decision boundary an instance is on (like a discriminative model would), a generative model could determine the probability of each class generating the instance and pick the one with higher probability.
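A minimal sketch of this idea follows, assuming each class is modeled by a one-dimensional Gaussian distribution over animal weight. The class parameters and priors are hypothetical values chosen for illustration, not fitted to real data.

```python
import math


def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


# Hypothetical generative model: P(x | y) as a Gaussian per class, plus a prior P(y)
CLASS_MODELS = {
    "cat": {"mean": 4.5, "std": 1.0, "prior": 0.5},
    "dog": {"mean": 20.0, "std": 8.0, "prior": 0.5},
}


def classify(x):
    # Joint probability P(x, y) = P(x | y) * P(y) for each class
    joint = {
        label: gaussian_pdf(x, m["mean"], m["std"]) * m["prior"]
        for label, m in CLASS_MODELS.items()
    }
    # Bayes' theorem: P(y | x) = P(x, y) / P(x), where P(x) sums over classes
    evidence = sum(joint.values())
    posterior = {label: p / evidence for label, p in joint.items()}
    return max(posterior, key=posterior.get), posterior


print(classify(5.0))
print(classify(25.0))
```

The classifier never draws a decision boundary explicitly; it simply asks which class would have been more likely to generate the observed value.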

Many AI systems employ both in tandem. In a generative adversarial network, for example, a generative model generates sample data and a discriminative model determines whether that data is “real” or “fake.” Output from the discriminative model is used to train the generative model until the discriminator can no longer discern “fake” generated data.

Classification models vs. regression models

Another way to categorize models is by the nature of the tasks they are used for. Most classic AI model algorithms perform either classification or regression. Some are suitable for both, and most foundation models leverage both kinds of functions.

This terminology can, at times, be confusing. For example, logistic regression is a discriminative model used for classification.

Regression models
Regression models predict continuous values (like price, age, size or time). They’re primarily used to determine the relationship between one or more independent variables (x) and a dependent variable (y): given x, predict the value of y.

  • Algorithms like linear regression, and related variants like quantile regression, are useful for tasks like forecasting, analyzing pricing elasticity, and assessing risk.
  • Algorithms like polynomial regression and support vector regression (SVR) model complex non-linear relationships between variables.
  • Certain generative models, like autoregression and variational autoencoders, account for not only correlative relationships between past and future values, but also causal relationships. This makes them particularly useful for forecasting weather scenarios and predicting extreme climate events.    

Classification models
Classification models predict discrete values. As such, they’re primarily used to determine an appropriate label or to categorize (i.e., classify). This can be a binary classification—like “yes or no,” “accept or reject”—or a multi-class classification (like a recommendation engine that suggests Product A, B, C or D).

Classification algorithms find a wide array of uses, from straightforward categorization to automating feature extractions in deep learning networks to healthcare advancements like diagnostic image classification in radiology.

Common examples include:

  • Naïve Bayes: a generative supervised learning algorithm commonly used in spam filtering and document classification.
  • Linear discriminant analysis: used to resolve contradictory overlap between multiple features that impact classification.
  • Logistic regression: predicts continuous probabilities that are then used as proxy for classification ranges.
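The last point, a continuous probability serving as a proxy for a discrete class, can be sketched as follows. The coefficients are hypothetical stand-ins for values that would normally be learned from training data, and the 0.5 threshold is a common default rather than a requirement.

```python
import math

# Hypothetical coefficients, as if already learned from training data
WEIGHT = 1.2
BIAS = -6.0


def predict_probability(hours_studied):
    """Logistic regression output: a continuous value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-(WEIGHT * hours_studied + BIAS)))


def classify(hours_studied, threshold=0.5):
    """Turn the continuous probability into a discrete class label."""
    return "pass" if predict_probability(hours_studied) >= threshold else "fail"


for hours in (3, 5, 7):
    print(hours, round(predict_probability(hours), 3), classify(hours))
```

Raising the threshold makes the classifier more conservative about predicting the positive class, which trades recall for precision.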
Training AI models

The “learning” in machine learning is achieved by training models on sample datasets. Probabilistic trends and correlations discerned in those sample datasets are then applied when the system performs its function.

In supervised and semi-supervised learning, this training data must be thoughtfully labeled by data scientists to optimize results. Given proper feature extraction, supervised learning requires a lower quantity of training data overall than unsupervised learning.

Ideally, ML models are trained on real-world data. This, intuitively, best ensures that the model reflects the real-world circumstances that it’s designed to analyze or replicate. But relying solely on real-world data is not always possible, practical or optimal.

Increasing model size and complexity
The more parameters a model has, the more data is needed to train it. As deep learning models grow in size, acquiring this data becomes increasingly difficult. This is particularly evident in LLMs: OpenAI’s GPT-3 has 175 billion parameters and the open source BLOOM has 176 billion.

Despite its convenience, using publicly available data can present regulatory issues, like when the data must be anonymized, as well as practical issues. For example, language models trained on social media threads may “learn” habits or inaccuracies not ideal for enterprise use.

Synthetic data offers an alternative solution: a smaller set of real data is used to generate training data that closely resembles the original and eschews privacy concerns.

Eliminating bias
ML models trained on real-world data will inevitably absorb the societal biases reflected in that data. If not excised, such bias will perpetuate and exacerbate inequity in any field such models inform, like healthcare or hiring. Data science research has yielded algorithms like FairIJ and model refinement techniques like FairReprogram to address inherent inequity in data.

Overfitting and underfitting
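Overfitting occurs when an ML model fits training data too closely, causing irrelevant information (or “noise”) in the sample dataset to influence the model’s performance. Underfitting is its opposite: a model that, through improper or inadequate training, fails to capture the relevant patterns in the training data and so performs poorly even on that data.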
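The contrast can be illustrated with three toy classifiers on a small, made-up dataset that contains two noisy labels. The sketch assumes a 1-nearest-neighbor model as the overfit case, a single threshold as the well-fit case and a constant prediction as the underfit case; all names and values are hypothetical.

```python
def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)


def make_memorizer(train):
    """Overfit: 1-nearest-neighbor reproduces every training label, noise included."""
    def model(x):
        nearest_x, nearest_y = min(train, key=lambda pair: abs(pair[0] - x))
        return nearest_y
    return model


def threshold_model(x):
    """Reasonable fit: captures the underlying trend and ignores the noise."""
    return 1 if x >= 5 else 0


def constant_model(x):
    """Underfit: too simple to capture any pattern at all."""
    return 0


# Hypothetical data: the true pattern is "label 1 when x >= 5";
# the labels at x=3 and x=8 are noise.
train = [(1, 0), (2, 0), (3, 1), (4, 0), (6, 1), (7, 1), (8, 0), (9, 1)]
test = [(1.5, 0), (2.6, 0), (3.4, 0), (7.6, 1), (8.4, 1), (9.5, 1)]

memorizer = make_memorizer(train)
for name, model in [("overfit", memorizer), ("good fit", threshold_model),
                    ("underfit", constant_model)]:
    print(name, accuracy(model, train), accuracy(model, test))
```

The memorizing model scores perfectly on its own training data yet does worst on the held-out points, while the constant model scores poorly on both.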

Foundation models

Also called base models or pre-trained models, foundation models are deep learning models pretrained on large-scale datasets to learn general features and patterns. They serve as starting points to be fine-tuned or adapted for more specific AI applications.

Rather than building models from scratch, developers can alter neural network layers, adjust parameters or adapt architectures to suit domain-specific needs. Added to the breadth and depth of knowledge and expertise in a large and proven model, this saves significant time and resources in model training. Foundation models thus enable faster development and deployment of AI systems.

Fine-tuning pretrained models for specialized tasks has recently given way to the technique of prompt-tuning, which introduces front-end cues to the model in order to guide the model toward the desired type of decision or prediction.

According to David Cox, co-director of the MIT-IBM Watson AI Lab, redeploying a trained deep learning model (rather than training or retraining a new model) can cut computer and energy use by over 1,000 times, thereby saving significant cost.¹

Testing AI models

Sophisticated testing is essential to optimization, as it measures whether a model is well-trained to achieve its intended task. Different models and tasks lend themselves to different metrics and methodologies.

Testing a model’s performance requires a control group to judge it against, as testing a model against the very data it was trained on rewards overfitting and hides how poorly the model may generalize to new data. In cross-validation, portions of the training data are held aside or resampled to create that control group. Variants include non-exhaustive methods like k-fold, holdout and Monte Carlo cross-validation and exhaustive methods like leave-p-out cross-validation.
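As an illustration of the k-fold variant, the generator below splits sample indices into k folds so that each fold serves once as the held-out control group. It is a minimal sketch: the function name is hypothetical, and it omits the shuffling and stratification that production libraries usually offer.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_samples))
    # Spread any remainder across the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


for train_idx, test_idx in k_fold_splits(n_samples=6, k=3):
    print("train:", train_idx, "test:", test_idx)
```

A model would be trained on each train split and scored on the matching test split, and the k scores are then typically averaged.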

Classification model metrics
These common metrics incorporate discrete outcome values like true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN).

  • Accuracy is the ratio of correct predictions to total predictions: (TP+TN) / (TP+TN+FP+FN). It does not work well for imbalanced datasets.
  • Precision measures how often positive predictions are accurate: TP/(TP+FP).
  • Recall measures how often positives are successfully captured: TP/(TP+FN).
  • F1 score is the harmonic mean of precision and recall: (2×Precision×Recall)/(Precision+Recall). It balances tradeoffs between precision (which encourages false negatives) and recall (which encourages false positives).
  • A confusion matrix tabulates predicted classes against actual classes, showing how often each class is classified correctly and which classes the model confuses with one another.
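The formulas above translate directly into code. The sketch below uses hypothetical counts from an imbalanced dataset and assumes there is at least one predicted positive and one actual positive, so that no division by zero occurs.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the metrics above from the four outcome counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 13 actual positives among 100 samples
metrics = classification_metrics(tp=8, tn=85, fp=2, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts accuracy is 0.93 even though recall is only about 0.62, which shows why accuracy alone can mislead on imbalanced datasets.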

Regression model metrics²
As regression algorithms predict continuous values rather than discrete values, they are measured by different metrics in which “N” represents the number of observations. The following are common metrics used to evaluate regression models.

  • Mean absolute error (MAE) measures the average difference between predicted values (ypred) and actual values (yactual) in absolute terms: ∑|ypred – yactual| / N.
  • Mean squared error (MSE) averages the squared errors to aggressively penalize outliers: ∑(ypred – yactual)² / N.
  • Root mean squared error (RMSE) is the square root of MSE, which expresses error in the same unit as the outcomes: √(∑(ypred – yactual)² / N).
  • Mean absolute percentage error (MAPE) expresses average error as a percentage.
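These metrics can be computed in a few lines. The sketch below uses made-up values and takes the absolute value of each error for MAE; for MAPE it assumes the common definition that divides each absolute error by the actual value, which requires that no actual value is zero.

```python
import math


def regression_metrics(y_actual, y_pred):
    n = len(y_actual)
    errors = [p - a for a, p in zip(y_actual, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e ** 2 for e in errors) / n
    rmse = math.sqrt(mse)
    # Assumes no actual value is zero
    mape = 100 * sum(abs(e) / abs(a) for e, a in zip(errors, y_actual)) / n
    return {"mae": mae, "mse": mse, "rmse": rmse, "mape": mape}


# Hypothetical actual and predicted prices
results = regression_metrics([100, 200, 400], [110, 190, 420])
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

Because the largest error (20) is squared, it contributes 400 of the 600 total in MSE, which is how that metric penalizes outliers.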
Deploying AI models

Deploying and running an AI model requires a computing device or server with sufficient processing power and storage capacity. Failure to adequately plan AI pipelines and computing resources can result in otherwise successful prototypes failing to move beyond the proof-of-concept phase.

  • Open source machine learning frameworks like PyTorch, TensorFlow and Caffe2 can run ML models with a few lines of code.
  • Central processing units (CPUs) are an efficient source of computing power for learning algorithms that don’t require extensive parallel computing.
  • Graphics processing units (GPUs) have a greater capacity for parallel processing, making them better suited to the enormous data sets and the mathematical complexity of deep learning neural networks.

1 "What is prompt tuning?", IBM Research, 15 February 2023.

2 "Machine learning model evaluation", GeeksforGeeks.org, 2022.