What Is a Neural Network?

By Fangfang Lee

What is a neural network?

A neural network is a machine learning model that stacks simple "neurons" in layers and learns pattern-recognizing weights and biases from data to map inputs to outputs.

Neural networks are among the most influential algorithms in modern machine learning and artificial intelligence (AI). They underpin breakthroughs in computer vision, natural language processing (NLP), speech recognition and countless real-world applications ranging from forecasting to facial recognition. While today’s deep neural networks (DNNs) power systems as complex as transformers and convolutional neural networks (CNNs), the origins of neural networks trace back to simple models such as linear regression and how the human brain digests, processes and decides on the information presented to it.

How do neural networks work?

On a high level, the inspiration for neural networks comes from the biological neurons in the human brain, which communicate through electrical signals. In 1943, Warren McCulloch and Walter Pitts proposed the first mathematical model of a neuron, showing that simple units could perform computation of a function. Later, in 1958, Frank Rosenblatt introduced the perceptron, an algorithm designed to perform pattern recognition. The perceptron is the historical ancestor of today’s networks: essentially a linear model with a constrained output. In the following section, we will dive into how neural networks borrow inspiration from the human brains to make decisions and recognize patterns.

A neural network can be understood through a simple example: spam detection. An email is fed into the network, and features such as words or phrases like “prize,” “money,” “dear” or “win” are used as inputs. The early neurons in the network process the importance of each signal, while later layers combine this information into higher-level cues that capture context and tone. The final layer then computes a probability of whether the email is spam, and if that probability is high enough, the email is flagged. In essence, the network learns how to transform raw features into meaningful patterns and use them to make predictions.

This process is powered by two fundamental concepts: weights and biases. Weights act like dials that control how strongly each input feature influences the decision—a word like “prize” may be given more weight than a common word like “hello.” Biases are built-in values that shift the decision threshold, allowing a neuron to activate even if the inputs themselves are weak. Together, these model parameters determine how each neuron contributes to the overall computation. By adjusting these values during training, the network gradually learns to make accurate predictions—in this case, whether an email is spam or not.

Mathematically, a neural network learns a function $f (X)$ by mapping an input vector $X = (x 1, x 2, x 3. . .)$ to a predict a response $Y .$ What distinguishes neural networks from other traditional machine learning algorithms is their layered structure and their ability to perform nonlinear transformation.

Join over 100,000 subscribers who read the latest news in tech

Stay up to date on the most important—and intriguing—industry trends on AI, automation, data and beyond with the Think Newsletter, delivered twice weekly. See the IBM Privacy Statement.

A neural network is comprised of:

Input layer: holds the raw features $(X 1, X 2, X 3, . .)$ .
Hidden layers: consist of artificial neurons (or nodes) that transform inputs into new representations. Mathematically, hidden layers are expressed as the input features, multiplied by their associated weights and added bias to pass from one layer to the next layer, eventually arriving at the final output layer. This is where the linear transformation between input and output happens.
Output layer: After performing the linear transformation in the hidden layer, a nonlinear activation function (tanh, sigmoid, ReLU ) is added to produce the final prediction (such as a number for regression, or a probability distribution for classification).

Diagram of a neural network with three hidden layers: input layer, multiple hidden layers, output layer

A standard feedforward neural network with 3 hidden layers.

Neural network training

Just like other machine learning algorithms, a neural net requires rigorous training to perform well on testing. To train a network, a single neuron computes:

$z = \sum_{i = 1}^{n} w_{i} x_{i} + b$

$a = σ (z)$

Where:

$x_{i}$ = input feature,
$w_{i}$ = weight,
$b$ = bias,
$z$ = weighted sum (linear transformation),
$σ$ = activation function (nonlinear transformation),
$a$ = output,

$σ$ represents an activation function at the output layer that transforms the linear combination to fit the decision of the function. Using this architecture, the input features X are transformed into an output Y, serving as a predictive machine learning model.

The power of a neural network comes from its ability to learn the right weights and biases from data. This is done by comparing the network’s prediction $\hat{Y}$ to the true label $Y$ and measuring the error using a loss function. For example, in classification tasks, the loss might measure how far the predicted probability is from the correct answer.

To minimize this loss, the network uses an algorithm called backpropagation. The neural net trains in four steps:

Forward pass: Inputs flow through the network, computing linear combinations, passing through the nonlinear activation function and producing an output prediction.
Error calculation: The loss function measures the difference between prediction and truth.
Backward pass (backpropagation): The error is propagated backward through the network. At each neuron, the algorithm calculates how much each weight and bias contributed to the error using the chain rule of calculus.
Weight update: The weights and biases are adjusted slightly in the direction that reduces the error, using an optimization method like gradient descent.

Gradient descent diagram, "value of weight" in x-axis and "loss" in y-axis, and a "starting point" in the upper left side of the diagram, there is the text in the lowest part "point of convergence i.e. where the cost function is at its minimum"

This process is repeated many times over the training dataset. Each pass helps the network “tune” its internal parameters so that its predictions get incrementally closer to the correct answers. Over time, the network converges to a set of weights and biases that minimize error and generalize well to unseen data. Backpropagation, coupled with gradient descent, is the engine that makes neural networks work. It enables networks with millions (or even billions) of parameters to learn meaningful patterns from massive datasets.

However, despite practitioners’ effort to train high performing models, neural networks still face challenges similar to other machine learning models—most significantly, overfitting. When a neural network becomes overly complex with too many parameters, the model will overfit to the training data and predict poorly. Overfitting is a common problem in all kinds of neural networks, and paying close attention to bias-variance tradeoff is paramount to creating high-performing neural network models.

Modern neural network architectures—such as transformers and encoder-decoder models—follow the same core principles (learned weights and biases, stacked layers, nonlinear activations, end-to-end training by backpropagation). They differ mainly in how inputs are mixed across layers. Instead of fully connected mixing alone, transformers use attention to form data-dependent weighted combinations of representations, alongside residual connections, normalization and positional encodings to enrich wiring built on the same fundamentals.

Types of neural networks

While multilayer perceptrons are the foundation, neural networks have evolved into specialized architectures suited for different domains:

Convolutional neural networks (CNNs or convnets): Designed for grid-like data such as images. CNNs excel at image recognition, computer vision and facial recognition thanks to convolutional filters that detect spatial hierarchies of features.
Recurrent neural networks (RNNs): Incorporate feedback loops that allow information to persist across time steps. RNNs are well-suited for speech recognition, time series forecasting and sequential data.
Transformers: A modern architecture that replaced RNNs for many sequence tasks. Transformers leverage attention mechanisms to capture dependencies in natural language processing (NLP) and power state-of-the-art models like GPT.
These variations highlight the versatility of neural networks. Regardless of architecture, all rely on the same principles: artificial neurons, nonlinear activations and optimization algorithms.

Mixture of Experts | 17 July, episode 116

Your weekly news podcast for AI enthusiasts

Hear from industry experts on the latest in AI news, listen to Mixture of Experts podcast. New episodes on Fridays at 6am EST.

Go to episodes

Neural network applications

Neural networks underpin many of today’s AI systems. Some prominent applications of neural networks include:

Computer vision: CNNs for image recognition, medical imaging and autonomous vehicles.
Natural language processing: Transformers for machine translation, chatbots and summarization.
Speech recognition: RNNs and deep nets for transcription and voice assistants.
Forecasting and time series: Demand prediction, financial modeling and weather forecasting.
Reinforcement learning: Neural nets as function approximators in game-playing agents (for example, Deepmind’s go-playing AlphaGo).
Pattern recognition: Identifying fraud, detecting anomalies, or classifying documents.

These applications drive real-world innovations in healthcare, finance, robotics, entertainment and beyond.

Why neural networks matter

Neural networks learn useful internal representations directly from data, capturing nonlinear structure that classical models miss. With sufficient capacity, sound objectives and regularization against overfitting, they scale from small benchmarks to production systems in computer vision, natural language processing, speech recognition, forecasting and more—delivering measurable gains in accuracy and robustness.

Modern deep learning extends these foundations. CNNs specialize in spatial feature extraction for images; RNNs model temporal dependencies in sequences; transformers replace recurrence with attention, aided by residual connections, normalization and efficient parallelism on GPUs.

Despite architectural differences, training remains end-to-end with backpropagation on large datasets, and the core view still holds: $Y = f (X; σ)$ is learned by composing data-dependent transformations with nonlinear activations. Generative AI builds on the same principles at greater scale. Large language models, diffusion models, VAEs and GANs learn distributions over data to synthesize text, images, audio and code.

The leap from a multilayer perceptron to state-of-the-art generators is primarily one of architecture, data and compute. Understanding activation functions, training requirements and the main types of networks provides a practical bridge from classical neural nets to today’s generative systems and clarifies why these models have become central to modern AI.

Author

Fangfang Lee

Developer Advocate

IBM

Unlock the power of generative AI and ML

Learn how to confidently incorporate generative AI and machine learning into your business.

Next step in your AI journey

What is a neural network?