What is a Radial Basis Function?

A radial basis function (RBF) is a function widely used in artificial intelligence and machine learning whose value depends only on the distance between an input point and a fixed center point, making it a natural way to measure similarity between points.

The name “radial basis function” comes from two features. “Radial” because they deal with the radial distance from a center point, although in possibly many more dimensions than just two. “Basis” in that they are building blocks for creating more complex functions by adding or multiplying many smaller, simpler functions together. They were first introduced in a 1988 paper by D. S. Broomhead and D. Lowe.1

RBFs allow a data scientist to quickly approximate complex, nonlinear data, especially in higher-dimensional spaces where other statistical methods often struggle. In scenarios with relatively little data or with nonlinear boundaries around clusters of local similarity, an RBF can quickly find both the centers of clusters and the boundaries between them.

An RBF is a real-valued function, so it always returns a real-valued result, which makes RBF-based approaches numerically stable where other approaches might not be. An RBF model is linear in its parameters, but because it is a linear combination of multiple radial basis functions, it can model nonlinear relationships. This combination makes it both highly interpretable and valuable with complex multivariate data.

RBFs can even help identify classes where the boundary of a class has multiple disconnected regions and centers. Many kinds of linear models have difficulty identifying classes like this because they produce a single straight boundary. This characteristic, combined with how quickly they can be trained and run, makes RBFs a powerful tool for data analysis and exploration.

[Figure: two classes with disconnected regions. A class with disconnected regions can be estimated by an RBF.]

RBFs are widely used in various types of models, such as support vector machines and neural networks (NNs). Neural networks built from RBFs are known as radial basis function (RBF) networks, and they offer several advantages over multilayer perceptron (MLP)-based architectures.

An RBF-based model can learn nonlinear boundaries with fewer design decisions: there is no need to choose the depth and width of the network or select a learning rate. This architecture can often make training more stable, faster and higher performing on smaller datasets.

However, unlike more traditional MLP-based neural networks, RBF networks don't learn hierarchical features and don't scale as well to large datasets.

An RBF often approximates functions, especially when data comes from an unknown but consistent process. This feature is helpful in modeling physical processes in engineering, for instance, temperature distribution inside a mechanical part. RBFs are also often used in signal processing, in determining the location of a signal emitter from multiple sensor readings.

RBF networks can also be effective in forecasting future values based on historical data. They are used to predict stock prices, generate or update weather forecasts and predict electricity consumption for a power grid. Many time-series analysis techniques struggle with nonlinear trends; RBF networks do not, which makes them especially useful in highly variable forecasting tasks.

RBF networks are also commonly used in pattern recognition and classification tasks. The network uses a radial basis function to map data points into a higher-dimensional space and separate classes effectively. This approach has applications in image classification, handwritten digit recognition and text categorization.

Finally, RBF networks are powerful techniques to perform anomaly detection to locate outliers or unusual patterns, especially those patterns formed by many features. For instance, detecting a fraudulent financial transaction depends on being able to analyze many features of the transaction at high speeds.

Shortcomings of RBFs

However, RBFs are not without their shortcomings. Classic RBF models often require many centers to achieve accurate clustering, sometimes even one center per training point. This limitation means computing many distances at inference time, because each point is compared to every center. A traditional neural network that uses ReLU activation functions reduces to matrix multiplication, an operation that is fast on a modern graphics processing unit (GPU).

RBFs can also struggle in high-dimensional spaces because they calculate the distance from a centroid. This design also makes them less powerful with hierarchical data such as vision-related tasks or natural language processing where deep stacks of ReLU activations can understand multiple types of relationships.

Data management and analysis

Despite these shortcomings in certain scenarios, RBFs are still widely used. In data management, their speed and efficacy with limited data makes them perfect for entity resolution and deduplication of records. A common problem in enterprise data storage is record linkage. We can have three different records that look like:

  • “Josh Noble, Buenos Aires”
  • “Joshua Noble, Argentina”
  • “J Noble, C1414 Cdad. Autónoma de Buenos Aires, Argentina”

These records should be linked so that they resolve to the same entity, but string-matching techniques that rely on a hard threshold will likely perform poorly. An RBF-based approach allows us to represent each record as a feature vector (or embedding) and compute:

 exp(−γ‖x − y‖²) 

Where γ is a parameter that controls how quickly similarity falls off, and x and y are the feature vectors of the two records. This method produces a smooth similarity score: the closer the result is to 1, the more similar the two entities are, and the closer it is to 0, the more different they are.

This approach is more effective than a hard threshold because it provides a smoothed, continuous measure of similarity rather than a binary decision.
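As an illustrative sketch, assuming each record has already been embedded as a numeric vector (the 3-dimensional embeddings below are made up for demonstration, as is the choice of γ), the similarity score can be computed like this:

```python
import numpy as np

def rbf_similarity(x, y, gamma=0.5):
    """Smooth similarity in (0, 1]: exp(-gamma * ||x - y||^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

# Hypothetical embeddings for the three records above
rec_a = np.array([0.9, 0.1, 0.4])   # "Josh Noble, Buenos Aires"
rec_b = np.array([0.8, 0.2, 0.5])   # "Joshua Noble, Argentina"
rec_c = np.array([0.7, 0.1, 0.6])   # "J Noble, ... Argentina"

print(rbf_similarity(rec_a, rec_b))  # near 1: likely the same entity
print(rbf_similarity(rec_a, [-0.9, 0.8, -0.3]))  # much smaller: different
```

Identical vectors score exactly 1, and the score decays smoothly toward 0 as the embeddings drift apart, which is what makes the output usable as a ranking signal rather than a yes/no match.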

Another common application of RBFs is creating soft filters that allow searches that model logic in the form of “products like this one” or “transactions similar to these two”. RBFs can create soft versions of these filters:

 softfilter(x) = exp(−γ‖x − c‖²) 

Where x is the candidate and c is the center of a new cluster. The resulting score can be used to rank the top N results, making it easy to retrieve, for example, the top 20 matches.
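A hypothetical soft filter over product feature vectors might look like the following sketch, where the feature vectors and γ are illustrative assumptions rather than any real product data:

```python
import numpy as np

def soft_filter(candidates, center, gamma=1.0):
    """Score each candidate row by exp(-gamma * ||x - c||^2)."""
    diffs = candidates - center
    return np.exp(-gamma * np.sum(diffs * diffs, axis=1))

rng = np.random.default_rng(0)
products = rng.normal(size=(100, 4))   # hypothetical product feature vectors
center = products[0]                   # "products like this one"

scores = soft_filter(products, center)
top_20 = np.argsort(scores)[::-1][:20] # indices of the 20 best matches
print(top_20)
```

The query product itself scores exactly 1.0 and ranks first; everything else is ordered by how sharply its score has decayed with distance from the center.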

Kernel functions

To understand why RBFs are so powerful requires some background in kernel functions. Kernels have a pivotal role in machine learning, especially in algorithms designed for classification and regression tasks like support vector machines (SVMs) or Gaussian processes.

The kernel function is at the heart of these algorithms, simplifying the inherent complexity of real-world data. Kernels can transform nonlinear relationships into a linear format, making them accessible to algorithms that otherwise primarily handle linear data.

That transformation is important for allowing models like SVMs to make sense of complex patterns. Kernels allow modeling complexity without the computational demands of mapping data to higher dimensions explicitly.

A kernel is a function that operates on two vectors from the input space, called ‘the X space’. It returns a scalar value, and importantly that scalar is the dot product of the two input vectors, for example a center of a cluster and a prospective member of that cluster.

However, that output scalar is not computed in the original space of these vectors. Instead, the dot product is calculated in a much higher-dimensional space, known as ‘the Z space’.

The kernel thereby conveys how close these two vectors are in “the Z space”. It does so without mapping the vectors to this higher-dimensional space or calculating their dot product there.

A kernel provides information about the vectors in the more complex “Z space” without having to access the space directly. This approach is useful with models like SVMs, where understanding the relationship and position of vectors in a higher-dimensional space is crucial for classification tasks.

A kernel enables this through the “kernel trick”, a clever technique that allows algorithms to operate in a high-dimensional space without directly computing the coordinates in that space. It is called a “trick” because it circumvents the computationally intensive task of mapping data points into a higher-dimensional space, which is often necessary for making complex, nonlinear classifications.

RBF Kernels

One of the key applications of an RBF is the RBF kernel. This kernel maps input data into a high-dimensional feature space to enable nonlinear classification. Adding more dimensions makes it easier to separate the data points into clusters in a classification scenario.

If you’ve read about support vector machines, then this concept might be familiar to you. In practice, RBF kernels are widely used in SVMs because they handle nonlinear patterns that linear methods cannot capture.

The RBF kernel function for two points  X1  and  X2  computes the similarity or how close they are to each other. This kernel can be mathematically represented as follows:

 K(X1, X2) = exp(−‖X1 − X2‖² / 2σ²) 

An RBF takes an input x and a center c and outputs a value that depends on how far x is from c:

 ϕ(x, c) = ϕ(‖x − c‖) 

So if x is close to  c , the output is large and if it’s far, the output is small.
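A minimal sketch of the Gaussian RBF kernel above, written from scratch with NumPy (the σ value and the test points are arbitrary choices for demonstration):

```python
import numpy as np

def rbf_kernel(x1, x2, sigma=1.0):
    """Gaussian RBF kernel: exp(-||x1 - x2||^2 / (2 * sigma^2))."""
    d2 = np.sum((np.asarray(x1, float) - np.asarray(x2, float)) ** 2)
    return float(np.exp(-d2 / (2.0 * sigma ** 2)))

print(rbf_kernel([0.0, 0.0], [0.0, 0.0]))  # identical points -> 1.0
print(rbf_kernel([0.0, 0.0], [3.0, 4.0]))  # distance 5 -> value near 0
```

Note that the output depends only on the distance ‖x1 − x2‖, not on where the points sit in the input space, which is exactly the “radial” property.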

 

Distance measures for RBFs

There are multiple functions that can be used to determine the distance from a cluster center to any specific point when learning the boundaries of a class. For instance:


Gaussian:  ϕ(r) = exp(−cr²)  is the default distance function. It assumes the influence of a center is like a Gaussian distribution that extends a short distance and then quickly falls off toward zero, so the influence of a center is more local with a Gaussian distance.

Linear:  ϕ(r) = cr  performs well when the data is highly scattered and a simple geometric interpolation captures its structure.

Cubic:  ϕ(r) = (r + c)³  is helpful for modeling stronger long-range influences than linear or Gaussian distances.

Thin plate spline:  ϕ(r) = r² log(cr²)  is useful when the data comes from a 2D space, as in geospatial or time-series contexts.

Multiquadric:  ϕ(r) = (r² + c²)^(1/2)  is helpful when the model needs to cover all data points using relatively few centers. This approach is not as commonly used because it can be numerically unstable.

Inverse multiquadric:  ϕ(r) = 1 / (1 + (εr)²)^(1/2) . Inverting the multiquadric RBF creates a function that decays as distance increases. This method offers better stability when the function must be highly precise and smooth, that is, differentiable at all points. This property is important in engineering and finance applications where accurate gradients are essential.

The different types of distances show the versatility of the RBF and how selecting the distance can fit data and approximate functions in highly targeted and precise ways.
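As an illustrative sketch (not taken from any particular library), the radial functions above can be collected in a small dictionary and evaluated over a range of distances; the default shape parameter c (ε for the inverse multiquadric) is an arbitrary demo choice:

```python
import numpy as np

# The radial functions listed above; c is the shape parameter.
phi = {
    "gaussian": lambda r, c=1.0: np.exp(-c * np.asarray(r) ** 2),
    "linear": lambda r, c=1.0: c * np.asarray(r),
    "cubic": lambda r, c=0.0: (np.asarray(r) + c) ** 3,
    # Guard r = 0 so r^2 * log(c * r^2) evaluates to 0 without warnings
    "thin_plate_spline": lambda r, c=1.0: np.where(
        np.asarray(r) > 0,
        np.asarray(r) ** 2 * np.log(np.maximum(c * np.asarray(r) ** 2, 1e-300)),
        0.0,
    ),
    "multiquadric": lambda r, c=1.0: np.sqrt(np.asarray(r) ** 2 + c ** 2),
    "inverse_multiquadric": lambda r, c=1.0: 1.0 / np.sqrt(1.0 + (c * np.asarray(r)) ** 2),
}

r = np.linspace(0.0, 3.0, 7)
for name, f in phi.items():
    print(name, np.round(f(r), 3))
```

Plotting or printing these side by side makes the trade-offs visible: Gaussian and inverse multiquadric peak at the center and decay, while linear, cubic and multiquadric grow with distance, encoding long-range influence.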

RBF networks

  

A common application of RBFs is the RBF network.2 These networks use RBF neurons in a hidden layer to output predictions about input data. This structure gives an RBF network a three-layer architecture consisting of an input layer, a hidden layer of RBFs as neurons and an output layer.

Unlike other network architectures, there are no “deeper” ways of structuring an RBF network. This limitation means that they always remain relatively simple but also are faster and easier to train in many situations.

Conceptually, RBF networks are similar to K-Nearest Neighbor (KNN) models. They assume that points within a cluster behave in a similar manner, but the methods differ in several important ways.

For a point  x , a KNN finds the  k  closest training points and predicts the class for  x  through a vote or a regression average. A KNN “learns” little about the data: it simply stores the training points and a distance metric.

An RBF network, by contrast, computes a smoothed similarity to many (or all) centers and combines them with learned weights, learning both the centers and the boundaries of the clusters.

RBF networks have four fundamental elements:

1.    Input vector: The n-dimensional vector the network receives, which requires classification or a regression.

2.    RBF neurons: Each neuron in the hidden layer represents a prototype vector from the training set. The network computes the distance between the input vector and each neuron's center. This distance is often a Euclidean distance, but as shown in the previous section of this explainer, it can use other kinds of distance metrics as well.

3.    Activation function: The distance is transformed by a radial basis function (typically a Gaussian function) to compute the neuron's activation value. This value decreases exponentially as the distance increases.

4.    Output nodes: Each output node calculates a score as a weighted sum of the activation values from all RBF neurons. For classification, the category with the highest score is chosen.
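The four elements above can be sketched as a minimal forward pass in NumPy; the centers, widths and output weights below are hypothetical rather than learned from data:

```python
import numpy as np

def rbf_forward(x, centers, widths, weights):
    """Forward pass of a minimal RBF network:
    input vector -> distances to centers -> Gaussian activations -> weighted sums.
    """
    d2 = np.sum((centers - x) ** 2, axis=1)          # squared distance to each center
    activations = np.exp(-d2 / (2.0 * widths ** 2))  # Gaussian activation per neuron
    return weights @ activations                     # one score per output node

# Hypothetical 2-class network with three RBF neurons in 2-D
centers = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
widths = np.array([0.5, 0.5, 0.5])
weights = np.array([[1.0, 0.0, 0.0],    # class 0 listens to neuron 0
                    [0.0, 1.0, 1.0]])   # class 1 listens to neurons 1 and 2

scores = rbf_forward(np.array([0.1, -0.1]), centers, widths, weights)
print(np.argmax(scores))  # -> 0: the point sits nearest the first center
```

Because each activation depends only on distance to its own center, adjusting one center's weight leaves the other neurons' contributions untouched, which is the locality property discussed below.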

An RBF network differs from a standard feed-forward artificial neural network in a few key ways. In an RBF network, changing the weights for one center does not affect other centers, whereas in a feed-forward ANN, changing any weight affects predictions everywhere. This behavior makes RBF networks a good choice when locality is important, the task requires interpretability and each training iteration can update an individual center rather than the entire model.

Feed-forward NNs, by contrast, can model higher-dimensional data like images and text and provide an activation function for every neuron.

RBF networks can offer some significant advantages over other kinds of models. They can perform what is referred to as universal approximation, in that they can approximate any continuous function with any required accuracy, when they are given enough neurons. They also typically offer faster learning compared to other neural network architectures.

Standard cross-validation techniques help prevent over-fitting and determine the correct number of basis functions and the width of those functions. Because RBF networks can easily overfit by fitting noise, cross-validation helps balance bias and variance. Because the architecture is simple, RBF networks are easier to implement from scratch and interpret than more complex architectures.
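One way to sketch that validation step in Python is a simple holdout split over candidate basis-function widths, fitting the output weights by least squares; the synthetic sine data, the grid of widths and the number of centers are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 1-D regression problem: noisy sine samples
x = rng.uniform(-3, 3, size=200)
y = np.sin(2 * x) + 0.1 * rng.normal(size=200)

centers = np.linspace(-3, 3, 15)  # fixed, evenly spaced centers

def design_matrix(x, centers, width):
    """One Gaussian basis column per center."""
    return np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * width ** 2))

idx = rng.permutation(200)
tr, va = idx[:150], idx[150:]     # holdout split

best = None
for width in (0.1, 0.5, 1.0, 2.0):
    Phi = design_matrix(x[tr], centers, width)
    w, *_ = np.linalg.lstsq(Phi, y[tr], rcond=None)   # fit output weights
    err = np.mean((design_matrix(x[va], centers, width) @ w - y[va]) ** 2)
    if best is None or err < best[0]:
        best = (err, width)

print(best[1])  # width with the lowest validation error
```

Too small a width fits the noise (high variance); too large a width blurs the signal (high bias); the holdout error picks the balance point between them.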

Gaussian processes and ridge regression

Kernel ridge regression (KRR) with a radial basis function (RBF) kernel is a powerful, nonlinear regression technique. It combines the function-approximating capabilities of an RBF network with the regularization benefits of ridge (L2) regression. That means KRR can model a highly nonlinear data-generating process by applying ridge regression in the kernel-induced feature space, with regularization that often produces a smaller model less likely to overfit the data.

This approach is used to predict continuous numeric outputs by mapping inputs into a high-dimensional space where linear regression operates effectively. The combination of nonlinear transformation and regularization provides models that are both interpretable and quick to train.

The prediction for a new input is a weighted sum of the similarities between that input and all training points, making KRR a non-parametric model. Non-parametric models make fewer assumptions about the original data distribution, making them better suited for complex, nonlinear relationships between variables. They are also highly effective for small samples, ordinal data and datasets containing outliers, making them helpful in retrieving and filtering data.
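A from-scratch sketch of KRR with an RBF kernel under these assumptions (synthetic sine data; the γ and λ values are illustrative choices, not tuned):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical nonlinear regression data
X = rng.uniform(-3, 3, size=(80, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=80)

def rbf_gram(A, B, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||A_i - B_j||^2)."""
    d2 = np.sum(A ** 2, 1)[:, None] + np.sum(B ** 2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

lam = 1e-2                                            # ridge (L2) strength
K = rbf_gram(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)  # dual weights

X_new = np.array([[0.5]])
pred = rbf_gram(X_new, X) @ alpha                     # weighted sum of similarities
print(float(pred[0]))  # approximately sin(0.5) for this synthetic data
```

The solve of (K + λI)α = y is the entire "training" step; the prediction line shows the non-parametric character directly, since every training point contributes through its RBF similarity to the new input.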

Another model where RBF kernels are widely used is in Gaussian processes (GPs). GPs are non-parametric Bayesian modeling tools that provide both predictions and uncertainty quantification. They perform well on smaller datasets and provide clear output of how certain any specific prediction is, which is invaluable in high-stakes situations.

The RBF kernel is a perfect default assumption for the kind of functions a GP models because they provide smooth, continuous functions where nearby inputs have similar outputs.

In GPs that use an RBF, the prediction becomes essentially a form of RBF-weighted averaging with uncertainty. The mean prediction is a smooth weighted combination of training outputs, where weights depend on RBF similarity. The variance is lower when the input data is closer to a mass of training data and higher when it’s further away. The RBF kernel is the default in many GP libraries for Python such as Scikit-Optimize and GPyTorch.
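A minimal sketch of that posterior computation, written from scratch rather than with a GP library (the tiny dataset, γ and noise level are all illustrative):

```python
import numpy as np

def gp_posterior(X_train, y_train, X_test, gamma=0.5, noise=1e-4):
    """GP posterior mean and variance with an RBF kernel (from scratch)."""
    def k(A, B):
        d2 = np.sum(A ** 2, 1)[:, None] + np.sum(B ** 2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-gamma * d2)
    K = k(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = k(X_test, X_train)
    K_inv = np.linalg.inv(K)
    mean = K_s @ K_inv @ y_train                      # RBF-weighted average
    var = 1.0 - np.sum(K_s @ K_inv * K_s, axis=1)     # k(x, x) = 1 for RBF
    return mean, var

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 0.0])
mean, var = gp_posterior(X, y, np.array([[1.0], [5.0]]))
print(var[0] < var[1])  # True: less uncertainty near the training data
```

The variance line makes the behavior described above concrete: a test point sitting on top of the training data gets near-zero variance, while a far-away point reverts to the prior variance of 1.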

Joshua Noble

Data Scientist

Footnotes

1. Broomhead, D. S. & Lowe, D. (1988). Multivariable functional interpolation and adaptive networks. Complex Systems, 2, 321-355.

2. Orr, Mark (1996). Introduction to Radial Basis Function Networks. https://faculty.cc.gatech.edu/~isbell/tutorials/rbf-intro.pdf

Related solutions
IBM watsonx.data® AI Enterprise Search

Get answers you can trust with context-aware AI agents powered by governed and connected data—without replatforming or lock-in.

Discover watsonx.data AI Enterprise Search
Data management software and solutions

Design a data strategy that eliminates data silos, reduces complexity and improves data quality for exceptional customer and employee experiences.

Discover data management solutions
Data and AI consulting services

Successfully scale AI with the right strategy, data, security and governance in place.

Explore data and AI consulting services
Take the next step

Deliver trusted, context aware answers from across your organization with agentic AI powered by governed, connected business data.

  1. Discover watsonx.data AI Enterprise Search
  2. Explore data management solutions