
Using Radial Basis Functions to Improve RAG Search

Improving RAG Results

Vector search has become a cornerstone of semantic search pipelines that power retrieval-augmented applications. OpenSearch, running on watsonx.data®, provides a powerful and scalable platform for storing and querying high-dimensional embeddings. This approach enables applications to retrieve documents based on their meaning rather than exact keyword matches.

However, OpenSearch vector search results can sometimes contain noisy or loosely ranked results, especially when multiple documents are close in embedding space. This overlap can also occur when a search term is roughly similar to two or more documents. An example is a user query such as “updating my account”, which can refer to changing account settings or switching to a different credit card.

One effective way to refine these results is to introduce a lightweight Radial Basis Function Neural Network (RBFNN) to rerank candidate documents.

A Radial Basis Function Neural Network (RBFNN) is a feedforward artificial neural network that approximates a function or makes predictions. It does this by representing the output as a linear combination of radial basis functions (RBFs) of the input vector and neuron parameters. An RBFNN can train and converge more quickly than a multilayer perceptron trained with gradient descent and backpropagation because its architecture is simpler.

In a classification task, such as document reranking, input values are compared to centers in a higher-dimensional space in a hidden layer through a continuous function such as a Gaussian. Translating data points that can’t be separated in a lower-dimensional space into a higher-dimensional one makes them separable. The hidden layer activations encode the distance of an input point to each center, and the output layer combines them to produce a prediction of the class to which that input point belongs.
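As a toy illustration of this lifting step (not part of the tutorial pipeline), consider three 1-D points where one class sits between two points of the other class. No single threshold on the raw values separates them, but a Gaussian RBF centered on the middle class does:

```python
import numpy as np

# Class B at 0 sits between class A at -1 and +1: no single threshold
# on the raw 1-D values separates A from B.
X = np.array([-1.0, 0.0, 1.0])
labels = ["A", "B", "A"]

def gaussian_rbf(x, center, gamma=1.0):
    # Gaussian RBF: activation decays with squared distance to the center
    return np.exp(-gamma * (x - center) ** 2)

# Lift the points with an RBF centered on the B cluster.
phi = gaussian_rbf(X, center=0.0)
print(phi)  # [~0.368, 1.0, ~0.368]

# In the lifted space, a single threshold now separates the classes.
print([label for p, label in zip(phi, labels) if p > 0.5])  # ['B']
```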

In this tutorial, you’ll learn how to combine OpenSearch with IBM’s Granite® Embedding model to create a reranking pipeline. While this process is meant to mimic how results might be pulled from a data lake in watsonx.data, the tutorial runs everything locally for brevity.

Each document description is stored as vector embeddings in OpenSearch and retrieved through a K-Nearest Neighbors (KNN) search, which is built into the OpenSearch platform. The RBFNN then reranks the top results so that the most relevant result appears first.

The RBFNN consists of a linear classifier that uses a radial basis function activation function. This setup means that the model learns a decision boundary to separate the topics of one document from all the others.

Unlike KNN, an RBF network can handle complex and non-contiguous boundaries for a single document. When one document contains several topics that might overlap with other documents, the RBF network can still correctly select the document because it can learn multiple centers for a single class.
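A minimal sketch of this multi-center behavior, using made-up 1-D values in place of real document embeddings: one class occupies two disjoint regions, which no single linear boundary on the raw input can isolate, but a linear model over RBF features with one center per region can.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# The positive class occupies two disjoint regions (-2 and +2), with the
# negative class in between: not separable by one linear boundary in 1-D.
X = np.array([[-2.0], [0.0], [2.0]])
y = np.array([1, 0, 1])

# One Gaussian center per positive region.
centers = np.array([[-2.0], [2.0]])

def rbf_features(X, centers, gamma=1.0):
    # distance of every sample to every center, pushed through a Gaussian
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.exp(-gamma * dists ** 2)

model = LinearRegression().fit(rbf_features(X, centers), y)
scores = model.predict(rbf_features(X, centers))
# scores come out high at both positive regions and low in the middle
print(scores.round(2))
```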

This approach allows you to create a reranker as a powerful post-processing layer on top of OpenSearch vector queries. The reranker takes the raw vector distances from OpenSearch and reorders them so that the most relevant result appears first.

The result is a simple but powerful reranking strategy that leverages the rich embeddings from Granite and enhances the precision of semantic search from OpenSearch.

Prerequisites

There are a few setup steps to replicate locally what you would build in watsonx®. First, create a Python virtual environment by using either uv or pip. Then, install the Python libraries needed for this tutorial:

%pip install opensearch-py requests numpy scikit-learn

To create the vector database, you need a way to generate embeddings. This tutorial uses the Granite Embeddings model to generate embeddings for documents and user queries. Next, install Ollama on your local system following the instructions here. Once you’ve installed it, start Ollama and install the Granite Embeddings model by entering the following command in a terminal or command prompt window:

ollama pull granite-embedding

Now Ollama is installed locally and the Granite Embeddings model is ready to use. On watsonx.data, you wouldn’t serve the model through Ollama. However, this setup allows you to host the model locally for testing and prototyping.

Next, follow these docs to install OpenSearch in your preferred configuration. There are several ways to do this, the easiest being Docker; you can also use Podman or Homebrew. Once you’ve installed it, start OpenSearch with the following command.

docker run -d -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" -e "OPENSEARCH_INITIAL_ADMIN_PASSWORD=<your strong password>" opensearchproject/opensearch:latest

The OPENSEARCH_INITIAL_ADMIN_PASSWORD parameter must be passed to the container to properly start OpenSearch. OpenSearch requires a strong password for startup.

Now you have a way to run embedding models locally and to serve vectors with OpenSearch. When working on watsonx.data, you would typically have a data lake configured that provides supplementary data, but the vector database side of the setup would be identical.

Creating embeddings

Now it’s time to create embeddings for the mock documents that you’ll be matching to user queries:

docs = [
    "Reset your account password from the settings page.",
    "Cancel your subscription by visiting billing settings.",
    "Update your credit card under payment methods.",
    "Troubleshoot internet connectivity problems.",
    "Restart your router to fix connection issues."
]

Now, generate embeddings with the Granite Embeddings model through Ollama. The embed function sends each text to Ollama, where Granite Embeddings generates a vector of 384 floating point numbers representing the tokens of that text.

import requests

OLLAMA_URL = "http://localhost:11434/api/embeddings"
MODEL = "granite-embedding"

def embed(texts):
    vectors = []

    for t in texts:
        response = requests.post(
            OLLAMA_URL,
            json={
                "model": MODEL,
                "prompt": t
            }
        )

        vectors.append(response.json()["embedding"])

    return vectors

doc_embeddings = embed(docs)

Now it’s time to put those embeddings into OpenSearch. First, create an OpenSearch Python client with the password that you configured earlier.

from opensearchpy import OpenSearch

auth = ("admin", "<your strong password>")

client = OpenSearch(
    hosts=[{"host": "localhost", "port": 9200}],
    http_compress=True,
    http_auth=auth,
    use_ssl=True,
    verify_certs=False,
    ssl_assert_hostname=False,
    ssl_show_warn=False
)

Now you’ll load the document embeddings into OpenSearch to persist them. These lines:

"type": "knn_vector",
"dimension": 384

configure the vectors to be 384 floats long because that’s what the Granite Embeddings model generates. They also tell OpenSearch to configure the vectors for KNN searches. This process is how the initial search to retrieve matching vectors will be performed.

index_name = "docs"

index_body = {
    "settings": {
        "index": {
            "knn": True # this tells OpenSearch to create KNN compatible indices internally.
        }
    },
    "mappings": {
        "properties": {
            "text": {"type": "text"},
            "embedding": {
                "type": "knn_vector",
                "dimension": 384
            }
        }
    }
}

client.indices.create(index=index_name, body=index_body)

Now you’ll load the embeddings into OpenSearch:

for i, (doc, emb) in enumerate(zip(docs, doc_embeddings)):

    body = {
        "text": doc,
        "embedding": emb
    }

    client.index(
        index=index_name,
        id=i,
        body=body
    )

Now that all the embeddings have been created and loaded into OpenSearch, it’s time to build an RBFNN to process the results that OpenSearch returns.

RBFNN

An RBFNN is a neural network that consists of a single hidden layer. For this tutorial, that network should predict which document center the user query is closest to in the embedding space. To do this, the rbf_features function computes:

ϕⱼ(x) = exp(−γ‖x − cⱼ‖²)

for a set of centers:

c₁, c₂, c₃, …, cₖ

This method creates a feature vector Φ(x) = [ϕ₁(x), ϕ₂(x), …, ϕₖ(x)], with one Gaussian activation per center.

import numpy as np

# a larger gamma (narrower kernel) risks overfitting; a smaller (wider) one may not differentiate training samples
def rbf_features(X, centers, gamma=1e-6):

    features = []
    # gaussian rbf for model to learn interpolation between multiple samples and center
    for x in X:
        row = []
        for c in centers:
            dist = np.linalg.norm(x - c)
            row.append(np.exp(-gamma * dist**2))
        features.append(row)

    return np.array(features)

Now you’ll create training data for the model by combining queries like “change password” with document embeddings. This step helps the model map a query about passwords to a document like “Reset your account password from the settings page”. It’s important to have both negative and positive examples for each query, that is, which documents the query is like and which it isn’t. All the queries are turned into embeddings with the Granite Embeddings model:

from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

train_queries = [
    "change password",
    "can't connect to wifi",
    "reset username and password",
    "change credit card",
    "update my payment",
    "fix internet problem",
    "cancel my account"
]

# output values for training: queries mapped to positive document indices
pos_map = {
    "change password": [0],
    "can't connect to wifi": [3, 4],
    "reset username and password": [0],
    "change credit card": [2],
    "update my payment": [2],
    "fix internet problem": [3, 4],
    "cancel my account": [1]
}

# Create training pairs
train_pairs = []
train_labels = []

for query in train_queries:
    pos_docs = pos_map[query]
    for i, doc in enumerate(docs):
        label = 1 if i in pos_docs else 0
        train_pairs.append((query, doc))
        train_labels.append(label)

y_train = np.array(train_labels)

# Generate embeddings
pair_queries = [q for q, d in train_pairs]
pair_docs = [d for q, d in train_pairs]  # use a new name so the original docs list isn't overwritten

query_emb = np.array(embed(pair_queries))
doc_emb = np.array(embed(pair_docs))

Now you’re ready to train the RBFNN.

 

Training the RBFNN

Now that you have a training function and training data, it’s time to train the model. The queries and documents are concatenated into a matrix, which is used to generate candidate centers. Those centers are generated with the k-means clustering algorithm and are then passed to the rbf_features function. Although this example uses k-means, other unsupervised learning clustering techniques are often used as well.

That function creates a feature vector for each center. Each neuron is a feature center in the hidden layer of the RBF network.

These RBF features are then passed to a LinearRegression instance, which learns to score how strongly each query-document pair matches. Each document is associated with multiple centers, so the model can associate multiple non-contiguous input spaces with each document, giving it more power.

# Concatenate query + doc embeddings for RBF input into a training set
X_train = np.concatenate([query_emb, doc_emb], axis=1)  # shape (num_pairs, 2*embedding_dim)

n_centers = X_train.shape[0]

kmeans = KMeans(n_clusters=n_centers, random_state=42)
# calculate euclidean distance centers for all examples in the training dataset
centers = kmeans.fit(X_train).cluster_centers_

# Compute RBF features for all centers
Phi_train = rbf_features(X_train, centers, gamma=1e-6)

# train with Linear Regression as the learning algorithm, could swap for Ridge for regularization
reranker = LinearRegression()
reranker.fit(Phi_train, y_train)

Now the RBFNN is trained.

Testing the RBFNN

Now that the model has been trained, it’s time to test it. A vague query can make for a difficult match.

change_card = "how do I change how I pay?"
change_card_vector = embed([change_card])[0] # take just the first embedding vector from Ollama

query_vec = change_card_vector

Searching in OpenSearch does return the correct result, but the KNN matching inside OpenSearch isn’t confident about which document is correct.

k = 6

search_query = {
    "size": k,
    "query": {
        "knn": {
            "embedding": {
                "vector": query_vec,
                "k": k
            }
        }
    }
}

response = client.search(index=index_name, body=search_query)

for hit in response["hits"]["hits"]:
    print(f"{hit['_score']} : {hit['_source']['text']}")

This will output:

    0.0007828022 : Update your credit card under payment methods.
    0.00066259573 : Cancel your subscription by visiting billing settings.
    0.0005574707 : Reset your account password from the settings page.
    0.0004300855 : Troubleshoot internet connectivity problems.
    0.00042786854 : Restart your router to fix connection issues.

 

You can see that “Update your credit card under payment methods.” scores 0.00078, while “Cancel your subscription by visiting billing settings.” scores 0.00066, a narrow margin. To improve on this, use the reranker to gain more confidence about the correct document. Start by getting the embedding for each document returned from OpenSearch:

candidates = []
candidate_embeddings = []

for hit in response["hits"]["hits"]:
    candidates.append(hit["_source"]["text"])
    candidate_embeddings.append(hit["_source"]["embedding"])

Then, prepare to test the query against each candidate document. This step gives the model a possible center and possible query to compare through the generalizations that it learned from training samples:

# create an input vector for the model
query_vec = np.asarray(query_vec).reshape(1, -1)
candidate_embeddings = np.asarray(candidate_embeddings)

query_repeat = np.repeat(query_vec, candidate_embeddings.shape[0], axis=0)
X_test = np.concatenate([query_repeat, candidate_embeddings], axis=1)

# function approximation of phi for all centers
Phi_test = rbf_features(X_test, centers)

# pass the network output to the reranking linear model
scores = reranker.predict(Phi_test)

Now sort the results by the scores and print the highest scoring result:

ranking = sorted(
    zip(candidates, scores),
    key=lambda x: -x[1]
)

# print the score metric
for doc, score in ranking:
    print(score, doc)

This will output:

    0.11656583034772439 Update your credit card under payment methods.
    0.028168299716242018 Cancel your subscription by visiting billing settings.
    0.02339543498118246 Reset your account password from the settings page.
    0.0221207986530203 Troubleshoot internet connectivity problems.
    0.020085314202049176 Restart your router to fix connection issues.

You can see that the reranker is much more confident about the correct document for the query, with a score of roughly 0.12 for the best-fitting document versus 0.03 for the next highest ranking.

The RBF layer works effectively here because it can learn targeted semantic interactions between query and document embeddings without needing a large, complex neural network to rerank. In a production machine learning scenario, the speed with which a reranker can be trained and can run inference is invaluable for real-world search and data management.
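Putting the pieces together, the whole rerank step reduces to a small pure function. The sketch below uses tiny made-up 2-D vectors in place of real Granite embeddings, but the shape of the computation (concatenate query and document, compute RBF features, score with the linear model, sort) follows the tutorial:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def rbf_features(X, centers, gamma=1.0):
    # Gaussian RBF feature for every (sample, center) pair
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.exp(-gamma * dists ** 2)

def rerank(query_vec, candidates, centers, model, gamma=1.0):
    # pair the query with every candidate embedding, score, and sort
    X = np.array([np.concatenate([query_vec, emb]) for _, emb in candidates])
    scores = model.predict(rbf_features(X, centers, gamma))
    return sorted(zip((t for t, _ in candidates), scores), key=lambda x: -x[1])

# Toy training pairs: label 1 when the query and document halves match.
X_train = np.array([[0, 0, 0, 0], [0, 0, 5, 5], [5, 5, 5, 5], [5, 5, 0, 0]], dtype=float)
y_train = np.array([1, 0, 1, 0])
centers = X_train  # one center per training pair, as in the tutorial

model = LinearRegression().fit(rbf_features(X_train, centers), y_train)

candidates = [("matching doc", np.array([0.0, 0.0])), ("other doc", np.array([5.0, 5.0]))]
ranking = rerank(np.array([0.1, -0.1]), candidates, centers, model)
print(ranking[0][0])  # the matching document ranks first
```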

Author

Joshua Noble

Data Scientist
