Vector search has become a cornerstone of semantic search pipelines that power retrieval-augmented applications. OpenSearch, running on watsonx.data®, provides a powerful and scalable platform for storing and querying high-dimensional embeddings. This approach enables applications to retrieve documents based on their meaning rather than exact keyword matches.
However, OpenSearch vector search results can sometimes contain noisy or loosely ranked results, especially when multiple documents are close in embedding space, that is, when a query is roughly similar to two or more documents. For example, a user query such as “updating my account” can refer to changing account settings or to switching to a different credit card.
One effective way to refine these results is to introduce a lightweight Radial Basis Function Neural Network (RBFNN) to rerank candidate documents.
A Radial Basis Function Neural Network (RBFNN) is a feedforward artificial neural network that approximates a function or makes predictions. It does this by representing the output as a linear combination of radial basis functions (RBFs) of the input vector and neuron parameters. An RBFNN can train and converge more quickly than a multilayer perceptron trained with gradient descent and backpropagation because its architecture is simpler.
In a classification task, such as document reranking, input values are compared to centers in a higher-dimensional hidden layer through a continuous function such as a Gaussian. Translating data points that can’t be separated in a lower-dimensional space into a higher-dimensional one makes them separable. The output layer then combines the distances from an input point to each center, producing a prediction of the class to which that input point belongs.
In this tutorial, you’ll learn how to combine OpenSearch with IBM’s Granite® Embedding model to create a reranking pipeline. While this process is meant to mimic how results might be pulled from a data lake in watsonx.data, the tutorial runs everything locally for brevity.
Each document description is stored as a vector embedding in OpenSearch and retrieved through a K-Nearest Neighbors (KNN) search, which is built into the OpenSearch platform. The RBFNN then reranks the top results so that the most relevant result appears first.
The RBFNN consists of a linear classifier on top of radial basis function activations. This setup means that the model learns a decision boundary that separates the topics of one document from all the others.
Unlike KNN, an RBF network can handle complex and non-contiguous boundaries for a single document. When one document contains several topics that might overlap with other documents, the RBF network can still correctly select the document because it can learn multiple centers for a single class.
This approach allows you to create a reranker as a powerful post-processing layer on top of OpenSearch vector queries. The reranker takes the raw vector distances from OpenSearch and reorders them so that the most relevant result appears first.
The result is a simple but powerful reranking strategy that leverages the rich embeddings from Granite and enhances the precision of semantic search from OpenSearch.
There are a few steps to set up an environment that replicates what you would build in watsonx®. First, create a Python virtual environment by using either uv or pip. Then, install the Python libraries needed for this tutorial:
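The original dependency list isn’t shown here; a plausible set, assuming the `opensearch-py`, `ollama`, `numpy`, and `scikit-learn` packages used in the rest of this tutorial, is:

```shell
# Create and activate a virtual environment, then install the assumed dependencies
python -m venv .venv
source .venv/bin/activate
pip install opensearch-py ollama numpy scikit-learn
```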
To create the vector database, you need a way to generate embeddings. This tutorial uses the Granite Embeddings model to generate embeddings for documents and user queries. Next, install Ollama on your local system following the instructions here. Once you’ve installed it, start Ollama and install the Granite Embeddings model by entering the following command in a terminal or command prompt window:
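The exact pull command isn’t shown; assuming the 30M-parameter Granite Embedding model on Ollama, which produces the 384-dimensional vectors used later in this tutorial, it would look like:

```shell
ollama pull granite-embedding:30m
```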
Now Ollama is installed locally and the Granite Embeddings model is ready to use. When using this same model on watsonx.data, you wouldn’t need Ollama. However, this setup allows you to host the model locally for testing and prototyping.
Next, follow these docs to install OpenSearch in your preferred configuration. There are several ways to do this, the easiest being Docker; Podman or Homebrew also work. Once you’ve installed it, start OpenSearch with the following command.
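A typical single-node Docker invocation looks like the following (the container name and password placeholder are assumptions):

```shell
docker run -d --name opensearch \
  -p 9200:9200 -p 9600:9600 \
  -e "discovery.type=single-node" \
  -e "OPENSEARCH_INITIAL_ADMIN_PASSWORD=<your-strong-password>" \
  opensearchproject/opensearch:latest
```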
The `OPENSEARCH_INITIAL_ADMIN_PASSWORD` environment variable sets the admin password; you’ll use this password later to connect from the Python client.
Now you have a way to run embedding models locally and to serve vectors with OpenSearch. When working on watsonx.data, you would typically have a data lake set up that provides supplementary data, but the vector database side of the setup would be identical.
Now it’s time to create embeddings for the mock documents that you’ll be matching to user queries:
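A small corpus sketch; three of these descriptions appear later in the tutorial, and the fourth is a hypothetical filler document:

```python
# Mock document descriptions to index; the fourth entry is hypothetical.
documents = [
    "Reset your account password from the settings page.",
    "Update your credit card under payment methods.",
    "Cancel your subscription by visiting billing settings.",
    "Contact support to report a problem with your order.",
]
```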
Now, generate embeddings with the Granite Embeddings model through Ollama. The embed function passes a list of texts to Ollama so that Granite Embeddings can generate, for each text, a vector of 384 floating-point numbers that represents its meaning.
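A sketch of such an `embed` helper, assuming the `ollama` Python client and the `granite-embedding:30m` model tag:

```python
def embed(texts):
    """Return one 384-float embedding per input text via Ollama."""
    import ollama  # assumes the `ollama` Python client is installed and running

    response = ollama.embed(model="granite-embedding:30m", input=texts)
    return response["embeddings"]
```

You could then build the document vectors with `doc_embeddings = embed(documents)`.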
Now it’s time to put those embeddings into OpenSearch. First, create an OpenSearch Python client with the password that you configured earlier.
Now you’ll load the document embeddings into OpenSearch to persist them. The index mapping configures the vectors to be 384 floats long, because that’s the dimension that the Granite Embeddings model generates, and it tells OpenSearch to index the vectors for KNN search. This is how the initial search to retrieve matching vectors will be performed.
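A sketch of the index body (the index and field names are assumptions):

```python
index_name = "documents"
index_body = {
    "settings": {"index": {"knn": True}},  # enable KNN search on this index
    "mappings": {
        "properties": {
            "text": {"type": "text"},
            "embedding": {"type": "knn_vector", "dimension": 384},
        }
    },
}
```

You would then create the index with `client.indices.create(index=index_name, body=index_body)`.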
Now you’ll load the embeddings into OpenSearch:
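One way to sketch the load step is to build one bulk action per document (the helper name `build_actions` is an assumption):

```python
def build_actions(documents, embeddings, index_name="documents"):
    """Build one indexing action per document for the OpenSearch bulk helper."""
    return [
        {"_index": index_name, "_id": i, "_source": {"text": t, "embedding": e}}
        for i, (t, e) in enumerate(zip(documents, embeddings))
    ]
```

The actions can then be sent with `opensearchpy.helpers.bulk(client, build_actions(documents, doc_embeddings))`.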
Now that all the embeddings have been created and loaded into OpenSearch, it’s time to build an RBFNN to process the results that OpenSearch will return.
An RBFNN is a neural network that consists of a single hidden layer of radial basis functions followed by a linear output. For this tutorial, the network should predict which document a user query is closest to in the embedding space. To create that, define a function that, given an input vector and a set of centers, calculates a feature vector with one RBF activation per center.
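A minimal sketch of such a feature function, assuming NumPy and a Gaussian kernel with a width parameter `gamma` (both the function name and the parameter are assumptions):

```python
import numpy as np

def rbf_features(x, centers, gamma=1.0):
    """Map an input vector to one Gaussian RBF activation per center."""
    x = np.asarray(x)
    centers = np.asarray(centers)
    # Squared Euclidean distance from x to every center, then Gaussian kernel.
    dists = np.sum((centers - x) ** 2, axis=1)
    return np.exp(-gamma * dists)
```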
Now you’ll create training data for the model by combining queries like “change password” with document embeddings. This step helps the model map a query about passwords to a document like “Reset your account password from the settings page”. It’s important to have both positive and negative examples for each query, that is, which documents the query matches and which it doesn’t. All the queries are turned into embeddings with the Granite Embeddings model:
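A sketch of the training-data step; the example queries and pair labels are assumptions, and `embed_fn` stands in for the Granite Embeddings helper:

```python
import numpy as np

# Hypothetical query/document training pairs. Label 1 marks a document that
# answers the query; label 0 marks a document that doesn't.
train_pairs = [
    ("change password", "Reset your account password from the settings page.", 1),
    ("change password", "Update your credit card under payment methods.", 0),
    ("update my card", "Update your credit card under payment methods.", 1),
    ("update my card", "Reset your account password from the settings page.", 0),
]

def build_training_data(pairs, embed_fn):
    """Embed each query/document pair and concatenate them into one row."""
    X, y = [], []
    for query, doc, label in pairs:
        q_vec, d_vec = embed_fn([query])[0], embed_fn([doc])[0]
        X.append(np.concatenate([q_vec, d_vec]))
        y.append(label)
    return np.array(X), np.array(y)
```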
Now you’re ready to train the RBFNN.
Now that you have a feature function and training data, it’s time to train the model. The query and document embedding rows are concatenated into a matrix, which is used to generate candidate centers with the k-means clustering algorithm. Those centers are then used to compute a feature vector for each training example; each neuron corresponds to one center in the hidden layer of the RBF network. The resulting feature matrix is passed to a linear classifier, which learns the output weights of the network.
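A compact training sketch with scikit-learn, using k-means centers and a logistic-regression output layer (the function name and hyperparameters are assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def train_rbfnn(X, y, n_centers=4, gamma=1.0):
    """Fit k-means centers, expand inputs to RBF features, fit a linear head."""
    kmeans = KMeans(n_clusters=n_centers, n_init=10, random_state=0).fit(X)
    centers = kmeans.cluster_centers_
    # Gaussian RBF activation of every training row against every center.
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    features = np.exp(-gamma * dists)
    clf = LogisticRegression(max_iter=1000).fit(features, y)
    return centers, clf
```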
Now the RBFNN is trained.
Now that the model has been trained, it’s time to test it. A vague query can make for a difficult match.
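A sketch of the retrieval step; the query-body builder is an assumed helper around OpenSearch’s `knn` query syntax:

```python
def build_knn_query(query_vector, k=3):
    """KNN search body over the `embedding` field of the documents index."""
    return {
        "size": k,
        "query": {"knn": {"embedding": {"vector": list(query_vector), "k": k}}},
    }
```

The search itself would then be something like `client.search(index="documents", body=build_knn_query(embed(["updating my account"])[0]))`.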
Searching in OpenSearch does return the correct result, but the KNN matching inside OpenSearch isn’t confident about which document is correct.
The search output shows that “Update your credit card under payment methods.” scores 0.00078, while “Cancel your subscription by visiting billing settings.” scores 0.00066; the two top candidates are nearly tied. To improve this, use the reranker to gain more confidence about the correct document. First, get the embedding for each document returned from OpenSearch:
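Because each indexed document stores its embedding in `_source`, the candidates can be pulled straight from the hits (the helper name is an assumption):

```python
def extract_candidates(hits):
    """Pull the stored text and embedding out of each OpenSearch hit."""
    return [(hit["_source"]["text"], hit["_source"]["embedding"]) for hit in hits]
```

Here `hits` would be `results["hits"]["hits"]` from the search response.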
Then, prepare to test the query against each candidate document. This step gives the model a possible center and possible query to compare through the generalizations that it learned from training samples:
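A scoring sketch that concatenates the query and candidate embeddings, expands them against the learned centers, and reads off the model’s match probability (the names and `gamma` are assumptions):

```python
import numpy as np

def score_candidates(query_vector, candidates, centers, clf, gamma=1.0):
    """Score each (text, embedding) candidate pair with the trained RBFNN."""
    scored = []
    for text, doc_vector in candidates:
        x = np.concatenate([query_vector, doc_vector])
        dists = ((np.asarray(centers) - x) ** 2).sum(axis=1)
        features = np.exp(-gamma * dists).reshape(1, -1)
        # Probability that this document answers the query.
        scored.append((text, clf.predict_proba(features)[0, 1]))
    return scored
```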
Now sort the results by the scores and print the highest scoring result:
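The final ordering step is a plain sort on the score (the helper name is an assumption):

```python
def rerank(scored):
    """Sort (text, score) pairs so the most relevant document comes first."""
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Then `print(rerank(scored)[0])` shows the top document and its score.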
You can see that the reranker is much more confident about the correct document for the query, with a score of 0.11 for the best-fitting document versus only 0.02 for the next highest ranking.
The RBF layer works effectively here because it can learn targeted semantic interactions between query and document embeddings without needing a large, complex neural network to rerank. In a production machine learning scenario, the speed with which a reranker can be trained and can run inference is invaluable for real-world search and data management.