What is vector search?

Authors

Meredith Syed

Technical Content, Editorial Lead

IBM

Erika Russi

Data Scientist

IBM

What is vector search?

Vector search is a search technique used to find similar items or data points, typically represented as vectors, in large collections. Vectors, or embeddings, are numerical representations of words, entities, documents, images or videos. Vectors capture the semantic relationships between elements, enabling effective processing by machine learning models and artificial intelligence applications.

Vector search vs. traditional search

In contrast to traditional search, which typically relies on keyword matching, vector search uses vector similarity techniques such as k-nearest neighbor (kNN) search to retrieve the data points closest to a query vector under a chosen distance metric. Because vectors capture semantic relationships and similarities between data points, this enables semantic search rather than simple keyword search.

To illustrate the difference between traditional keyword and vector search, let’s go through an example. Say you are looking for information on the best pizza restaurant and you search for “best pizza restaurant” in a traditional keyword search engine. The keyword search looks for pages that contain the exact words “best”, “pizza” and “restaurant” and only returns results like “Best Pizza Restaurant” or “Pizza restaurant near me”. Traditional keyword search focuses on matching the keywords rather than understanding the context or intent behind the search.
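The keyword behavior described above can be sketched in a few lines. This is only an illustration: the function name `keyword_search` and the sample documents are made up, and real keyword engines use more elaborate scoring than a simple word-overlap count.

```python
def keyword_search(query, documents):
    """Rank documents by how many query words they contain (case-insensitive)."""
    terms = set(query.lower().split())
    scored = []
    for doc in documents:
        words = set(doc.lower().split())
        overlap = len(terms & words)
        if overlap > 0:  # documents with no matching word are never returned
            scored.append((overlap, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]


# Hypothetical documents for illustration.
docs = [
    "Best pizza restaurant in town",
    "Top-rated pizzeria with wood-fired pies",
    "Pizza restaurant near me",
]
hits = keyword_search("best pizza restaurant", docs)
print(hits)
```

The second document is clearly relevant to the query, but it shares no exact word with it, so a purely lexical match never surfaces it.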

By contrast, in a semantic vector search, the search engine understands the intent behind the query. Semantic means relating to meaning in language, so a semantic search takes the meaning and context of a query into account. In this case, it would look for content that talks about top-rated or highly recommended pizza places, even if the exact words “best pizza restaurant” are not used in the content. The results are more contextually relevant and might include articles or guides that discuss high-quality pizza places in various locations.

Traditional search methods typically represent data using discrete tokens or features, such as keywords, tags or metadata. As shown in our example above, these methods rely on exact matches to retrieve relevant results. By contrast, vector search represents data as dense vectors (a vector in which most or all of the elements are non-zero) in a continuous vector space, the mathematical space in which data is represented as vectors. Each dimension of the dense vector corresponds to a latent feature or aspect of the data, an underlying characteristic or attribute that is not directly observed but is inferred from the data through mathematical models or algorithms. These latent features capture the hidden patterns and relationships in the data, enabling more meaningful and accurate representations of items as vectors in a high-dimensional space.

Figure: Sentence vectors (diagram demonstrating the tokenization of a sentence for vector search)

Traditional search methods may struggle to scale to large datasets or high-dimensional data because of computational and memory constraints. By contrast, vector embeddings are easier to scale to larger datasets and more complex models. Unlike sparse representations of data, where most of the values are zeros across dimensions, embeddings are dense vector representations with non-zero values in most dimensions. This allows vector embeddings to store more information in a smaller, lower-dimensional space, requiring less memory.¹ As a result, machine learning algorithms and models can use embeddings more efficiently with fewer compute resources.
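To make the sparse versus dense contrast concrete, here is a minimal sketch that builds a bag-of-words count vector, one common sparse representation, for a short sentence. The sample sentences and the values in `dense` are made up for illustration; a real dense embedding would come from a trained model.

```python
# Three short sample sentences (hypothetical text for illustration).
corpus = [
    "the cat sat on the mat",
    "the dog played in the yard",
    "birds chirped in the trees",
]

# One dimension per distinct word in the corpus.
vocabulary = sorted({word for sentence in corpus for word in sentence.split()})


def bag_of_words(sentence):
    """Sparse representation: count of each vocabulary word in the sentence."""
    words = sentence.split()
    return [words.count(term) for term in vocabulary]


sparse = bag_of_words("the cat sat on the mat")
zero_fraction = sparse.count(0) / len(sparse)

# A dense embedding uses far fewer dimensions, with most of them non-zero.
# These numbers are invented; they are not the output of any model.
dense = [0.12, -0.48, 0.33, 0.91]

print(len(sparse), zero_fraction, len(dense))
```

Even with a 12-word vocabulary, more than half of the sparse vector is zeros, and the share of zeros grows with the vocabulary, whereas the dense vector keeps a fixed, small number of dimensions.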

For example, Wikimedia Deutschland used vector search to make Wikidata’s 120-million-entry knowledge graph queryable by LLMs. Built on DataStax Astra DB on IBM watsonx.data, the solution delivered query speeds 30 times faster than local computation across 300 languages.

Vectorization process

For this explainer, we will focus on the vector representations used in natural language processing (NLP), that is, vectors that represent words, entities or documents.

We will illustrate the vectorization process by vectorizing a small corpus of sentences: “the cat sat on the mat”, “the dog played in the yard” and “birds chirped in the trees”.

The first step to building vector embeddings is to clean and process the raw dataset. This may involve the removal of noise and standardization of the text. For our example, we won’t do any cleaning since the text is already cleaned and standardized.
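For text that does need cleaning, the step might look like the sketch below. The specific rules (lowercasing, replacing punctuation with spaces, splitting on whitespace) are assumptions chosen for illustration; the right rules depend on the dataset and the embedding model.

```python
import re


def clean_text(text):
    """Lowercase the text, replace punctuation with spaces and split into tokens."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # keep only letters, digits, whitespace
    return text.split()


tokens = clean_text("The Cat sat on the mat!")
print(tokens)
```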

Next, an embedding model is chosen and either trained on the dataset or used pre-trained. The model is then used to generate embeddings for each data point in the dataset. For text data, popular open-source embedding models include Word2Vec, GloVe and FastText, as well as pre-trained transformer-based models like BERT or RoBERTa².

For our example, we’ll use Word2Vec to generate our embeddings.
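To give a feel for what such a model learns, the sketch below trains a toy skip-gram model with a full softmax on the three sentences, using only the Python standard library. Skip-gram is one of the training objectives behind Word2Vec, but this is a simplified teaching version, not the Word2Vec implementation itself. The window size, embedding dimension, learning rate and epoch count are arbitrary assumptions.

```python
import math
import random

corpus = [
    "the cat sat on the mat",
    "the dog played in the yard",
    "birds chirped in the trees",
]
sentences = [s.split() for s in corpus]
vocab = sorted({w for s in sentences for w in s})
index = {w: i for i, w in enumerate(vocab)}

# Build (center, context) training pairs from a window of 2 words on each side.
window = 2
pairs = []
for s in sentences:
    for i, center in enumerate(s):
        for j in range(max(0, i - window), min(len(s), i + window + 1)):
            if j != i:
                pairs.append((index[center], index[s[j]]))

dim = 5  # embedding size (assumption; real models use hundreds of dimensions)
rng = random.Random(0)
W_in = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in vocab]
W_out = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in vocab]


def train_step(center, context, lr=0.1):
    """One SGD step on the softmax cross-entropy loss for a single pair."""
    h = W_in[center]
    scores = [sum(h[d] * W_out[w][d] for d in range(dim)) for w in range(len(vocab))]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    loss = -math.log(probs[context])
    grad_h = [0.0] * dim
    for w in range(len(vocab)):
        err = probs[w] - (1.0 if w == context else 0.0)
        for d in range(dim):
            grad_h[d] += err * W_out[w][d]  # uses the value before its update
            W_out[w][d] -= lr * err * h[d]
    for d in range(dim):
        h[d] -= lr * grad_h[d]  # h aliases W_in[center], so this updates it
    return loss


losses = []
for epoch in range(100):
    rng.shuffle(pairs)
    losses.append(sum(train_step(c, o) for c, o in pairs) / len(pairs))

# The learned input vectors are the word embeddings.
embedding = {w: W_in[index[w]] for w in vocab}
print(embedding["trees"])
```

With a corpus this small the resulting vectors carry little real meaning; the point is only to show that each word ends up as a short list of floating-point numbers learned from its neighboring words.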

Figures: Vectors for the words "in", "the" and "trees"

Next, the embeddings are stored in a vector database, or in a search engine such as Elasticsearch that supports vector search through a plugin. In vector search, the relevance of a search result is established by assessing the similarity between the query vector, which is generated by vectorizing the query, and the document vector, which represents the data being queried. Indexes need to be created in the vector database so that the embeddings most similar to a query can be retrieved quickly. Techniques such as hierarchical navigable small world (HNSW) can be used to index the embeddings and facilitate similarity search at query time. HNSW organizes the dataset as a layered graph in which similar vectors are linked to one another during index construction, which enables a rapid search for nearest neighbors.

Finally, a mechanism or procedure to generate vectors for new queries must be established. This typically involves creating an API or service that takes user search queries as input in real time, processes them with the same embedding model used for the documents and generates a corresponding vector representation. This vector can then be used to search the database for the most relevant results.
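The query path can be sketched as follows. Everything here is a stand-in: `WORD_VECTORS` holds made-up three-dimensional vectors instead of a trained model, `embed` averages word vectors as one simple way to get a sentence vector, and the "index" is a plain list scanned in full. The key point is that documents and queries go through the same `embed` function.

```python
import math

# Hypothetical word vectors; real ones come from a trained embedding model.
WORD_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "mat": [0.1, 0.9, 0.0],
    "yard": [0.2, 0.8, 0.1],
    "birds": [0.6, 0.0, 0.7],
    "trees": [0.0, 0.5, 0.8],
}


def embed(text):
    """Average the vectors of known words; unknown words are skipped."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return [0.0, 0.0, 0.0]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


documents = [
    "the cat sat on the mat",
    "the dog played in the yard",
    "birds chirped in the trees",
]
# Index time: embed every document once and store the result.
index = [(doc, embed(doc)) for doc in documents]


def search(query, k=1):
    """Query time: embed the query with the same model, then rank documents."""
    query_vector = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


top = search("a dog in the yard")
print(top)
```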

Finding similarity with distance measurements and ANN algorithms

In vector search, relevance is determined by measuring the similarity between query and document vectors. To compare two vectors and determine how similar they are, a distance or similarity measure can be used, such as Euclidean distance or cosine similarity³.

Euclidean distance

Euclidean distance is a measure of the straight-line distance between two points. It is calculated as the square root of the sum of the squared differences between the corresponding coordinates of the two points.

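Euclidean distance formula for two points p = (p_1, p_2) and q = (q_1, q_2):

$$d(p, q) = \sqrt{(q_1 - p_1)^2 + (q_2 - p_2)^2}$$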

This formula can be extended to higher-dimensional spaces by adding more terms to account for additional dimensions.
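As a minimal sketch, the same calculation works for any number of dimensions by summing one squared difference per coordinate. The function name `euclidean_distance` is chosen here for illustration.

```python
import math


def euclidean_distance(p, q):
    """Square root of the sum of squared coordinate differences."""
    if len(p) != len(q):
        raise ValueError("both points must have the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


d2 = euclidean_distance([0, 0], [3, 4])        # two dimensions
d3 = euclidean_distance([1, 2, 3], [4, 6, 3])  # three dimensions
print(d2, d3)
```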

Cosine similarity

Cosine similarity is a measure of similarity between two vectors in a multi-dimensional space. It calculates the cosine of the angle between the two vectors, indicating how closely the vectors align with each other.

Mathematically, the cosine similarity, cos(θ), between two vectors is calculated as the dot product of the two vectors divided by the product of their magnitudes.

Cosine similarity formula for two vectors A and B:

$$\cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|}$$

Cosine similarity ranges from -1 to 1, where:

1 indicates that the vectors are perfectly aligned (pointing in the same direction),
0 indicates that the vectors are orthogonal (perpendicular to each other) and
-1 indicates that the vectors are pointing in opposite directions.

Cosine similarity is particularly useful when the magnitude of a vector carries little meaning, because it compares only the direction of the vectors and ignores their lengths.
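The three reference values can be checked with a short sketch. The function name is illustrative, and raising an error for a zero vector is a design choice made here, since the cosine is undefined when a magnitude is zero.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


aligned = cosine_similarity([1, 2], [2, 4])      # same direction, different length
orthogonal = cosine_similarity([1, 0], [0, 1])   # perpendicular
opposite = cosine_similarity([1, 2], [-1, -2])   # opposite direction
print(aligned, orthogonal, opposite)
```

Note that `[1, 2]` and `[2, 4]` score as fully aligned even though one is twice as long as the other, which is the magnitude-independence described above.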

Approximate nearest neighbor (ANN)

Although the distance metrics mentioned previously can be used to measure vector similarity, comparing the query vector against every stored vector at query time becomes slow and inefficient as the collection grows. To address this, we can use an approximate nearest neighbor (ANN) search.
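For reference, the exhaustive approach looks like the sketch below: every query computes one distance per stored vector, so the work grows linearly with the size of the collection. The function name `exact_knn` and the sample data are illustrative.

```python
import heapq
import math


def exact_knn(query, vectors, k):
    """Compare the query with every stored vector and keep the k closest indexes."""
    return heapq.nsmallest(
        k,
        range(len(vectors)),
        key=lambda i: math.dist(query, vectors[i]),  # Euclidean distance
    )


vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]
nearest = exact_knn([1.0, 1.0], vectors, k=2)
print(nearest)
```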

Instead of finding an exact match, ANN algorithms efficiently search for the vectors that are approximately closest to a given query based on some distance metric like Euclidean distance or cosine similarity. By allowing for some level of approximation, these algorithms can significantly reduce the computational cost of nearest neighbor search without the need to compute embedding similarities across an entire corpus.

One of the most popular ANN algorithms is hierarchical navigable small world (HNSW). Its graph structure indexes the dataset and facilitates fast nearest-neighbor search by linking similar vectors together as the index is built. HNSW organizes data into neighborhoods, linking them with probable connections. When indexing a dense vector, it identifies the suitable neighborhood and its potential connections, storing them in a graph structure. During an HNSW search with a dense vector query, it locates the optimal neighborhood entry point and returns the nearest neighbors.
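The core idea of searching by walking a neighbor graph can be shown with a heavily simplified sketch. This is not HNSW: it uses a single layer instead of a hierarchy, builds the graph by brute force, and returns one result using a plain greedy walk. The function names, the `m` parameter and the sample points are assumptions for illustration.

```python
import math


def build_graph(vectors, m=2):
    """Link every vector to its m nearest neighbors (brute force, for clarity)."""
    graph = {}
    for i, v in enumerate(vectors):
        others = sorted(
            (j for j in range(len(vectors)) if j != i),
            key=lambda j: math.dist(v, vectors[j]),
        )
        graph[i] = others[:m]
    return graph


def greedy_search(query, vectors, graph, entry=0):
    """Walk the graph, always moving to the neighbor closest to the query."""
    current = entry
    current_dist = math.dist(query, vectors[current])
    while True:
        best = min(graph[current], key=lambda j: math.dist(query, vectors[j]))
        best_dist = math.dist(query, vectors[best])
        if best_dist >= current_dist:
            return current  # no neighbor is closer, so stop here
        current, current_dist = best, best_dist


points = [[0, 0], [1, 0], [2, 0], [3, 0], [4, 0], [5, 0]]
graph = build_graph(points, m=2)
found = greedy_search([4.2, 0], points, graph, entry=0)
print(found)
```

The walk only evaluates the neighbors of the nodes it visits rather than every point, which is where the savings come from. The result is approximate because a greedy walk can stop at a local minimum; the layered structure of HNSW is designed to make such walks both faster and more reliable.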

Applications of vector search

Vector search has numerous use cases across domains due to its ability to efficiently retrieve similar items based on their vector representations. Some common applications of vector search include:

Information retrieval

Vector search is used in search engines to retrieve documents, articles, web pages or other textual content based on their similarity to a query. It enables users to find relevant information even if the exact terms used in the query are not present in the documents.

Retrieval Augmented Generation (RAG)

Vector search is instrumental in the Retrieval Augmented Generation (RAG) framework for retrieving relevant context from a large corpus of text. RAG is a framework for generative AI that combines vector search with generative language models to generate responses.

In traditional language generation tasks, large language models (LLMs) like OpenAI’s GPT (Generative Pre-trained Transformer) or IBM’s Granite Models are used to construct responses based on the input prompt. However, these models may struggle to produce responses that are contextually relevant, factually accurate or up to date. RAG addresses this limitation by incorporating a retrieval step before response generation. During retrieval, vector search can be used to identify contextually pertinent information, such as relevant passages or documents from a large corpus of text, typically stored in a vector database. Next, an LLM is used to generate a response based on the retrieved context.
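The retrieve-then-generate flow can be sketched as below. The passages, their vectors and the query vector are made up, and `generate` is a hypothetical placeholder rather than a real model call; in practice the vectors would come from an embedding model and the prompt would be sent to an LLM.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vector, store, k=2):
    """Retrieval step: return the k passages most similar to the query."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [passage for passage, _ in ranked[:k]]


def build_prompt(question, passages):
    """Place the retrieved passages in front of the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


def generate(prompt):
    """Hypothetical placeholder for a call to an LLM."""
    return f"[LLM response to a prompt of {len(prompt)} characters]"


# Made-up passages and vectors standing in for a vector database.
store = [
    ("HNSW is a graph-based index for nearest neighbor search.", [0.9, 0.1, 0.0]),
    ("Cosine similarity compares the direction of two vectors.", [0.1, 0.9, 0.1]),
    ("Pizza dough needs time to rise.", [0.0, 0.1, 0.9]),
]
question = "How does an HNSW index work?"
query_vector = [0.8, 0.2, 0.0]  # assumed output of an embedding model

passages = retrieve(query_vector, store, k=2)
prompt = build_prompt(question, passages)
answer = generate(prompt)
print(prompt)
```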

Beyond language generation, RAG and vector search have further applications in various other NLP tasks, including question answering, chatbots, summarization and content generation.

Hybrid search

Vector search can be integrated into hybrid search approaches to enhance the effectiveness and flexibility of the search process. Hybrid search combines vector search with other search techniques, such as keyword-based search or metadata-based search. Vector search may be used to retrieve items based on their similarity to a query, while other search methods may be used to retrieve items based on exact matches or specific criteria.
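One common way to merge the results of a keyword search and a vector search is reciprocal rank fusion (RRF), sketched below. This is only one possible fusion strategy and is not prescribed by anything above. The document IDs are invented, and `k=60` is a conventional default for the smoothing constant, treated here as an assumption.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Score each document by the sum of 1 / (k + rank) over all rankings."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical result lists, best match first.
keyword_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(fused)
```

Documents that rank well in both lists rise to the top, while documents found by only one method are still kept further down.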

Video and image search

Vector stores are used in image and video search engines to index and retrieve visual content based on similarity. Image and video embeddings are stored as vectors, enabling users to search for visually similar images or videos across large datasets.

Recommendation systems

Recommendation engines in streaming services as well as e-commerce, social media and visual media platforms can be powered by vector search. Vector search allows for the recommendation of products, movies, music or other items based on their similarity to items that users have interacted with or liked previously.
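A minimal sketch of this idea: average the vectors of the items a user liked into a profile vector, then rank the remaining items by similarity to that profile. The catalog, its two-dimensional vectors and the averaging step are all assumptions for illustration; production recommenders are considerably more involved.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recommend(liked_ids, catalog, k=1):
    """Recommend the k unseen items closest to the average of the liked items."""
    liked = [catalog[item_id] for item_id in liked_ids]
    profile = [sum(column) / len(liked) for column in zip(*liked)]
    candidates = [item_id for item_id in catalog if item_id not in liked_ids]
    candidates.sort(
        key=lambda item_id: cosine_similarity(profile, catalog[item_id]),
        reverse=True,
    )
    return candidates[:k]


# Made-up item embeddings.
catalog = {
    "space_thriller": [0.9, 0.1],
    "heist_movie": [0.8, 0.3],
    "period_romance": [0.1, 0.9],
    "spy_movie": [0.85, 0.2],
}
picks = recommend(["space_thriller", "heist_movie"], catalog)
print(picks)
```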

Geospatial analysis

Vector search is used in geospatial data applications to retrieve spatial data such as points of interest, geographic features or spatial trajectories based on their proximity or similarity to a query location or pattern. It enables efficient spatial search and analysis in geographic information systems and location-based services.
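For proximity to a query location, the distance measure is usually geographic rather than Euclidean. The sketch below ranks a few places by great-circle distance using the haversine formula. The choice of haversine, the mean Earth radius of 6371 km and the approximate city coordinates are assumptions for illustration, and a real system would use a spatial index instead of scanning every point.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (approximation)


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_places(query, places, k=1):
    """Return the names of the k places closest to the query location."""
    return sorted(places, key=lambda name: haversine_km(query, places[name]))[:k]


# Approximate city-center coordinates, used as sample points of interest.
places = {
    "Paris": (48.8566, 2.3522),
    "London": (51.5074, -0.1278),
    "Berlin": (52.5200, 13.4050),
}
brussels = (50.8503, 4.3517)
closest = nearest_places(brussels, places, k=2)
print(closest)
```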

Footnotes

¹ Bahaaldine Azarmi and Jeff Vestal, Vector Search for Practitioners with Elastic, Packt Publishing, 2023

² Vicki Boykis, “What are embeddings,” 2023, https://vickiboykis.com/what_are_embeddings

³ Trey Grainger, Doug Turnbull and Max Irwin, AI Powered Search, Manning Publications, 2024