Updated: 29 July 2024
Contributors: Jim Holdsworth, Matthew Kosinski
Vector databases are growing in popularity because they deliver the speed and performance needed to drive generative artificial intelligence (AI) use cases and applications. According to Gartner®, by 2026, more than 30% of enterprises will have adopted vector databases to build their foundation models with relevant business data.1
Unlike traditional relational databases with rows and columns, data points in a vector database are represented by vectors with a fixed number of dimensions. Because they use high-dimensional vector embeddings, vector databases are better able to handle unstructured datasets.
The nature of data has undergone a profound transformation. It's no longer confined to structured information easily stored in traditional databases. Unstructured data—including social media posts, images, videos, audio clips and more—is growing 30% to 60% year over year.2
Relational databases excel at managing structured and semistructured datasets in specific formats. Loading unstructured data sources into a traditional relational database to store, manage and prepare the data for artificial intelligence (AI) is a labor-intensive process, especially with new generative use cases such as similarity search.
Traditional search typically represents data by using discrete tokens or features, such as keywords, tags or metadata. Traditional searches rely on exact matches to retrieve relevant results. For example, a search for "smartphone" would return results containing the word "smartphone."
Opposed to this, vector search represents data as dense vectors, which are vectors with most or all elements being nonzero. Vectors are represented in a continuous vector space, the mathematical space in which data is represented as vectors.
Vector representations enable similarity search. For example, a vector search for “smartphone” might also return results for “cellphone” and “mobile devices.”
Each dimension of the dense vector corresponds to a latent feature or aspect of the data. A latent feature is an underlying characteristic or attribute that is not directly observed but inferred from the data through mathematical models or algorithms.
Latent features capture the hidden patterns and relationships in the data, enabling more meaningful and accurate representations of items as vectors in a high-dimensional space.
Use this model selection framework to choose the most appropriate model while balancing your performance requirements with cost, risks and deployment needs.
Register for the ebook on AI data stores
Vectors are a subset of tensors, which in machine learning (ML) is a generic term for a group of numbers—or a grouping of groups of numbers—in n-dimensional space. Tensors function as a mathematical bookkeeping device for data. Working up from the smallest element:
Vector numbers can represent complex objects such as words, images, videos and audio generated by an ML model. This high-dimensional vector data, containing multiple features, is essential to machine learning, natural language processing (NLP) and other AI tasks. Some example uses of vector data include:
Vector embeddings are numerical representations of data points that convert various types of data—including nonmathematical data such as words, audio or images—into arrays of numbers that ML models can process.
Artificial intelligence (AI) models, from simple linear regression algorithms to the intricate neural networks used in deep learning, operate through mathematical logic.
Any data that an AI model uses, including unstructured data, needs to be recorded numerically. Vector embedding is a way to convert an unstructured data point into an array of numbers that expresses that data’s original meaning.
Here's a simplified example of word embeddings for a very small corpus (2 words), where each word is represented as a 3-dimensional vector:
In this example, each word ("cat") is associated with a unique vector ([0.2, -0.4, 0.7]). The values in the vector represent the word's position in a continuous 3-dimensional vector space.
Words with similar meanings or contexts are expected to have similar vector representations. For instance, the vectors for "cat" and "dog" are close together, reflecting their semantic relationship.
Embedding models are trained to convert data points into vectors. Vector databases store and index the outputs of these embedding models. Within the database, vectors can be grouped together or identified as opposites based on semantic meaning or features across virtually any data type.
Vector embeddings are the backbone of recommendations, chatbots and generative apps such as ChatGPT.
For example, take the words “car” and “vehicle.” They have similar meanings but are spelled differently. For an AI application to enable effective semantic search, the vector representations of “car” and “vehicle” must capture their semantic similarity. In machine learning, embeddings represent high-dimensional vectors that encode this semantic information.
Vector databases serve three key functions in AI and ML applications:
In operation, vector databases work by using multiple algorithms to conduct an approximate nearest neighbor (ANN) search. The algorithms are then gathered in a pipeline to quickly and accurately retrieve and deliver data neighboring the vector that is queried.
For example, an ANN search could look for products that are visually similar in an e-commerce catalog. Additional uses include anomaly detection, classification and semantic search. Because a dataset runs through a model just once, results are returned within milliseconds.
Vector databases store the outputs of an embedding model algorithm, the vector embeddings. They also store each vector’s metadata—including title, description and data type—which can be queried by using metadata filters.
By ingesting and storing these embeddings, the database can facilitate fast retrieval of a similarity search, matching the user’s prompt with a similar vector embedding.
Vectors need to be indexed to accelerate searches within high-dimensional data spaces. Vector databases create indexes on vector embeddings for search functions.
The vector database indexes vectors by using an ML algorithm. Indexing maps the vectors to new data structures that enable faster similarity or distance searches, such as nearest neighbor searches, between vectors.
Vectors can be indexed by using algorithms such as hierarchical navigable small world (HNSW), locality-sensitive hashing (LSH) or product quantization (PQ).
Query vectors are vector representations of search queries. When a user queries or prompts an AI model, the model computes an embedding of the query or prompt. The database then calculates distances between query vectors and vectors stored in the index to return similar results.
Databases can measure the distance between vectors with various algorithms, such as nearest neighbor search. Measurements can also be based on various similarity metrics, such as cosine similarity.
The database returns the most similar vectors or nearest neighbors to the query vector according to the similarity ranking. These calculations support various machine learning tasks, such as recommendation systems, semantic search, image recognition and other natural language processing tasks.
Vector databases are a popular way to power enterprise AI-based applications because they can deliver many benefits:
Vector databases use various indexing techniques to enable faster searching. Vector indexing and distance-calculating algorithms such as nearest neighbor search can help optimize performance when searching for relevant results across large datasets with millions, if not billions, of data points.
One consideration is that vector databases provide approximate results. Applications requiring greater accuracy might need to use a different kind of database at the cost of a slower processing speed.
Vector databases can store and manage massive amounts of unstructured data by scaling horizontally with additional nodes, maintaining performance as query demands and data volumes increase.
Because they enable faster data retrieval, vector databases speed the training of foundation models.
Vector databases typically provide built-in features to easily update and insert new unstructured data.
Vector databases are built to handle the added complexity of using images, videos or other multidimensional data.
Given the multiple use cases ranging from semantic search to conversational AI applications, vector databases can be customized to meet business and AI requirements. Organizations can start with a general-purpose model such as IBM® Granite™ series models, Meta's Llama-2 or Google's Flan models, and then provide their own data in a vector database to enhance the output of the models and AI applications.
Organizations have a breadth of options when choosing a vector database capability. To find one that meets their data and AI needs, many organizations consider:
There are a few alternatives to choose from.
Vector databases should not be considered as stand-alone capabilities, but rather a part of a broader data and AI ecosystem.
Many offer APIs, native extensions or can be integrated with databases. Because vector databases are built to use enterprise data to enhance models, organizations must also have proper data governance and security in place to help ensure that the data used to train large language models (LLMs) can be trusted.
Beyond APIs, many vector databases use programming-language-specific software development kits (SDKs) that can wrap around the APIs. Using the SDKs, developers often find it easier to work with the data in their apps.
Using a vector store and index is well suited for applications that are based on facts or fact-based querying, such as extracting specific information from complex documents.
However, asking for a summary of topics would not work well with a vector index. In this case, an LLM would go through all the different possible contexts on that topic within the data.
A faster option would be to use a different kind of index, such as a list index rather than a vector index, because a list index would immediately fetch the first element in each listing.
To optimize vector database development, LangChain is an open-source orchestration framework for developing applications that use LLMs.
Available in both Python-based and JavaScript-based libraries, LangChain’s tools and APIs simplify the process of building LLM-driven apps such as chatbots and virtual agents. LangChain provides integrations for over 25 different embedding methods, and for over 50 different vector stores (both cloud-hosted and local).
To power enterprise-grade AI, a data lakehouse might be paired with an integrated vector database. Organizations can unify, curate and prepare vectorized embeddings for their generative AI applications at scale across their trusted, governed data. This enhances the relevance and precision of their AI workloads, including chatbots, personalized recommendation systems and image similarity search applications.
The applications for vector databases are vast and growing. Some key use cases include:
Retrieval-augmented generation (RAG) is an AI framework for enabling large language models (LLMs) to retrieve facts from an external knowledge base. Vector databases are key to supporting RAG implementations.
Enterprises are increasingly favoring RAG in generative AI workflows for its faster time-to-market, efficient inference and reliable output. The framework is particularly helpful in use cases such as customer care, HR and talent management.
RAG helps ensure that a model is linked to the most current, reliable facts and that users have access to the model’s sources so that its claims can be verified. Anchoring the LLM in trusted data can help reduce model hallucinations.
RAG uses high-dimensional vector data to enrich prompts with semantically relevant information for in-context learning by foundation models. RAG requires effective storage and retrieval during the inference stage, which handles the highest volume of data.
Vector databases excel at efficiently indexing, storing and retrieving these high-dimensional vectors, providing the speed, precision and scale needed for applications such as recommendation engines and chatbots.
Vector databases, particularly when used to implement RAG frameworks, can help improve virtual agent interactions by enhancing the agent’s ability to parse relevant knowledge bases efficiently and accurately. Agents can provide real-time contextual answers to user queries, along with the source documents and page numbers for reference.
E-commerce sites, for instance, can use vectors to represent customer preferences and product attributes. This enables them to suggest items similar to past purchases, based on vector similarity, enhancing user experience and increasing retention.
This search technique is used to discover similar items or data points, typically represented as vectors, in large collections. Vector search can capture the semantic relationships between elements, enabling effective processing by machine learning models and artificial intelligence applications.
These searches can take several forms.
IBM watsonx is an AI and data platform that’s built for business. Readily build custom AI applications, manage all data sources and accelerate responsible AI workflows—all on one platform.
IBM Cloud Databases for Elasticsearch combines the flexibility of a full-text search engine with the indexing power of a JSON document database. By combining integrated machine learning (ML) models with specialized ML nodes, data types and search algorithms, IBM Cloud Databases for Elasticsearch is ready to power your enterprise.
IBM Cloud Databases for PostgreSQL, a PostgreSQL database as a service offering, lets teams spend more time building with high availability, backup orchestration, point-in-time-recovery (PITR) and read replica with ease.
Organizations that use generative AI models correctly can see a myriad of benefits—from increased operational efficiency and improved decision-making to the rapid creation of marketing content.
Use this guide to understand what IBM AI assistants do best and for whom, how to compare them to others, and how to get started.
RAG is an AI framework for retrieving facts from an external knowledge base to ground LLMs on the most accurate, up-to-date information and to give users insight into LLMs' generative process.
All links reside outside ibm.com.
1 Gartner Innovation Insight: Vector Databases, Gartner, 4 September 2023.
2 2024 Strategic Roadmap for Storage, Gartner, 27 May 2024.