Unlike traditional relational databases with rows and columns, data points in a vector database are represented by vectors with a fixed number of dimensions. Because they use high-dimensional vector embeddings, vector databases are better able to handle unstructured datasets.
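To make the contrast concrete, here is a minimal Python sketch of the same item stored as a relational-style row and as a fixed-dimension vector. The field names, the 4-dimensional embedding and its values are purely illustrative; real embedding models typically produce hundreds or thousands of dimensions.

```python
import numpy as np

# Relational row: named, typed columns describing one item.
row = {"id": 42, "name": "smartphone", "price": 599.00, "in_stock": True}

# Vector database record: the same item represented as a fixed-length
# dense vector (4 dimensions here only for readability).
embedding = np.array([0.12, -0.48, 0.33, 0.91])
assert embedding.shape == (4,)
```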
The nature of data has undergone a profound transformation. It's no longer confined to structured information easily stored in traditional databases. Unstructured data—including social media posts, images, videos, audio clips and more—is growing 30% to 60% year over year.2
Relational databases excel at managing structured and semistructured datasets in specific formats. Loading unstructured data sources into a traditional relational database to store, manage and prepare the data for artificial intelligence (AI) is a labor-intensive process, especially for new generative AI use cases that depend on similarity search.
Traditional search typically represents data by using discrete tokens or features, such as keywords, tags or metadata, and relies on exact matches to retrieve relevant results. For example, a search for "smartphone" would return only results containing the word "smartphone."
In contrast, vector search represents data as dense vectors, which are vectors in which most or all elements are nonzero. These vectors occupy a continuous vector space: a mathematical space in which each data point is a position rather than a discrete token.
Vector representations enable similarity search. For example, a vector search for "smartphone" might also return results for "cellphone" and "mobile devices."
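The sketch below illustrates the difference, assuming a handful of toy 4-dimensional embeddings and a hand-written cosine similarity function; a production system would generate the vectors with an embedding model and query them through a vector index rather than a brute-force loop.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Similarity of two dense vectors, in [-1, 1]; higher means closer.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings; in practice these come from an embedding model.
docs = {
    "smartphone":    np.array([0.12, -0.48, 0.33, 0.91]),
    "cellphone":     np.array([0.10, -0.45, 0.35, 0.89]),
    "mobile device": np.array([0.14, -0.40, 0.30, 0.85]),
    "coffee mug":    np.array([-0.72, 0.05, -0.60, 0.14]),
}

# Traditional search: exact token matching finds only the literal term.
keyword_hits = [d for d in docs if "smartphone" in d]

# Vector search: rank every document by similarity to the query vector,
# so near-synonyms like "cellphone" surface without a shared keyword.
query = docs["smartphone"]
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)

print(keyword_hits)   # ['smartphone']
print(ranked[:3])     # ['smartphone', 'cellphone', 'mobile device'] with these toy values
```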
Each dimension of the dense vector corresponds to a latent feature or aspect of the data. A latent feature is an underlying characteristic or attribute that is not directly observed but inferred from the data through mathematical models or algorithms.
Latent features capture the hidden patterns and relationships in the data, enabling more meaningful and accurate representations of items as vectors in a high-dimensional space.
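As a sketch of how such dense, latent-feature vectors are produced in practice, the example below assumes the open source sentence-transformers library is installed and uses the all-MiniLM-L6-v2 model, which outputs 384-dimensional embeddings; any embedding model would work similarly.

```python
from sentence_transformers import SentenceTransformer, util

# Load a small, commonly used sentence-embedding model.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = ["smartphone", "cellphone", "coffee mug"]
embeddings = model.encode(sentences)  # array of shape (3, 384)

# Each of the 384 dimensions is a latent feature learned by the model,
# not a hand-picked attribute; semantically related items end up close.
print(util.cos_sim(embeddings[0], embeddings[1]))  # high: smartphone vs cellphone
print(util.cos_sim(embeddings[0], embeddings[2]))  # lower: smartphone vs coffee mug
```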