Cosine similarity is a widely used metric that measures how similar two data points are based on the direction in which their vectors point rather than on their length or size. It is especially effective in high-dimensional spaces, where traditional distance-based metrics tend to become less discriminative.

Cosine similarity is the cosine of the angle (theta) between two non-zero vectors in an inner product space. It is computed as the dot product of the two vectors divided by the product of their magnitudes. The resulting score ranges from -1 to 1:

A cosine similarity score of 1 indicates that the vectors are pointing in the exact same direction.

A cosine similarity score of 0 indicates that the vectors are orthogonal, meaning they have no directional similarity.

A cosine similarity score of -1 indicates that the vectors point in exactly opposite directions.
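
The three reference scores above can be reproduced with a few lines of code. What follows is a minimal sketch in plain Python, not a production implementation: the function name cosine_similarity and the example vectors are illustrative choices, and the function simply divides the dot product by the product of the two vector lengths.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # The angle is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

# Illustrative vectors chosen to hit the three reference cases.
same = cosine_similarity([1.0, 2.0], [2.0, 4.0])        # same direction, different length
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])  # at right angles
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])  # opposite directions

print(same, orthogonal, opposite)
```

Up to floating-point rounding, the three printed values are 1, 0 and -1. Note that the first pair differs in length by a factor of two yet still scores approximately 1, because only direction is measured.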

Think of it like comparing arrows: if they’re pointing in the same direction, they are highly similar. Arrows at right angles are unrelated, and arrows pointing in opposite directions are as dissimilar as possible.

This angular approach is foundational to many machine learning (ML), natural language processing (NLP) and artificial intelligence (AI) systems. These technologies rely on vector-based representations of data, meaning the data has been converted into a numerical form to capture its meaning and similarity to other data.
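
To make the idea of a vector-based representation concrete, the sketch below converts short texts into word-count vectors and compares them. Word counts are used here only as a simple stand-in for the learned embeddings that real systems typically rely on, and the vocabulary, the sample texts and the helper name bag_of_words are all assumptions made for illustration.

```python
import math
from collections import Counter

def bag_of_words(text, vocabulary):
    """Turn a text into a vector of word counts, one entry per vocabulary word."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

# Hypothetical vocabulary and documents, invented for this example.
vocabulary = ["cats", "chase", "mice", "stocks", "fell"]
v_short = bag_of_words("cats chase mice", vocabulary)
v_long = bag_of_words("cats chase mice cats chase mice", vocabulary)  # same text, repeated
v_other = bag_of_words("stocks fell", vocabulary)

repeated_score = cosine_similarity(v_short, v_long)
unrelated_score = cosine_similarity(v_short, v_other)
print(repeated_score, unrelated_score)
```

The repeated document is twice as long as the short one, yet the two score approximately 1 because their vectors point in the same direction. The third text shares no words with the first, so its vector is orthogonal and the score is 0.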

For instance, a chatbot may use word embedding techniques to convert text into vector form, deep learning models to understand intent and similarity search algorithms to retrieve the most relevant response from a database. Cosine similarity is a common way to score how closely the embedded query matches each stored candidate in that final retrieval step.
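
The retrieval step in the chatbot example can be sketched as a nearest-match search. Everything concrete below is an assumption for illustration: the three-dimensional embeddings are invented by hand rather than produced by a real model, and the names responses, query_embedding and most_similar are hypothetical. Production systems generally use much higher-dimensional embeddings and a dedicated vector index instead of a linear scan.

```python
import math

def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical stored responses with hand-made embeddings (not model output).
responses = {
    "You can reset your password from the login page.": [0.9, 0.1, 0.0],
    "Our store opens at 9 a.m. on weekdays.": [0.1, 0.9, 0.1],
    "Refunds are processed within five business days.": [0.0, 0.2, 0.9],
}

# Stand-in for the embedding of a user message such as "I forgot my password".
query_embedding = [0.8, 0.2, 0.1]

def most_similar(query, candidates):
    """Return the candidate text whose embedding has the highest cosine similarity."""
    return max(candidates, key=lambda text: cosine_similarity(query, candidates[text]))

best = most_similar(query_embedding, responses)
print(best)
```

With these made-up vectors, the password response is selected because its embedding points in nearly the same direction as the query.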