Most machine learning algorithms can only take low-dimensional numerical data as inputs, so raw data must first be converted into a numerical format. This can involve creating a “bag of words” representation for text data, converting images into pixel values, or transforming graph data into a numerical matrix.
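As a rough illustration of the bag-of-words idea, the sketch below turns a couple of made-up sentences into count vectors over a shared vocabulary. The function name `bag_of_words` and the whitespace tokenization are assumptions chosen for brevity; real pipelines usually handle punctuation, stop words and much larger vocabularies.

```python
from collections import Counter


def bag_of_words(documents):
    """Turn each document into a vector of word counts over a shared vocabulary."""
    # Naive tokenization (an assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    # Sort the vocabulary so every document uses the same column order.
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Counter returns 0 for words that do not occur in this document.
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


# Made-up example documents.
docs = ["the cat sat", "the dog sat on the mat"]
vocabulary, vectors = bag_of_words(docs)
print(vocabulary)
for vector in vectors:
    print(vector)
```

Each position in a vector corresponds to one vocabulary word, so the text is now numerical, although this representation carries no notion of meaning or word order.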

An embedding model takes objects as input and outputs embeddings, which are represented as vectors. A vector is an array of numbers (e.g. 1489, 22… 3, 777), where each number indicates where an object sits along a particular dimension. The number of dimensions can reach a thousand or more, depending on the complexity of the input data. The closer two embeddings are in this n-dimensional space, the more similar the objects they represent. Similarity is determined by the distance between the two vectors, measured with Euclidean distance, cosine similarity or another metric.
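To make the two named metrics concrete, here is a minimal sketch in plain Python. The vectors `u`, `v` and `w` are invented for illustration, and the functions assume equal-length, non-zero vectors.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two vectors (smaller means closer)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Invented vectors: v points the same way as u, w is perpendicular to u.
u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]
w = [-3.0, 0.0, 1.0]

print(euclidean_distance(u, v), cosine_similarity(u, v))
print(euclidean_distance(u, w), cosine_similarity(u, w))
```

The two metrics can disagree: `u` and `v` are some distance apart in Euclidean terms, yet their cosine similarity is at its maximum because they point in the same direction. Which metric is appropriate depends on how the embeddings were trained.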

One model, Word2Vec (word to vector), developed by Google in 2013, efficiently creates word embeddings using a two-layer neural network. It takes a word as input and outputs an n-dimensional coordinate (the embedding vector), so that when these word vectors are projected into two or three dimensions and plotted, synonyms cluster together.

Here is how two words, “dad” and “mom”, would be represented as vectors:

“dad” = [0.1548, 0.4848, …, 1.864]

“mom” = [0.8785, 0.8974, …, 2.794]

Although there is some similarity between these two words, we would expect “father” to sit much closer to “dad” in the vector space, resulting in a higher dot product (a measure of how closely two vectors align in direction, which also grows with their lengths).
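The comparison can be sketched with toy numbers. The three-dimensional vectors below are invented purely for illustration and are not real Word2Vec output; they are not the elided vectors shown above. The sketch also shows one caveat: a raw dot product depends on vector length, so cosine similarity (the dot product of length-normalized vectors) is often used when only direction should count.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product after normalizing both vectors to unit length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Hypothetical toy embeddings, made up for this example.
toy_vectors = {
    "dad": [0.9, 0.8, 0.1],
    "father": [0.8, 0.9, 0.2],
    "mom": [0.1, 0.8, 0.9],
}

dad, father, mom = (toy_vectors[w] for w in ("dad", "father", "mom"))

# With these toy numbers, "father" scores higher against "dad" than "mom" does.
print(dot(dad, father), dot(dad, mom))

# Stretching "mom" by 10 inflates its dot product but leaves the cosine unchanged.
long_mom = [10 * x for x in mom]
print(dot(dad, long_mom), cosine_similarity(dad, long_mom))
```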

A more complex example is recommendation embedding, which works by representing users and items (e.g., movies, products, articles) as high-dimensional vectors in a continuous vector space. These embeddings capture latent features that reflect users' preferences and item characteristics. The idea is to learn a representation for each user and item in such a way that the dot product of their embeddings correlates with the user's preference for that item.

Each user and item is associated with an embedding vector. These vectors are typically learned through a recommendation model during a training process. The user embeddings and item embeddings are organized into matrices. The rows of the user matrix represent users, and the rows of the item matrix represent items.

The recommendation score for a user-item pair can be computed by taking the dot product of the user's embedding vector and the item's embedding vector. The higher the dot product, the more likely the user is to be interested in the item.

Recommendation Score = User Embedding ⋅ Item Embedding
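A minimal sketch of this scoring step, assuming two tiny hand-written embedding matrices (all values are invented, and `score_matrix` is a hypothetical helper name). Each row of the result holds one user's scores for every item.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def score_matrix(user_embeddings, item_embeddings):
    """Score every user-item pair: result[u][i] = user u . item i."""
    return [[dot(u, v) for v in item_embeddings] for u in user_embeddings]


# Invented 2-dimensional embeddings: rows are users and items respectively.
user_embeddings = [
    [1.0, 0.0],  # user 0
    [0.5, 0.5],  # user 1
]
item_embeddings = [
    [0.9, 0.1],  # item 0
    [0.2, 0.8],  # item 1
    [0.5, 0.5],  # item 2
]

scores = score_matrix(user_embeddings, item_embeddings)
for user_id, row in enumerate(scores):
    print(user_id, row)
```

With a numerical library the same result is a single matrix product of the user matrix with the transposed item matrix; the nested loops here just keep the example dependency-free.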

The embedding matrices are learned through a training process using historical user-item interactions. The model aims to minimize the difference between predicted scores and actual user preferences (e.g., ratings, clicks, purchases).
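One common way to do this, shown here only as a sketch, is matrix factorization trained with stochastic gradient descent on squared error. The source does not say which training method is used, so the function names, hyperparameters (`dim`, `lr`, `epochs`) and the handful of ratings below are all assumptions made for illustration.

```python
import random


def predict(U, V, user, item):
    """Predicted score: dot product of the user's and the item's embeddings."""
    return sum(a * b for a, b in zip(U[user], V[item]))


def mean_squared_error(ratings, U, V):
    """Average squared gap between predicted scores and observed ratings."""
    return sum((r - predict(U, V, u, i)) ** 2 for u, i, r in ratings) / len(ratings)


def init_embeddings(n_rows, dim, rng):
    """Small random starting values for an embedding matrix."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_rows)]


def train(ratings, U, V, lr=0.02, epochs=500):
    """Nudge both matrices in place to shrink the squared error on each rating."""
    dim = len(U[0])
    for _ in range(epochs):
        for user, item, rating in ratings:
            error = rating - predict(U, V, user, item)
            for k in range(dim):
                u_k, v_k = U[user][k], V[item][k]
                # Gradient step on squared error for this single interaction.
                U[user][k] += lr * error * v_k
                V[item][k] += lr * error * u_k


# Invented interactions: (user index, item index, rating).
ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0), (1, 2, 2.0), (2, 1, 5.0)]

rng = random.Random(0)
U = init_embeddings(3, 2, rng)  # 3 users, 2 latent dimensions
V = init_embeddings(3, 2, rng)  # 3 items, 2 latent dimensions

error_before = mean_squared_error(ratings, U, V)
train(ratings, U, V)
error_after = mean_squared_error(ratings, U, V)
print(error_before, error_after)
```

Production systems typically add regularization, held-out validation data, and losses suited to implicit feedback such as clicks; none of that is shown here.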

Once the model is trained, it can be used to generate top-N recommendations for users. The items with the highest predicted scores for a user are recommended.
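The final ranking step might look like the sketch below, which scores every item for one user and keeps the N highest. The item names and embedding values are invented, and `top_n` is a hypothetical helper rather than part of any particular library.

```python
import heapq


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_n(user_vector, item_embeddings, n):
    """Return the n item names with the highest predicted scores, best first."""
    scores = {name: dot(user_vector, vec) for name, vec in item_embeddings.items()}
    return heapq.nlargest(n, scores, key=scores.get)


# Invented embeddings standing in for the output of a trained model.
item_embeddings = {
    "movie_a": [0.9, 0.1],
    "movie_b": [0.2, 0.8],
    "movie_c": [0.6, 0.4],
}
user_vector = [1.0, 0.0]

recommendations = top_n(user_vector, item_embeddings, n=2)
print(recommendations)
```

Scoring every item is fine for a small catalog; for very large catalogs, systems commonly switch to approximate nearest-neighbor search instead of an exhaustive scan.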