A content-based filtering system generates recommendations based on an item’s features. Content-based recommender systems assume that if a user likes a particular item, they will also like similar items. Content-based filtering considers item attributes such as color, category and price, plus other metadata assigned through keywords and tags, along with explicit and implicit user feedback.
Content-based filtering systems represent items and users as vectors in a vector space. Proximity is used to determine the similarity between items: the closer two vectors are in that space, the more similar the items are considered to be. Items whose feature vectors are close to those of items the user previously liked are then recommended to that user.
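The vector-space idea above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the catalog, the three genre features and the function names are all hypothetical, and cosine similarity is only one common choice of proximity measure.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical catalog; each vector is [is_action, is_comedy, is_drama].
ITEMS = {
    "Movie A": [1, 0, 0],
    "Movie B": [1, 1, 0],
    "Movie C": [0, 0, 1],
    "Movie D": [0, 1, 0],
}

def build_profile(liked, items):
    """Represent the user as the average feature vector of the items they liked."""
    vectors = [items[name] for name in liked]
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def recommend(liked, items, top_n=2):
    """Rank items the user has not liked yet by similarity to the user profile."""
    profile = build_profile(liked, items)
    candidates = [
        (name, cosine_similarity(profile, vector))
        for name, vector in items.items()
        if name not in liked
    ]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:top_n]

recommendations = recommend(["Movie A"], ITEMS)
```

Here a user who liked only "Movie A" gets "Movie B" ranked first, because it shares the action feature. In practice, feature vectors usually come from many more attributes and are often weighted or normalized before comparison.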
Content-based recommenders apply a user-specific classifier or regression model. The descriptions and features of items a user is interested in serve as the model’s training data set, and the trained model then predicts which other items to recommend.
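As a rough sketch of such a per-user model, the following trains a small logistic regression classifier on one user's rated items and scores new candidates. Logistic regression is only one possible model choice, assumed here for illustration. The feature layout, training data, hyperparameters and function names are also hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_user_model(features, labels, lr=0.5, epochs=500):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = pred - y  # gradient of the log loss with respect to z
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

def predict_like_probability(model, x):
    """Estimated probability that the user likes an item with features x."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Hypothetical ratings from ONE user; features are [is_action, is_comedy, is_drama].
rated_features = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
rated_labels = [1, 1, 0, 0]  # 1 = liked, 0 = disliked

model = train_user_model(rated_features, rated_labels)

# Score new (hypothetical) catalog items and rank them for this user.
candidates = {"New action film": [1, 0, 0], "New drama": [0, 0, 1]}
ranked = sorted(
    candidates,
    key=lambda name: predict_like_probability(model, candidates[name]),
    reverse=True,
)
```

Because the model is trained only on this user's own ratings, each user gets separate weights. A regression model predicting a numeric rating could be substituted in the same structure.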
Content-based recommendation systems can be further improved by tagging items using natural language processing. However, the tagging process can be tedious for very large volumes of data.
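One way tags might be derived automatically from item descriptions is TF-IDF keyword scoring, sketched below with only the Python standard library. The technique, the stopword list, the sample descriptions and the function names are assumptions made for illustration. Production systems typically rely on more capable NLP tooling.

```python
import math
import re
from collections import Counter

# Small illustrative stopword list; real systems use a much fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "with", "to", "is"}

def tokenize(text):
    """Lowercase the text, keep alphabetic words, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def extract_tags(descriptions, top_n=3):
    """Return the top_n highest-scoring TF-IDF terms per item as candidate tags."""
    tokenized = {item: tokenize(text) for item, text in descriptions.items()}
    n_docs = len(tokenized)

    # Document frequency: in how many descriptions each term appears.
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))

    tags = {}
    for item, tokens in tokenized.items():
        counts = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        tags[item] = ranked[:top_n]
    return tags

# Hypothetical product descriptions.
descriptions = {
    "boots": "Waterproof leather hiking boots",
    "sneakers": "Lightweight running sneakers with mesh",
    "sandals": "Leather sandals for summer",
}
tags = extract_tags(descriptions)
```

Terms that appear in many descriptions, such as "leather" here, score lower than terms unique to one item, so the extracted tags tend to be the words that distinguish an item from the rest of the catalog.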
Compared with collaborative filtering, content-based filtering suffers less from the cold start problem because it relies on metadata characteristics rather than past user interactions. However, content-based filtering can be limited in exploring new items, as it often suggests items similar to those users liked previously.