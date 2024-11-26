Masked language models (MLM) are a type of large language model (LLM) used to help predict missing words from text in natural language processing (NLP) tasks. By extension, masked language modeling is one form of training transformer models—notably bidirectional encoder representations from transformers (BERT) and its derivative robustly optimized BERT pretraining approach (RoBERTa)—for NLP tasks by training the model to fill in masked words within a text, and thereby predict the most likely and coherent words to complete the text.1
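
The fill-in-the-blank objective described above can be sketched in a few lines of plain Python. This is a simplified illustration, not the exact procedure used to pretrain BERT or RoBERTa: the helper name `mask_tokens`, the whitespace tokenization, and the `mask_prob` value are all assumptions made for the example, and real pretraining pipelines use subword tokenizers and a more involved replacement scheme.

```python
import random

MASK_TOKEN = "[MASK]"  # placeholder string; the actual mask token depends on the model


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Return (masked_tokens, labels) for one masked-language-modeling example.

    labels[i] holds the original token at masked positions and None elsewhere,
    so a training loss would only be computed where a token was hidden.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)  # hide the token from the model
            labels.append(token)       # remember what the model should predict
        else:
            masked.append(token)       # leave the token visible as context
            labels.append(None)        # no prediction target at this position
    return masked, labels


# Hypothetical example sentence, split on whitespace for simplicity.
sentence = "the quick brown fox jumps over the lazy dog".split()
masked, labels = mask_tokens(sentence, mask_prob=0.3, seed=0)
print(" ".join(masked))
```

A model trained on such pairs sees `masked` as input and is scored only on the positions where `labels` is not `None`, which is why it must rely on the surrounding words on both sides of each gap.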

Masked language modeling aids many tasks, from sentiment analysis to text generation, by training a model to understand the contextual relationships between words. Researchers often use masked language modeling to create pretrained models that then undergo supervised fine-tuning for downstream tasks such as text classification or machine translation. Masked language models thereby underpin many current state-of-the-art language modeling algorithms. Although masked language modeling is a method for pretraining language models, online sources sometimes refer to it as a transfer learning method. That description is not unreasonable, as some research groups have begun to implement masked language modeling as an end task in itself.

The Hugging Face Transformers and TensorFlow Text libraries contain functions designed to train and test masked language models in Python, both as end tasks and as pretraining for downstream tasks.
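
As one illustration of this library support, the sketch below wraps the `fill-mask` pipeline from Hugging Face Transformers to query a pretrained masked language model. The helper name `top_predictions` and the `bert-base-uncased` checkpoint are assumptions chosen for the example, not choices made by the article. Running it requires the `transformers` package with a backend such as PyTorch, and the first call downloads the model weights.

```python
def top_predictions(text, model_name="bert-base-uncased", k=3):
    """Return the k most likely fillers for the mask token in `text`.

    `top_predictions` is a hypothetical helper; `model_name` is an assumed
    checkpoint, not one prescribed by the article.
    """
    # Imported lazily so that defining the helper does not require the library.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=model_name)
    results = fill_mask(text, top_k=k)
    # Each result is a dict that includes the predicted token and its score.
    return [(r["token_str"], r["score"]) for r in results]
```

Calling `top_predictions("The capital of France is [MASK].")` would return a list of `(token, score)` pairs. The mask string must match the chosen model's tokenizer: BERT checkpoints use `[MASK]`, while RoBERTa checkpoints use `<mask>`.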