December 3, 2020 By Demi Ajayi

AI that can compose Shakespearean sonnets. AI that can design a webpage based on a simple user description. AI that can summarize a description of quantum computing for an eighth grader. Since the launch of the GPT-3 language model this year, the natural language processing (NLP) and machine learning communities have been abuzz with stories of the purported new capabilities of language-based AI.

Recent advancements in NLP have been a few years in the making, starting in 2018 with the launch of two massive deep learning models: GPT (Generative Pre-Training) by OpenAI, and BERT (Bidirectional Encoder Representations from Transformers) by Google, released in two sizes, BERT-Base and BERT-Large. Unlike previous NLP models, BERT is an open source, deeply bidirectional, unsupervised language representation that is pretrained solely on a plain text corpus. Since then we have seen the development of other massive deep learning language models, such as GPT-2, RoBERTa and now GPT-3, the model that launched a thousand tech articles.

Today’s post in our NLP blog series discusses the BERT and GPT models: what makes them so powerful and how they can benefit your business.

How massive deep learning models work

Language models estimate the probability of a word appearing in a given context, or of a whole sentence occurring. As such, they are useful building blocks in many NLP applications. But they often require a burdensome amount of training data to be useful for specific tasks and domains.
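To make “the probability of a sentence” concrete, here is a minimal sketch using a toy bigram model, a far simpler technique than BERT or GPT, which scores a sentence with the chain rule: the probability of each word given the word before it, multiplied together. The three-sentence corpus is made up for illustration, and real models add smoothing so that unseen word pairs do not receive zero probability.

```python
from collections import Counter

def train_bigram_model(corpus):
    """Count context words and word pairs, padding each sentence with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens[:-1])                  # every token that serves as a context
        bigrams.update(zip(tokens[:-1], tokens[1:]))  # (previous word, next word) pairs
    return unigrams, bigrams

def sentence_probability(sentence, unigrams, bigrams):
    """P(w1..wn) = product of P(w_i | w_{i-1}), using maximum-likelihood estimates."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        if unigrams[prev] == 0:
            return 0.0  # unseen context: no smoothing in this toy model
        prob *= bigrams[(prev, word)] / unigrams[prev]
    return prob

# Hypothetical toy corpus.
corpus = ["the bank is open", "the river is wide", "the bank is closed"]
uni, bi = train_bigram_model(corpus)
print(sentence_probability("the bank is open", uni, bi))  # a plausible word order
print(sentence_probability("open is bank the", uni, bi))  # an implausible one scores 0
```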

Massive deep learning language models are designed to tackle these pervasive training data issues. They are pretrained on an enormous amount of unannotated text to produce a general-purpose model. By fine-tuning these pretrained models, downstream users can create task-specific models with much smaller annotated training datasets (a technique called transfer learning). This represents a breakthrough in NLP: state-of-the-art results can now be achieved with smaller training datasets.
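As a rough illustration of transfer learning, the sketch below keeps a stand-in “pretrained encoder” frozen and trains only a small logistic-regression head on four labeled examples. The word vectors in `PRETRAINED_VECTORS` are hypothetical values chosen by hand; in practice they would come from a model such as BERT. Keeping the encoder fixed corresponds to the feature-based variant of transfer learning, whereas BERT-style fine-tuning typically updates all of the model’s weights.

```python
import math

# Hypothetical stand-in for a frozen pretrained encoder: fixed 2-d word vectors.
# Dimension 0 loosely encodes sentiment; dimension 1 loosely encodes topic words.
PRETRAINED_VECTORS = {
    "great": (1.0, 0.2), "love": (0.9, 0.1), "excellent": (1.0, 0.0),
    "awful": (-1.0, 0.1), "hate": (-0.9, 0.2), "terrible": (-1.0, 0.0),
    "movie": (0.0, 1.0), "food": (0.0, 0.9),
}

def encode(text):
    """Average the frozen vectors of known words; these vectors are never updated."""
    vecs = [PRETRAINED_VECTORS[w] for w in text.lower().split() if w in PRETRAINED_VECTORS]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))

def train_head(examples, epochs=200, lr=0.5):
    """Train only a logistic-regression head with stochastic gradient descent."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for text, label in examples:
            x = encode(text)
            p = 1 / (1 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
            grad = p - label  # gradient of the log loss with respect to the logit
            w = [w[0] - lr * grad * x[0], w[1] - lr * grad * x[1]]
            b -= lr * grad
    return w, b

def predict(text, w, b):
    x = encode(text)
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

# Four labeled examples suffice here because the frozen encoder already separates sentiment.
train = [("great movie", 1), ("love the food", 1), ("awful movie", 0), ("hate the food", 0)]
w, b = train_head(train)
print(predict("excellent food", w, b), predict("terrible movie", w, b))
```

The head never sees “excellent” or “terrible” during training; it classifies them correctly only because the (hypothetical) pretrained vectors already place them near the training words with the same sentiment.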

Until recently, the state of the art for NLP language models was recurrent neural network (RNN) models. These are useful for sequence tasks such as abstractive summarization, machine translation and general natural language generation. RNN models process text sequentially, one word at a time, in the order the words appear. As a result, these models are hard to parallelize and poor at retaining contextual relationships across long text inputs. As we’ve discussed in a previous post, in NLP context is key.
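The sequential bottleneck is visible even in a toy recurrent network with a single scalar hidden state. The weights and inputs below are hypothetical; the point is that each step needs the previous step’s output, and with these settings the first token’s contribution shrinks at every step, a simple picture of why distant context is hard to retain.

```python
import math

def rnn_forward(inputs, w_x, w_h, b):
    """Minimal Elman-style RNN with a scalar hidden state (a toy, not a production model).

    Each step needs the previous hidden state, so step t cannot start until
    step t-1 has finished: the loop over time is inherently sequential.
    """
    h = 0.0
    states = []
    for x in inputs:  # one token at a time, in order
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

# Hypothetical scalar "embeddings": only the first token carries any signal.
tokens = [1.0, 0.0, 0.0, 0.0, 0.0]
states = rnn_forward(tokens, w_x=1.0, w_h=0.5, b=0.0)
print([round(s, 4) for s in states])  # the first token's signal fades step by step
```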

The Transformer, a model architecture introduced in 2017, bypasses these issues. Transformer-based models (such as BERT and GPT) use an attention mechanism, which “pays attention” to the words most relevant to interpreting or predicting each word in a sentence. With attention, Transformers process an entire input sequence at once and map relevant dependencies between words regardless of how far apart they appear in the text. As a result, Transformers are highly parallelizable, make it practical to train much larger models faster, and use contextual clues to resolve many of the ambiguities that plague text.
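The core of the attention mechanism is scaled dot-product attention, softmax(QKᵀ/√d)V, as defined in the 2017 Transformer paper. This pure-Python sketch omits the learned projections, multiple heads and batching of a real Transformer and reuses the same toy vectors as queries, keys and values. Each position’s row is computed independently of the others and looks at every key regardless of distance, which is what makes the computation parallelizable.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Every query scores every key at once, so the distance between positions
    does not matter, and each output row can be computed in parallel.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:  # independent per position; a matrix library would batch these
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wj * v[i] for wj, v in zip(w, values)) for i in range(len(values[0]))])
    return outputs, weights

# Toy 2-d vectors for three tokens; a real Transformer derives Q, K, V via learned projections.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = attention(x, x, x)
for row in attn:
    print([round(a, 3) for a in row])  # each row of attention weights sums to 1
```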

Individual Transformer-based models also have their own advantages. Until this year, BERT was the most popular deep learning NLP model, achieving state-of-the-art results across many NLP tasks.

BERT was pretrained on 3.3 billion words (2.5 billion from English Wikipedia and 800 million from BooksCorpus). Its main advantage is bidirectional learning: it draws on the context to the left and to the right of each word simultaneously. This bidirectional training approach is optimized for predicting masked words (Masked LM) and outperforms left-to-right training after a small number of pre-training steps. During training, Next Sentence Prediction (NSP) also teaches the model how sentences relate to each other by having it predict whether sentence B actually follows sentence A or is a randomly chosen sentence. As a result, it’s able to derive more context. For example, it can tell apart the meanings of “bank” in the following sentences: “Raise your oars when you get to the river bank” and “The bank is sending a new debit card.” To do this, it uses the clue “river,” which appears to the left of “bank,” and the clue “debit card,” which appears to the right.
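The two pretraining objectives above can be sketched as data-preparation steps. This follows the recipe reported in the BERT paper (about 15% of tokens selected; of those, 80% replaced by `[MASK]`, 10% by a random token and 10% left unchanged; NSP pairs split 50/50), with simplifications that are assumptions of this sketch: tokens are selected independently, whitespace splitting stands in for WordPiece, and the “random” NSP sentence is drawn from the same list, so it can occasionally be the true next sentence.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, rng, mask_prob=0.15):
    """Masked LM corruption: select ~mask_prob of positions; of those, 80% -> [MASK],
    10% -> a random vocabulary token, 10% -> unchanged. The model must recover the
    original token at every selected position, using context on both sides.
    """
    corrupted, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = tok  # label to predict at position i
            r = rng.random()
            if r < 0.8:
                corrupted[i] = MASK
            elif r < 0.9:
                corrupted[i] = rng.choice(vocab)
            # else: keep the original token in place
    return corrupted, targets

def make_nsp_example(sentences, idx, rng):
    """Next Sentence Prediction pair for sentence idx (idx must not be the last one):
    50% the true next sentence (IsNext), 50% a randomly drawn sentence (NotNext)."""
    a = sentences[idx]
    if rng.random() < 0.5:
        return a, sentences[idx + 1], "IsNext"
    return a, rng.choice(sentences), "NotNext"

rng = random.Random(0)
sentence = "raise your oars when you get to the river bank".split()
corrupted, targets = mask_tokens(sentence, vocab=sentence, rng=rng)
print(corrupted, targets)

corpus = ["The bank is sending a new debit card.", "It should arrive next week.", "Raise your oars."]
print(make_nsp_example(corpus, 0, rng))
```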

Unlike BERT, GPT models are unidirectional: they predict each word using only the words that come before it. The major advantage of GPT models is their sheer scale: GPT-3, the third-generation GPT model, has 175 billion parameters, about 10 times more than any previous non-sparse language model, and was pretrained on a very large text corpus. This truly massive pretrained model means that users can adapt it to novel tasks with very little data, often just a few examples supplied in the prompt rather than a full fine-tuning run. While Transformers in general have reduced the amount of data needed to train models, GPT-3 has a distinct advantage over BERT in that it requires much less task-specific data.
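The difference between bidirectional and unidirectional models comes down to which positions each token may attend to. The sketch below builds both kinds of attention mask for a short example; `attention_mask` is a hypothetical helper written for illustration, not part of any library.

```python
def attention_mask(n, causal):
    """Return an n x n matrix where mask[i][j] is True if position i may attend to j.

    Bidirectional (BERT-style) encoders let every token see every other token;
    unidirectional (GPT-style) models only let token i see positions 0..i.
    """
    return [[(j <= i) if causal else True for j in range(n)] for i in range(n)]

tokens = ["the", "bank", "is", "sending"]
for name, causal in (("bidirectional", False), ("causal", True)):
    mask = attention_mask(len(tokens), causal)
    visible = [t for t, ok in zip(tokens, mask[1]) if ok]
    print(f"{name}: 'bank' can see {visible}")
# bidirectional: 'bank' can see ['the', 'bank', 'is', 'sending']
# causal: 'bank' can see ['the', 'bank']
```

Under the causal mask, “bank” cannot use the later clue “sending,” which is the trade-off a unidirectional model accepts in exchange for being able to generate text one word at a time.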

For instance, with as few as 10 sentences of input, the model has been prompted to write an essay on why humans should not be afraid of AI. (It should be noted, though, that the variable quality of such freeform essays shows the limitations of the technology today.)

Tasks executed with BERT and GPT models:

  • Natural language inference (NLI) is the task of determining whether a statement (the hypothesis) is true, false or undetermined given a premise. For example, if the premise is “tomatoes are sweet” and the statement is “tomatoes are fruit,” the pair might be labeled undetermined, because the premise neither confirms nor contradicts the statement.
  • Question answering enables developers and organizations to build question-answering systems based on neural networks. In extractive question-answering tasks, the model receives a question and a passage of text and returns the answer as a span of that text, marking where each answer begins and ends.
  • Text classification is used for sentiment analysis, spam filtering and news categorization. BERT can be fine-tuned to detect content categories for a wide range of text-classification use cases.

The future of massive deep learning models is quite exciting. Research in this area is advancing by leaps and bounds, and we expect further progress on both the technology and the considerations it raises in the coming months and years. Here at IBM Watson, we will continue to develop, evaluate and incorporate the best of this technology for business use cases. Next in our NLP blog series, we’ll explore several key considerations to investigate before adopting a new model for your business use case.

Learn more about Watson NLP.
