December 7, 2020 By Demi Ajayi 3 min read

Last time on the NLP blog series, we explored how BERT and GPT models change the game for NLP. BERT and GPT models have many exciting potential applications, such as natural language generation (NLG), useful for automating communication, report writing and summarization; conversational assistants; question-and-answer platforms; and query understanding. However, there are several key considerations to investigate before adopting one of these models for your business use case.

Bias: As with all machine learning, it is important to understand any implicit bias in the training data. Applications built on massive language models, such as NLG, are particularly prone to the harmful effects of bias when not properly evaluated. For instance, there have been incidents of applications generating text that is offensive toward, or negatively stereotypes, the subject of the text. Given the especially massive training data required, it’s very important to be cognizant of the potential for bias in these models and to keep a human in the loop when refining them to eliminate bias.

Explainability/Transparency: It is also important to understand the algorithmic workings of any model you use: transparency into how results are derived and clear explanations of individual predictions are critical to ensure you have a model you can trust. Increasingly, AI providers such as IBM are moving toward creating standards of fairness, explainability and transparency in the models they provide.
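For a concrete sense of what such an explanation can look like, here is a minimal sketch using the open-source LIME library on a toy scikit-learn sentiment classifier. The libraries, the tiny training set and the labels are illustrative assumptions, not a prescribed tooling stack; the point is simply to show per-word evidence behind one prediction.

```python
# A minimal sketch (assuming the lime and scikit-learn packages; all texts and
# labels below are invented for illustration) of explaining one prediction.
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = ["great product, works well", "terrible support, very slow",
               "love the new interface", "slow and unreliable"]
train_labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative (illustrative)

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

explainer = LimeTextExplainer(class_names=["negative", "positive"])
explanation = explainer.explain_instance(
    "support was slow but the interface is great",
    model.predict_proba,   # LIME perturbs the text and queries this function
    num_features=5,
)
print(explanation.as_list())  # (word, weight) pairs behind the prediction
```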

Computational costs: As mentioned, GPT-3 has over 100 billion parameters. Building applications with this model is an incredibly computationally intensive task. Other massive deep learning models are less computationally intensive than GPT-3, but still often require significant computing power to provide results quickly in real-life settings. Often GPUs, which are significantly more expensive than conventional CPU processing, are used to accelerate computation in these applications. As businesses consider applications of massive deep language models, they will also have to consider the cost-to-performance trade-off of these models.
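To make that trade-off tangible, a rough timing comparison is often a useful first step. The sketch below assumes the Hugging Face transformers and PyTorch libraries and a default sentiment model; the batch of texts and any timings are illustrative only, not a benchmark.

```python
# A minimal sketch (assuming transformers and torch are installed) of timing the
# same batch of requests on CPU and, if available, on GPU.
import time

import torch
from transformers import pipeline

texts = ["The quarterly report was delivered ahead of schedule."] * 32

def time_batch(device):
    # device=-1 runs on CPU; device=0 runs on the first GPU.
    classifier = pipeline("sentiment-analysis", device=device)
    start = time.perf_counter()
    classifier(texts)
    return time.perf_counter() - start

print(f"CPU batch time: {time_batch(-1):.2f}s")
if torch.cuda.is_available():
    print(f"GPU batch time: {time_batch(0):.2f}s")
```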

Data: Another consideration is training data. Businesses have to consider how much data they have (or can invest in acquiring) to meet the demands of training these models. With these models, requiring less task-specific training data often means that the underlying model is very large (such as GPT-3), which introduces a trade-off between computational cost and data.

Accuracy & Evaluation: For applications with established evaluation metrics (such as question answering, or traditional text analytics tasks like classification and sentiment), it’s important to pick the model that meets your needed level of accuracy while weighing the trade-offs discussed above in data and computational costs. For applications with less established means of evaluating accuracy (such as NLG for summarization and conversation), it’s critical to choose or develop an evaluation scheme suitable to your use case before adopting these models for business use. NLG is often evaluated by human annotators, though there has been incremental progress in developing automated evaluation tools. Here, the scalability and reliability of the evaluation tool are additional considerations. For instance, evaluating whether an AI-generated report is coherent, exhaustive and well-written enough for use in a business setting will require much deeper analysis and higher standards than evaluating its ability to compose Shakespearean sonnets.
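As a small illustration of the two situations, the sketch below computes established classification metrics on one hand and an automated ROUGE proxy for a generated summary on the other. It assumes scikit-learn and the open-source rouge-score package, and the labels and texts are invented; ROUGE is only one possible proxy, not a substitute for the human evaluation described above.

```python
# A minimal sketch (assuming scikit-learn and rouge-score are installed) of
# established metrics for classification vs. a proxy metric for NLG output.
from sklearn.metrics import accuracy_score, f1_score
from rouge_score import rouge_scorer

# Established metrics: classification / sentiment (illustrative labels).
y_true = ["positive", "negative", "positive", "neutral"]
y_pred = ["positive", "negative", "neutral", "neutral"]
print("accuracy:", accuracy_score(y_true, y_pred))
print("macro F1:", f1_score(y_true, y_pred, average="macro"))

# Less established: NLG, here approximated with ROUGE overlap against a
# human-written reference summary (both strings are illustrative).
reference = "Revenue grew 8 percent, driven by cloud services."
generated = "Cloud services drove revenue growth of 8 percent."
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
print(scorer.score(reference, generated))
```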

When kicking off your pilot project, focus on training a model to perform the specific task you are trying to achieve, and validate it with data that reflects that task.
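A minimal sketch of that kind of pilot setup might look like the following. It assumes scikit-learn and a hypothetical handful of labeled support tickets; the task, labels and split are illustrative stand-ins for whatever narrow task and validation data your pilot actually targets.

```python
# A minimal sketch (assuming scikit-learn; the tickets and labels are invented)
# of training a narrow, task-specific baseline with a held-out validation split.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

texts = ["cannot log in to my account", "password reset link not working",
         "locked out after too many attempts", "two-factor code never arrives",
         "billing amount looks wrong", "charged twice this month",
         "please cancel my subscription and refund me",
         "invoice is missing from my account page"]
labels = ["access", "access", "access", "access",
          "billing", "billing", "billing", "billing"]

# Hold out validation data that reflects the specific task being piloted.
X_train, X_val, y_train, y_val = train_test_split(
    texts, labels, test_size=0.25, random_state=0, stratify=labels)

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Baseline performance on the held-out split, to compare against larger models.
print(classification_report(y_val, model.predict(X_val)))
```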

Afterward, using the considerations listed in this blog, you can conduct testing to measure baseline performance. The same data science elements can serve as a checklist for determining which models will best help you launch your pilot, and they will continue to play a critical role in your machine learning pipeline during the pilot and thereafter.

Get started with IBM Watson NLP.
