December 7, 2020 By Demi Ajayi 3 min read

Last time on the NLP blog series, we explored how BERT and GPT models change the game for NLP. BERT and GPT models have many exciting potential applications, such as natural language generation (NLG), useful for automating communication, report writing and summarization; conversational assistants; question-and-answer platforms; and query understanding. However, there are several key considerations to investigate before adopting one of these models for your business use case.

Bias: As with all machine learning, it is important to understand any implicit bias in the training data. Applications built on massive language models, such as NLG, are particularly prone to the harmful effects of bias when not properly evaluated. For instance, there have been incidents of applications generating text that is offensive toward, or negatively stereotypes, the subject of the text. Given the especially massive training data required, it’s very important to be cognizant of the potential for bias in these models and to keep a human in the loop when refining them to eliminate bias.

Explainability/Transparency: It is also important to understand the algorithmic workings of any model you use: transparency into how results are derived and clear explanations of individual predictions are critical to ensure you have a model you can trust. Increasingly, AI providers such as IBM are moving toward creating standards of fairness, explainability and transparency in the models they provide.
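For a concrete sense of what such an explanation can look like, here is a minimal sketch using the open-source LIME library on a toy scikit-learn sentiment classifier. The libraries, the tiny training set and the labels are illustrative assumptions, not a prescribed tooling stack; the point is simply to show per-word evidence behind one prediction.

```python
# A minimal sketch (assuming the lime and scikit-learn packages; all texts and
# labels below are invented for illustration) of explaining one prediction.
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = ["great product, works well", "terrible support, very slow",
               "love the new interface", "slow and unreliable"]
train_labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative (illustrative)

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

explainer = LimeTextExplainer(class_names=["negative", "positive"])
explanation = explainer.explain_instance(
    "support was slow but the interface is great",
    model.predict_proba,   # LIME perturbs the text and queries this function
    num_features=5,
)
print(explanation.as_list())  # (word, weight) pairs behind the prediction
```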

Computational costs: As mentioned, GPT-3 has over 100 billion parameters. Building applications with this model is an incredibly computationally intensive task. Other massive deep learning models are less computationally intensive than GPT-3, but still often require significant computing power to provide results quickly in real-life settings. Often GPUs, which are significantly more expensive than conventional CPU processing, are used to accelerate computation in these applications. As businesses consider applications of massive deep language models, they will also have to consider the cost-to-performance trade-off of these models.
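To make that trade-off tangible, a rough timing comparison is often a useful first step. The sketch below assumes the Hugging Face transformers and PyTorch libraries and a default sentiment model; the batch of texts and any timings are illustrative only, not a benchmark.

```python
# A minimal sketch (assuming transformers and torch are installed) of timing the
# same batch of requests on CPU and, if available, on GPU.
import time

import torch
from transformers import pipeline

texts = ["The quarterly report was delivered ahead of schedule."] * 32

def time_batch(device):
    # device=-1 runs on CPU; device=0 runs on the first GPU.
    classifier = pipeline("sentiment-analysis", device=device)
    start = time.perf_counter()
    classifier(texts)
    return time.perf_counter() - start

print(f"CPU batch time: {time_batch(-1):.2f}s")
if torch.cuda.is_available():
    print(f"GPU batch time: {time_batch(0):.2f}s")
```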

Data: Another consideration is training data. Businesses have to consider how much data they have (or can invest in acquiring) to meet the demands of training these models. With these models, requiring less task-specific training data often means that the underlying model is very large (such as GPT-3), which introduces a trade-off between computational cost and data.

Accuracy & Evaluation: For applications with established evaluation metrics (such as question answering, or traditional text analytics tasks like classification and sentiment), it’s important to pick the model that meets your needed level of accuracy while weighing the trade-offs discussed above in data and computational costs. For applications with less established means of evaluating accuracy (such as NLG for summarization and conversation), it’s critical to choose or develop an evaluation scheme suitable to your use case before adopting these models for business use. NLG is often evaluated by human annotators, though there has been incremental progress in developing automated evaluation tools. Here, the scalability and reliability of the evaluation tool are additional considerations. For instance, evaluating whether an AI-generated report is coherent, exhaustive and well-written enough for use in a business setting will require much deeper analysis and higher standards than evaluating its ability to compose Shakespearean sonnets.
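As a small illustration of the two situations, the sketch below computes established classification metrics on one hand and an automated ROUGE proxy for a generated summary on the other. It assumes scikit-learn and the open-source rouge-score package, and the labels and texts are invented; ROUGE is only one possible proxy, not a substitute for the human evaluation described above.

```python
# A minimal sketch (assuming scikit-learn and rouge-score are installed) of
# established metrics for classification vs. a proxy metric for NLG output.
from sklearn.metrics import accuracy_score, f1_score
from rouge_score import rouge_scorer

# Established metrics: classification / sentiment (illustrative labels).
y_true = ["positive", "negative", "positive", "neutral"]
y_pred = ["positive", "negative", "neutral", "neutral"]
print("accuracy:", accuracy_score(y_true, y_pred))
print("macro F1:", f1_score(y_true, y_pred, average="macro"))

# Less established: NLG, here approximated with ROUGE overlap against a
# human-written reference summary (both strings are illustrative).
reference = "Revenue grew 8 percent, driven by cloud services."
generated = "Cloud services drove revenue growth of 8 percent."
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
print(scorer.score(reference, generated))
```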

When kicking off your pilot project, focus on training a model to perform the specific task you are trying to achieve, and validate it with data that reflects that task.
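A minimal sketch of that kind of pilot setup might look like the following. It assumes scikit-learn and a hypothetical handful of labeled support tickets; the task, labels and split are illustrative stand-ins for whatever narrow task and validation data your pilot actually targets.

```python
# A minimal sketch (assuming scikit-learn; the tickets and labels are invented)
# of training a narrow, task-specific baseline with a held-out validation split.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

texts = ["cannot log in to my account", "password reset link not working",
         "locked out after too many attempts", "two-factor code never arrives",
         "billing amount looks wrong", "charged twice this month",
         "please cancel my subscription and refund me",
         "invoice is missing from my account page"]
labels = ["access", "access", "access", "access",
          "billing", "billing", "billing", "billing"]

# Hold out validation data that reflects the specific task being piloted.
X_train, X_val, y_train, y_val = train_test_split(
    texts, labels, test_size=0.25, random_state=0, stratify=labels)

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Baseline performance on the held-out split, to compare against larger models.
print(classification_report(y_val, model.predict(X_val)))
```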

Afterward, using the considerations listed in this blog, you can conduct testing to measure baseline performance. The same data science elements can serve as a checklist for determining which models will best help you launch your pilot, and they will continue to play a critical role in your machine learning pipeline during the pilot and thereafter.

Get started with IBM Watson NLP.
