December 7, 2020 By Demi Ajayi 3 min read

Last time on the NLP blog series, we explored how BERT and GPT models change the game for NLP. These models have many exciting potential applications, such as natural language generation (NLG) (useful for automating communication, report writing and summarization), conversational assistants, question-answering platforms and query understanding. However, there are several key considerations to investigate before adopting a new model for your business use case.

Bias: As with all machine learning, it is important to understand any implicit bias in the training data. Applications built on massive language models, such as NLG, are particularly prone to harmful bias when not properly evaluated. For instance, there have been incidents of applications generating text that is offensive or that negatively stereotypes its subject. Given the massive amount of training data these models require, it’s very important to be cognizant of the potential for bias and to keep a human in the loop when refining these models to eliminate it.

Explainability/Transparency: It is also important to understand the algorithmic workings of any model you use: transparency on how results are derived and actual explanations are critical to ensure you have a model you can trust. Increasingly, AI providers such as IBM are moving toward creating standards of fairness, explainability and transparency in the models they provide.

Computational costs: As mentioned, GPT-3 has 175 billion parameters. Building applications with this model is incredibly computationally intensive. Other massive deep learning models are less computationally intensive than GPT-3, but often still require significant computing power to return results quickly in real-life settings. GPUs, which are significantly more expensive than conventional CPUs, are often used to speed up computation in these applications. As businesses consider applications of massive deep language models, they will also have to weigh the cost of these models against the performance benefit.
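To make the cost point concrete, here is a rough back-of-the-envelope sketch of how much memory is needed just to hold a model's weights. The helper name `parameter_memory_gb` is made up for this illustration, the parameter counts are the published sizes of GPT-3 and BERT-large, and the estimate ignores activations, optimizer state and serving overhead, so real requirements will be higher.

```python
def parameter_memory_gb(num_parameters: int, bytes_per_parameter: int) -> float:
    """Memory needed to store the weights alone, in gigabytes (10**9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9


# Published model sizes; everything else in this sketch is an assumption.
MODEL_SIZES = {
    "BERT-large": 340_000_000,
    "GPT-3": 175_000_000_000,
}

# Common numeric precisions and their width in bytes.
PRECISIONS = {"fp32": 4, "fp16": 2}

for model_name, num_parameters in MODEL_SIZES.items():
    for precision, width in PRECISIONS.items():
        gigabytes = parameter_memory_gb(num_parameters, width)
        print(f"{model_name} weights at {precision}: {gigabytes:,.2f} GB")
```

Even at half precision, the larger model's weights alone come to hundreds of gigabytes, which is more than a single commodity GPU can hold. That is one reason the cost-to-performance question matters.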

Data: Another consideration is training data. Businesses have to consider how much data they have (or can invest in acquiring) to meet the demands of training these models. Models that need less training data for a new task are often very large (such as GPT-3), which introduces a trade-off between computational costs and data.

Accuracy & Evaluation: For applications with established evaluation metrics (such as question answering, or traditional text analytics tasks like classification and sentiment analysis), it’s important to pick the model that meets your needed level of accuracy, while considering the trade-offs discussed above under data and computational costs. For applications with less established means of evaluating accuracy (such as NLG for summarization and conversation), it’s critical to choose or develop an evaluation scheme suitable to your use case before adopting these models for business use. NLG is often evaluated by human annotators, though there has been incremental progress in developing automated evaluation tools. Here, the scalability and reliability of the evaluation tool are additional considerations. For instance, evaluating whether an AI-generated report is coherent, exhaustive and well-written enough for use in a business setting will require much deeper analysis and higher standards than evaluating a model’s ability to compose Shakespearean sonnets.
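As a minimal sketch of the two situations described above, the snippet below computes plain accuracy for a classification task and a simple unigram-overlap F1 score for generated text. The function names and sample sentences are invented for this illustration. The overlap score is only a crude automated proxy, similar in spirit to word-overlap metrics used for summarization, and it does not replace human annotators.

```python
from collections import Counter


def accuracy(predictions, labels):
    """Fraction of predictions that exactly match the reference labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("at least one example is required")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def unigram_f1(candidate: str, reference: str) -> float:
    """F1 over lowercased, whitespace-split words shared by the two texts."""
    candidate_counts = Counter(candidate.lower().split())
    reference_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((candidate_counts & reference_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(candidate_counts.values())
    recall = overlap / sum(reference_counts.values())
    return 2 * precision * recall / (precision + recall)


# Established metric: sentiment classification with gold labels.
print(accuracy(["pos", "neg", "pos", "neg"], ["pos", "neg", "neg", "neg"]))

# Less established metric: generated summary against a human-written one.
generated = "revenue rose in the third quarter"
reference = "revenue rose sharply in the third quarter"
print(round(unigram_f1(generated, reference), 3))
```

A high overlap score says nothing about coherence or factual correctness, so a score like this is best treated as a cheap first filter ahead of human review.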

When kicking off your pilot project, focus on training a model for the specific task you are trying to achieve, and set aside validation data specific to that task.

Afterward, use the considerations listed in this blog to test the model and measure its baseline performance. You can also assemble these considerations into a checklist to determine which models will best help you launch your pilot. Each of them will play a critical role in the machine learning pipeline during your pilot and thereafter.
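One possible way to turn the five considerations into a checklist is a simple weighted scorecard, sketched below. Everything here is hypothetical: the candidate names, the 0 to 5 scores and the weights are placeholders to be replaced with your own assessments, and a weighted average is only one of many reasonable ways to compare options.

```python
from dataclasses import dataclass

# The five considerations discussed in this post.
CRITERIA = ("bias", "explainability", "computational_cost", "data", "accuracy")


@dataclass
class Candidate:
    name: str
    scores: dict  # criterion -> score from 0 (poor) to 5 (strong)


def weighted_score(candidate: Candidate, weights: dict) -> float:
    """Weighted average of a candidate's scores across all criteria."""
    missing = [c for c in CRITERIA if c not in candidate.scores]
    if missing:
        raise ValueError(f"{candidate.name} is missing scores for: {missing}")
    total_weight = sum(weights[c] for c in CRITERIA)
    weighted_sum = sum(weights[c] * candidate.scores[c] for c in CRITERIA)
    return weighted_sum / total_weight


def rank(candidates, weights):
    """Candidates sorted from highest to lowest weighted score."""
    return sorted(candidates, key=lambda c: weighted_score(c, weights), reverse=True)


# Placeholder weights: equal importance for every criterion.
weights = {criterion: 1.0 for criterion in CRITERIA}

# Placeholder candidates and scores, for illustration only.
candidates = [
    Candidate("smaller_finetuned_model", {
        "bias": 4, "explainability": 3, "computational_cost": 5,
        "data": 3, "accuracy": 3,
    }),
    Candidate("larger_pretrained_model", {
        "bias": 3, "explainability": 2, "computational_cost": 1,
        "data": 5, "accuracy": 5,
    }),
]

for candidate in rank(candidates, weights):
    print(f"{candidate.name}: {weighted_score(candidate, weights):.2f}")
```

Raising the weight on a criterion that matters most for your use case (for example, accuracy for a customer-facing application) can change the ranking, which is the point of writing the trade-offs down explicitly.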

Get started with IBM Watson NLP.
