2020 has been a great year for the advancement of Natural Language Processing (NLP). We have seen a large number of organizations, from small ventures to massive enterprises, adopt NLP into their business solutions and data pipelines. With this high rate of adoption comes the next set of innovations and challenges.
As we close out 2020, there is much to look forward to in the upcoming year within the NLP space. Two key areas will stand out in terms of value, usage, and ROI: natural language generation (NLG) and an improved customization experience.
Natural Language Generation:
NLG (which we explored earlier) will play an increasing role in uncovering new insights. We are seeing clients’ existing data pipelines start to integrate summarization. Extractive summarization allows businesses to pull the most important sentences from a document and construct a summary that lets the reader grasp the general idea quickly. This could be applied to content such as customer reviews, press releases, or news articles. Watson Natural Language Understanding recently released an experimental extractive summarization feature, documented here. Our new extractive summarization technology incorporates capabilities from IBM Research’s Project Debater.
While there are clear benefits to using extractive summarization in the immediate future, we are still some time away from building and implementing abstractive summarization. Rather than lifting sentences directly from a document, abstractive summarization generates new sentences that capture its meaning. For example, if we had the following sentences:
My mother went to the grocery store. She bought an apple from the store. The apple was ripe.
The abstractive summary would be:
My mother bought a ripe apple from the store.
We will certainly see more usage of extractive summarization initially, but customers will ultimately lean towards abstractive summarization as it becomes available, depending on the use case.
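To make the distinction concrete, here is a minimal sketch of the extractive approach: score each sentence by how frequent its words are across the document, then keep the top-scoring sentences in their original order. This is a toy illustration only, not how Watson Natural Language Understanding implements its summarization feature.

```python
import re
from collections import Counter

def extractive_summary(text, num_sentences=2):
    """Pick the top-scoring sentences from a document, preserving order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Build word frequencies over the whole document.
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        # Average document-wide frequency of the sentence's words.
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])  # restore original order
    return " ".join(sentences[i] for i in keep)

doc = ("My mother went to the grocery store. "
       "She bought an apple from the store. "
       "The apple was ripe.")
print(extractive_summary(doc, num_sentences=2))
```

Note that every sentence in the output appears verbatim in the input; an abstractive system, by contrast, would be free to produce the single new sentence "My mother bought a ripe apple from the store."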
An Improved Customization Experience:
Over the next year, there will be an increasing focus on the customization experience within the NLP space. Businesses globally continue to integrate NLP into their pipelines. However, each business operates in its own corner of the NLP landscape, where vernacular and context that mean one thing in one industry can mean something completely different in another.
To achieve the highest accuracy within their domain, businesses will have to embed a layer of customization for sentiment analysis and text classification, which means training a machine learning model on their own labeled data.
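As a rough sketch of what that customization step looks like, the snippet below trains a small text classifier on a handful of hypothetical, domain-specific labeled examples using scikit-learn. The insurance-flavored training data and labels are invented for illustration; this is not Watson's training workflow.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled examples from an insurance domain, where a word
# like "claim" carries a meaning a general-purpose model might misread.
train_texts = [
    "The claim was settled quickly and fairly",
    "My adjuster was responsive and helpful",
    "They denied my claim without explanation",
    "Still waiting on my claim after three months",
]
train_labels = ["positive", "positive", "negative", "negative"]

# A pipeline: TF-IDF features feeding a logistic regression classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression())
model.fit(train_texts, train_labels)

print(model.predict(["My claim was handled quickly"]))
```

In practice, the hard part is not this code but the paragraphs that follow: collecting enough correctly labeled examples per class for the model to generalize within the domain.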
But both of these features require work on the customer's end to collect, label, and upload training data. Many NLP solutions today require a large number of examples per label to train the machine learning model. In addition, the value of the custom model may not be fully realized even after collecting and building the training file. And if the training file was not built correctly, or the data was mislabeled, the results will be less accurate.
Therefore, it is extremely important to provide a customization experience for customers that allows them to focus more on improving the machine learning model rather than spending countless hours collecting and labeling data. To help you prioritize your time, Watson NLP learns more from less data. In 2020, IBM released a new, more accurate natural language understanding (NLU) model in IBM Watson Assistant for intent classification, as well as new NLP advancements within IBM Watson Assistant and Watson Discovery.
While the COVID-19 pandemic set many businesses back, the NLP space remains a strong backbone of AI. 2020 has brought about creative and innovative use cases for NLP, including our partnerships with the Weather Channel, That’s Debatable, the US Open and ESPN Fantasy Football. We expect to see similar, if not higher, growth in the coming year.