Natural language processing (NLP) refers to the branch of computer science—and more specifically, the branch of artificial intelligence or AI—concerned with giving computers the ability to understand text and spoken words in much the same way human beings can.
NLP combines computational linguistics—rule-based modeling of human language—with statistical, machine learning, and deep learning models. Together, these technologies enable computers to process human language in the form of text or voice data and to ‘understand’ its full meaning, complete with the speaker or writer’s intent and sentiment.
NLP drives computer programs that translate text from one language to another, respond to spoken commands, and summarize large volumes of text rapidly—even in real time. There’s a good chance you’ve interacted with NLP in the form of voice-operated GPS systems, digital assistants, speech-to-text dictation software, customer service chatbots, and other consumer conveniences. But NLP also plays a growing role in enterprise solutions that help streamline business operations, increase employee productivity, and simplify mission-critical business processes.
Human language is filled with ambiguities that make it incredibly difficult to write software that accurately determines the intended meaning of text or voice data. Homonyms, homophones, sarcasm, idioms, metaphors, grammar and usage exceptions, and variations in sentence structure are just a few of the irregularities of human language that take humans years to learn, but that programmers must teach natural language-driven applications to recognize and understand accurately from the start if those applications are going to be useful.
Several NLP tasks break down human text and voice data in ways that help the computer make sense of what it's ingesting. Some of these tasks include the following:
See the blog post “NLP vs. NLU vs. NLG: the differences between three natural language processing concepts” for a deeper look into how these concepts relate.
The Python programming language provides a wide range of tools and libraries for tackling specific NLP tasks. Many of these are found in the Natural Language Toolkit, or NLTK, an open source collection of libraries, programs, and educational resources for building NLP programs.
The NLTK includes libraries for many of the NLP tasks listed above, plus libraries for subtasks, such as sentence parsing, word segmentation, stemming and lemmatization (methods of trimming words down to their roots), and tokenization (for breaking phrases, sentences, paragraphs and passages into tokens that help the computer better understand the text). It also includes libraries for implementing capabilities such as semantic reasoning, the ability to reach logical conclusions based on facts extracted from text.
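As a rough illustration of two of the subtasks mentioned above, the sketch below tokenizes a sentence and stems each token with NLTK. It assumes the `nltk` package is installed and deliberately uses only components that need no extra corpus downloads (`wordpunct_tokenize` and `PorterStemmer`); lemmatization is left out because NLTK's WordNet lemmatizer requires downloading the WordNet data first. The sample sentence and the `tokenize_and_stem` helper are made up for this example.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize


def tokenize_and_stem(text):
    """Split text into tokens, then reduce each token to its Porter stem."""
    # wordpunct_tokenize splits on runs of word characters and punctuation.
    tokens = wordpunct_tokenize(text)
    stemmer = PorterStemmer()
    # stem() lowercases by default and strips common English suffixes.
    stems = [stemmer.stem(token) for token in tokens]
    return tokens, stems


tokens, stems = tokenize_and_stem("The cats jumped.")
print(tokens)  # the raw tokens, punctuation included
print(stems)   # one stem per token
```

Stemming is a heuristic: it trims suffixes without consulting a dictionary, so a stem is not always a real word. A lemmatizer trades that speed for dictionary-backed results.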
The earliest NLP applications were hand-coded, rules-based systems that could perform certain NLP tasks, but couldn't easily scale to accommodate a seemingly endless stream of exceptions or the increasing volumes of text and voice data.
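To make the scaling problem concrete, here is a minimal sketch of a hand-coded, rules-based component: a tiny English singularizer. The rules, the exception table, and the `singularize` function are all hypothetical and written only for this illustration; they are not taken from any historical system.

```python
import re

# Hand-written suffix rules, tried in order: (pattern, replacement).
SINGULAR_RULES = [
    (re.compile(r"ies$"), "y"),              # "cities" -> "city"
    (re.compile(r"(s|x|ch|sh)es$"), r"\1"),  # "boxes"  -> "box"
    (re.compile(r"s$"), ""),                 # "cats"   -> "cat"
]

# Irregular forms have to be added one by one as they are discovered.
EXCEPTIONS = {"mice": "mouse", "children": "child"}


def singularize(word):
    """Return a singular form using the exception table, then the rules."""
    lower = word.lower()
    if lower in EXCEPTIONS:
        return EXCEPTIONS[lower]
    for pattern, replacement in SINGULAR_RULES:
        if pattern.search(lower):
            return pattern.sub(replacement, lower)
    return lower


for word in ["cities", "boxes", "cats", "mice", "bus"]:
    print(word, "->", singularize(word))
```

The last word shows the weakness: "bus" is already singular, but the final rule strips its "s" anyway. Fixing it means adding yet another exception, and the same is true for every irregular word the rules have not seen.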
Enter statistical NLP, which combines computer algorithms with machine learning and deep learning models to automatically extract, classify, and label elements of text and voice data and then assign a statistical likelihood to each possible meaning of those elements. Today, deep learning models and learning techniques based on convolutional neural networks (CNNs) and recurrent neural networks (RNNs) enable NLP systems that 'learn' as they work and extract ever more accurate meaning from huge volumes of raw, unstructured, and unlabeled text and voice data sets.
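The idea of assigning a statistical likelihood to each possible meaning can be sketched with a very small naive Bayes classifier that guesses which sense of the word "bank" a context refers to. This is a simplified stand-in chosen for brevity, not one of the deep learning models described above, and the training examples, sense labels, and function names are invented for the illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training data: (context words, sense of "bank").
TRAINING = [
    ("deposit money account loan", "finance"),
    ("loan interest account teller", "finance"),
    ("river water fishing shore", "river"),
    ("muddy river shore grass", "river"),
]


def train(examples):
    """Count how often each sense and each (sense, word) pair occurs."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, sense in examples:
        sense_counts[sense] += 1
        for word in text.split():
            word_counts[sense][word] += 1
            vocab.add(word)
    return sense_counts, word_counts, vocab


def sense_probabilities(context, model):
    """Return a probability for each sense given the context words."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    log_scores = {}
    for sense, count in sense_counts.items():
        score = math.log(count / total)  # prior probability of the sense
        denom = sum(word_counts[sense].values()) + len(vocab)
        for word in context.split():
            if word in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log((word_counts[sense][word] + 1) / denom)
        log_scores[sense] = score
    # Convert log scores into probabilities that sum to 1.
    top = max(log_scores.values())
    weights = {s: math.exp(v - top) for s, v in log_scores.items()}
    norm = sum(weights.values())
    return {s: w / norm for s, w in weights.items()}


model = train(TRAINING)
probs = sense_probabilities("fishing by the river", model)
for sense, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{sense}: {p:.3f}")
```

With this toy data the "river" sense receives the higher probability, because "fishing" and "river" appear only in the river examples. Real systems learn from far larger corpora, but the output has the same shape: a likelihood for every candidate meaning rather than a single hard-coded answer.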
For a deeper dive into the nuances between these technologies and their learning approaches, see “AI vs. Machine Learning vs. Deep Learning vs. Neural Networks: What’s the Difference?”
Natural language processing is the driving force behind machine intelligence in many modern real-world applications. Here are a few examples: