Text summarization condenses one or more texts into shorter summaries for enhanced information extraction.
Automatic text summarization (or document summarization) is a natural language processing (NLP) method that condenses information from one or more input text documents into an original output text. How much of the input text appears in the output is debated—some definitions state only 10%, others 50%.1 Text summarization algorithms often use deep learning architectures—specifically, transformers—to parse documents and generate text summaries.
There are two principal types of summarization: extractive and abstractive.
Extractive summarization extracts unmodified sentences from the original text documents. A key difference between extractive algorithms is how they score sentence importance while reducing topical redundancy. Differences in sentence scoring determine which sentences to extract and which to discard.
Abstractive summarization generates original summaries using sentences not found in the original text documents. Such generation requires neural networks and large language models (LLMs) to produce semantically meaningful text sequences.
As one may guess, abstractive text summarization is more computationally expensive than extractive summarization and requires a more specialized understanding of artificial intelligence and generative systems. Of course, extractive text summarization may also use neural network transformers (such as GPT, BERT, and BART) to create summaries. Nevertheless, extractive approaches do not require neural networks.2
Comparative evaluations of extractive and abstractive techniques show mixed results. For instance, some research suggests that abstractive summarization is more prone to hallucinations, that is, misleading or factually false information.3 Additional research, however, suggests that many abstractive hallucinations are in fact factual, aligning with world knowledge even when they cannot be derived from the source material.4 Other comparisons of extractive and abstractive techniques show that each has its comparative benefits. While human users view abstractive summaries as more coherent, they also consider extractive summaries more informative and relevant.5 Research also suggests that the controversiality of text subject matter affects how users view respective summary types.6 Thus, there may not be a direct one-to-one evaluative comparison between these summarization types.
As with other NLP tasks, text summarization requires text data first undergo preprocessing. This includes tokenization, stopword removal, and stemming or lemmatization in order to make the dataset readable by a machine learning model. After preprocessing, all extractive text summarization methods follow three general, independent steps: representation, sentence scoring, and sentence selection.
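The preprocessing steps named above can be sketched in a few lines of standard-library Python. This is only a minimal illustration: the stopword list, the suffix-stripping `crude_stem` function, and the sentence-splitting heuristic are all simplifying assumptions standing in for the fuller resources a production NLP library would provide.

```python
import re

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "the", "is", "are", "of", "in", "to", "and", "it", "on"}


def split_sentences(text):
    """Split text after '.', '!' or '?' followed by whitespace (a crude heuristic)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def crude_stem(token):
    """Strip a few common English suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(sentence):
    """Lowercase, tokenize, drop stopwords, and stem a single sentence."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]
```

Calling `preprocess` on each item returned by `split_sentences` yields one token list per sentence, which is the input shape the later extractive steps typically expect.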
In the representation stage, an algorithm segments and represents preprocessed text data for comparison. Many of these representations build from bag of words models, which represent text segments (such as words or sentences) as data points in a vector space. Large, multi-document datasets may use term frequency-inverse document frequency (TF-IDF), a variant of bag of words that weights each term to reflect its importance within a text set. Topic modeling tools such as latent semantic analysis (LSA) are another representation method that produces groups of summary keywords weighted across documents. Other algorithms, such as LexRank and TextRank, use graphs. These graph-based approaches represent sentences as nodes (or vertices) that are connected by edges weighted according to semantic similarity scores.7
Sentence scoring, per its name, scores each sentence in a text according to its importance to that text. Different representations implement different scoring methods. For example, topic representation approaches score each sentence according to the degree to which it expresses or combines key topics. More specifically, this may involve weighting sentences according to the co-frequency of topic keywords. Graph-based approaches compute sentence centrality. These algorithms determine centrality using TF-IDF to calculate how far a given sentence node may be from a document’s centroid in vector space.8
The final general step in extractive algorithms is sentence selection. Having weighted sentences by importance, algorithms select the n most important sentences for a document or collection thereof. These sentences comprise the generated summary. But what if there is semantic and thematic overlap in these sentences? The sentence selection step aims to reduce redundancy in the final summaries. Maximal marginal relevance methods employ an iterative approach. Specifically, they recompute each remaining sentence’s importance score according to its similarity to already selected sentences. Global selection methods select a subset of the most important sentences to maximize overall importance and reduce redundancy.9
As this overview illustrates, extractive text summarization is ultimately a text (and most often, sentence) ranking problem. Extractive text summarization techniques rank documents and their text strings (for example, sentences) in order to produce a summary that best matches the central topics identified in the given texts. In this way, extractive summarization may be understood as a form of information retrieval.10
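To make the TF-IDF representation concrete, the sketch below treats each tokenized sentence as a small "document" and builds a sparse vector of term weights for it. The smoothed IDF formula used here is one common variant chosen for illustration, and the function names `tfidf_vectors` and `cosine` are hypothetical, not taken from any particular library.

```python
import math
from collections import Counter


def tfidf_vectors(tokenized_sentences):
    """Return one sparse {term: weight} vector per tokenized sentence."""
    n = len(tokenized_sentences)
    # Document frequency: in how many sentences does each term appear?
    df = Counter(term for tokens in tokenized_sentences for term in set(tokens))
    vectors = []
    for tokens in tokenized_sentences:
        tf = Counter(tokens)
        vectors.append({
            # Relative term frequency times a smoothed inverse document frequency
            # (assumed variant: ln((1 + n) / (1 + df)) + 1).
            term: (count / len(tokens)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Terms that appear in fewer sentences receive larger weights, and the cosine of two such vectors gives one simple similarity score that a graph-based method could attach to the connection between two sentence nodes.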
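One way to picture graph-based scoring is a PageRank-style iteration over a sentence similarity graph, in the spirit of TextRank. The sketch below is an illustration under stated assumptions rather than a reference implementation: Jaccard word overlap stands in for the similarity measure, and the `damping` and `iterations` defaults are conventional values picked for the example.

```python
def jaccard(a, b):
    """Word-overlap similarity between two token lists (an assumed stand-in)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def centrality_scores(tokenized_sentences, damping=0.85, iterations=50):
    """Score each sentence by its centrality in a weighted similarity graph."""
    n = len(tokenized_sentences)
    if n == 0:
        return []
    # Edge weights: similarity between every pair of distinct sentences.
    sim = [
        [jaccard(tokenized_sentences[i], tokenized_sentences[j]) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbor passes on its score in proportion to the edge weight.
            incoming = sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores
```

Sentences that share vocabulary with many other sentences accumulate higher scores, while a sentence with no overlap keeps only the small baseline value.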
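The iterative idea behind maximal marginal relevance can be sketched as a greedy loop. The function below assumes that importance scores and a pairwise similarity matrix have already been computed by earlier steps. The name `mmr_select` and the trade-off parameter `lam` are hypothetical choices for this illustration.

```python
def mmr_select(scores, similarity, k, lam=0.7):
    """Greedily pick k sentence indices, trading importance against redundancy.

    scores:     importance score per sentence
    similarity: n x n matrix of pairwise sentence similarities
    lam:        weight on importance; (1 - lam) is the weight on redundancy
    """
    selected = []
    candidates = list(range(len(scores)))
    while candidates and len(selected) < k:
        def mmr(i):
            # Redundancy is the highest similarity to any sentence already chosen.
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return lam * scores[i] - (1 - lam) * redundancy

        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return sorted(selected)  # restore original document order
```

With `lam=1.0` the loop reduces to picking the top-scoring sentences. Lower values increasingly penalize candidates that resemble sentences already in the summary.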
As covered, abstractive text summarization techniques employ neural networks to generate original text that summarizes one or more documents. While there are numerous types of abstractive text summarization methods, literature does not use any one overarching classification system for describing these methods.11 Nevertheless, it is possible to overview the general aims of these various methods.
As with many artificial intelligence applications, abstractive text summarization ultimately aims to mimic human-generated summaries. One key feature of the latter is sentence compression—humans summarize longer texts and sentences by shortening them. There are two general approaches to sentence compression: rule-based and statistical methods.
Rule-based methods leverage syntactic knowledge to parse grammatical segments. These use keywords, syntactic clues, or even part-of-speech labels to extract text snippets that are then merged, often according to a pre-defined template. This template can be derived from additional automated text analysis or user-defined rules.2
In statistical approaches, a model (whether obtained from pretraining or finetuning) learns which sentence segments to remove. For example, a tree parser may identify similar sentences from an input text and populate comparable sentences across a tree structure. A dependency tree is one such structure that models sentences according to the perceived relation between words, aligning with subject-predicate arrangements. A sentence in this structure may have the verb as its central node, with subjects and objects (that is, nouns) and conjunctions branching off. Additional verbs will then branch off from the nouns to which they are attached. Once text is represented in a tree structure, the algorithm then selects common words or phrases for use by a generative network in creating a new summary.12
As this brief overview of sentence compression hints, information fusion is another key aspect of abstractive summarization. People summarize documents by concatenating information from multiple passages into a single sentence or phrase.2 One proposed approach to mimic this is sentence fusion across a multi-document set. This approach identifies commonly occurring phrases across a set of documents and fuses them through a technique called lattice computation to produce a grammatically coherent English summary.13 Another proposed method uses neural topic models to generate key terms that in turn guide summary generation. In this approach, commonly occurring keywords covering main points across multiple documents are combined into a single sentence or group thereof.14
A final concern in abstractive text summarization is the order of information. Summarized information does not necessarily follow the same order as that of the initial source document. When people write summaries, for instance, they may often organize information thematically. One method used for thematic organization is clustering. Specifically, extracted sentences are organized in clusters according to topical content (as determined by co-occurring keywords). Along these lines, neural topic models are another potential approach to ordering information topically.2
Developers use a number of evaluation metrics for text summarization. Differences in metrics generally depend on the type of summary as well as which feature of the summary one wants to measure.
BLEU (bilingual evaluation understudy) is an evaluation metric commonly used in machine translation. It measures similarity between ground truth and model output for a sequence of n words, known as n-grams. In text summarization, BLEU measures how often, and to what extent, n-grams in an automatic summary overlap with those in a human-generated summary, accounting for erroneous word repetitions in the former. It then uses these precision scores for individual n-grams to calculate an overall text precision, known as the geometric mean precision. This final value is between 0 and 1, the latter indicating perfect alignment between the machine and human generated text summaries.15
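The calculation described above can be sketched for a single candidate and a single reference summary. This is a simplified illustration, not a replacement for an established implementation: it assumes pre-tokenized input, applies no smoothing, and includes the brevity penalty from the original BLEU formulation, which penalizes candidates shorter than the reference.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: repeated n-grams cannot be over-credited."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def bleu(candidate, reference, max_n=4):
    """Geometric mean of 1..max_n clipped precisions times a brevity penalty."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # no smoothing in this sketch
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * geo_mean
```

The clipping step is what accounts for erroneous repetition: a candidate that repeats "the" seven times against a reference containing "the" twice is credited for only two matches.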
ROUGE (recall-oriented understudy for gisting evaluation) is derived from BLEU specifically for evaluating summarization tasks. Like BLEU, it compares machine summaries to human-generated summaries using n-grams. But while BLEU measures machine precision, ROUGE measures machine recall. In other words, ROUGE computes the accuracy of an automatic summary according to the number of n-grams from the human-generated summarization found in the automatic summary. The ROUGE score, like BLEU, is any value between 0 and 1, the latter indicating perfect alignment between the machine and human generated text summaries.16
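The recall computation can be sketched in the same style. The function below covers only the n-gram recall variant described here, for one candidate and one reference, and assumes pre-tokenized input. The name `rouge_n_recall` is a hypothetical choice for this example.

```python
from collections import Counter


def rouge_n_recall(candidate, reference, n=1):
    """Fraction of the reference's n-grams that also appear in the candidate."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram can be matched at most as often as it occurs
    # in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total
```

Note that the denominator is the number of n-grams in the human-written reference, whereas clipped precision in BLEU divides by the number of n-grams in the machine output.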
Note that these metrics evaluate the final summarized text output. They are distinct from the myriad sentence scoring methods used within text summarization algorithms that select suitable sentences and keywords from which to produce the final summarized output.
A number of libraries allow users to readily implement text summarization tools in Python. For instance, the HuggingFace Transformers Library comes loaded with BART, an encoder-decoder transformer architecture, for generating text summaries. OneAI’s Language Skills API also provides tools for readily generating text summaries.
Text summarization’s most obvious application is expedited research. This has potential uses in a variety of fields, such as law, academia, and marketing. Researchers also show, however, how text summarization transformers can advance additional tasks.
News: News articles are a common dataset for testing and comparing text summarization techniques. Summarization is not always the end goal, however. A handful of studies investigate the role of transformer-derived text summaries as a mode of feature extraction for powering fake news detection models.17 This research shows promising potential and illustrates how text summaries can be adopted for more wide-reaching uses than merely saving time in reading multiple texts.
Translation: Cross-lingual summarization is a branch of text summarization that overlaps with machine translation. Admittedly, this is not as large a research field as summarization or translation themselves. Nevertheless, the aim of summarizing a source language text or text collection in a different target language poses an array of new challenges.18 One published study explores cross-lingual summarization with historical texts. In this task, historical language variants (for example, ancient Chinese versus modern Chinese, or Attic Greek versus modern Greek) are treated as distinct languages. The specific experiment uses word embeddings alongside extractive and abstractive summarization and transfer learning methods to produce modern summarizations of ancient-language documents.19
1 Juan-Manuel Torres-Moreno, Automatic Text Summarization, Wiley, 2014.
2 Aggarwal, Machine Learning for Text, Springer. Bettina Berendt, “Text Mining for News and Blogs Analysis,” Encyclopedia of Machine Learning and Data Science, Springer, 2020.
3 Haopeng Zhang, Xiao Liu, and Jiawei Zhang, “Extractive Summarization via ChatGPT for Faithful Summary Generation,” Findings of the Association for Computational Linguistics: EMNLP 2023, https://aclanthology.org/2023.findings-emnlp.214/.
4 Meng Cao, Yue Dong, and Jackie Cheung, “Hallucinated but Factual! Inspecting the Factuality of Hallucinations in Abstractive Summarization,” Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, 2022, https://aclanthology.org/2022.acl-long.236/.
5 Jonathan Pilault, Raymond Li, Sandeep Subramanian, and Chris Pal, “On Extractive and Abstractive Neural Document Summarization with Transformer Language Models,” Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020, https://aclanthology.org/2020.emnlp-main.748/.
6 Giuseppe Carenini and Jackie C. K. Cheung, “Extractive vs. NLG-based Abstractive Summarization of Evaluative Text: The Effect of Corpus Controversiality,” Proceedings of the Fifth International Natural Language Generation Conference, 2008, https://aclanthology.org/W08-1106/.
7 Ani Nenkova and Kathleen McKeown, “A Survey of Text Summarization Techniques,” Mining Text Data, Springer, 2012. Wafaa S. El-Kassas, Cherif R. Salama, Ahmed A. Rafea, and Hoda K. Mohamed, “Automatic text summarization: A comprehensive survey,” Expert Systems with Applications, 165, 2021, https://www.sciencedirect.com/science/article/abs/pii/S0957417420305030.
8 Ani Nenkova and Kathleen McKeown, “A Survey of Text Summarization Techniques,” Mining Text Data, Springer, 2012. Steven Shearing, Abigail Gertner, Benjamin Wellner, and Liz Merkhofer, “Automated Text Summarization: A Review and Recommendations,” Technical Report, MITRE Corporation, 2020.
9 Ani Nenkova and Kathleen McKeown, “A Survey of Text Summarization Techniques,” Mining Text Data, Springer, 2012.
10 Jade Goldstein, Mark Kantrowitz, Vibhu Mittal, and Jaime Carbonell, “Summarizing Text Documents: Sentence Selection and Evaluation Metrics,” Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, 1999, pp. 121-128, https://www.cs.cmu.edu/~jgc/publication/Summarizing_Text_Documents_Sentence_SIGIR_1999.pdf.
11 Som Gupta and S.K. Gupta, “Abstractive summarization: An overview of the state of the art,” Expert Systems With Applications, 2019, https://www.sciencedirect.com/science/article/abs/pii/S0957417418307735. Wafaa S. El-Kassas, Cherif R. Salama, Ahmed A. Rafea, and Hoda K. Mohamed, “Automatic text summarization: A comprehensive survey,” Expert Systems With Applications, 2021, https://www.sciencedirect.com/science/article/abs/pii/S0957417420305030. Hui Lin and Vincent Ng, “Abstractive Summarization: A Survey of the State of the Art,” Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, No. 1, 2019, pp. 9815-9822, https://ojs.aaai.org/index.php/AAAI/article/view/5056.
12 Som Gupta and S.K. Gupta, “Abstractive summarization: An overview of the state of the art,” Expert Systems With Applications, 2019, https://www.sciencedirect.com/science/article/abs/pii/S0957417418307735. Regina Barzilay and Kathleen R. McKeown, “Sentence Fusion for Multidocument News Summarization,” Computational Linguistics, Vol. 31, No. 3, 2005, pp. 297-328, https://aclanthology.org/J05-3002/.
13 Regina Barzilay and Kathleen R. McKeown, “Sentence Fusion for Multidocument News Summarization,” Computational Linguistics, Vol. 31, No. 3, 2005, pp. 297-328, https://aclanthology.org/J05-3002/.
14 Peng Cui and Le Hu, “Topic-Guided Abstractive Multi-Document Summarization,” Findings of the Association for Computational Linguistics: EMNLP 2021, https://aclanthology.org/2021.findings-emnlp.126/.
15 Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu, “Bleu: a Method for Automatic Evaluation of Machine Translation,” Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 2002, https://aclanthology.org/P02-1040/.
16 Chin-Yew Lin, “ROUGE: A Package for Automatic Evaluation of Summaries,” Text Summarization Branches Out, https://aclanthology.org/W04-1013/.
17 Soheil Esmaeilzadeh, Gao Xian Peh, and Angela Xu, “Neural Abstractive Text Summarization and Fake News Detection,” 2019, https://arxiv.org/abs/1904.00788. Philipp Hartl and Udo Kruschwitz, “Applying Automatic Text Summarization for Fake News Detection,” Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022, https://aclanthology.org/2022.lrec-1.289/.
18 Jiaan Wang, Fandong Meng, Duo Zheng, Yunlong Liang, Zhixu Li, Jianfeng Qu, and Jie Zhou, “A Survey on Cross-Lingual Summarization,” Transactions of the Association for Computational Linguistics, Vol. 10, 2022, https://aclanthology.org/2022.tacl-1.75/.
19 Xutan Peng, Yi Zheng, Chenghua Lin, and Advaith Siddharthan, “Summarising Historical Text in Modern Languages,” Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics, 2021, https://aclanthology.org/2021.eacl-main.273/.