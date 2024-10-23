Code summarization is the process of generating a natural language description of a snippet of code. Common uses include exploring a new code base, learning a new programming language, and generating code comments or explanations of functions. Summarizing a snippet of code is similar to summarizing a natural language document. The difference is that the large language model (LLM) generating the summary must understand the programming language it is reading and also identify the underlying logic of what the code is trying to accomplish.

Code summaries are a valuable part of software development. They help with software maintenance through automatic source code summarization, provide natural language summaries for documentation, and help developers parse large-scale code bases. Newer transformer-based LLMs can both summarize code and generate it. These functions are possible because the models have been trained on large datasets built from sources such as GitHub repositories that include code and comments along with documentation for that code.

Before LLMs were popularized, approaches to code summarization required parsing code semantics and generating an abstract syntax tree (AST) of the code, whose nodes and identifiers could then be used to generate documentation.1,2 With the advent of deep learning and neural networks, these approaches rooted in traditional computer science were largely set aside in favor of approaches that borrowed more from the methodology of neural machine translation.3,4
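To make the AST-based idea concrete, here is a minimal sketch in Python using the standard library `ast` module. It is not the method of any of the cited papers: the `describe_functions` helper, the sample source and the sentence template are all hypothetical, chosen only to show how identifiers pulled from a syntax tree can be turned into simple documentation.

```python
import ast

# Hypothetical sample input; any valid Python source string would work.
SOURCE = '''
def add(a, b):
    """Return the sum of a and b."""
    return a + b

def scale(values, factor):
    return [v * factor for v in values]
'''


def describe_functions(source: str) -> list[str]:
    """Build one template-based description per function definition."""
    tree = ast.parse(source)  # parse the source text into an AST
    descriptions = []
    for node in ast.walk(tree):  # visit every node in the tree
        if isinstance(node, ast.FunctionDef):
            # Collect the positional parameter names from the signature.
            params = [arg.arg for arg in node.args.args]
            doc = ast.get_docstring(node)
            line = f"Function '{node.name}' takes parameters ({', '.join(params)})"
            # Reuse the docstring if one exists; otherwise say so.
            line += f": {doc}" if doc else "; no docstring found."
            descriptions.append(line)
    return descriptions


for line in describe_functions(SOURCE):
    print(line)
```

A template like this can only restate what is already written in the signature and docstring. It cannot explain what a function is for, which is the gap that learned models aim to fill.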

For transformer models, larger context windows generally lead to better results. Many of the newest state-of-the-art Granite™ Code models, such as Granite-8B-Code-Instruct-128K, have a 128K-token context window. A larger context window allows the model to hold more text in working memory, which helps it keep track of key moments and details in a drawn-out chat or a lengthy document or codebase. This working memory enables an LLM-based chatbot to generate responses that make sense both in the immediate moment and over a longer context, helping it outperform models with smaller context windows in both human evaluation and automated evaluation metrics.5
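As a rough illustration of why window size matters for code, the sketch below estimates whether a piece of source text would fit in a given context window. The figure of four characters per token is an assumed rule of thumb, not a property of any Granite tokenizer, and the helper names and the output reserve are hypothetical. A real check would count tokens with the model's own tokenizer.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate. chars_per_token is an ASSUMED heuristic."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(text: str, window_tokens: int,
                    reserved_for_output: int = 1_000) -> bool:
    """Check whether the text, plus room reserved for the reply, fits the window."""
    return estimate_tokens(text) + reserved_for_output <= window_tokens


# Hypothetical input: 10,000 short lines of code (60,000 characters).
snippet = "x = 1\n" * 10_000

print(estimate_tokens(snippet))
print(fits_in_context(snippet, window_tokens=4_000))
print(fits_in_context(snippet, window_tokens=128_000))
```

Under this heuristic the same file overflows a 4,000-token window but fits comfortably in a 128,000-token one, so the larger model can summarize it in a single pass instead of in pieces.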

When ChatGPT was first introduced, its context window was 4,000 tokens, or roughly 3,000 words. If your conversation went over that limit in the chat interface, the chatbot was likely to hallucinate and veer off-topic. Today, the standard is 32,000 tokens, with the industry shifting to 128,000 tokens, which is about the length of a 250-page book. IBM now has two Granite models with a 128,000-token window, and more are on their way.
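The figures in this paragraph can be related with a little arithmetic. The sketch below uses only the numbers quoted above and assumes that the words-per-token ratio implied by the early 4,000-token limit also holds for larger windows. The variable names are illustrative.

```python
# Figures quoted in the text.
EARLY_WINDOW_TOKENS = 4_000
EARLY_LIMIT_WORDS = 3_000
LARGE_WINDOW_TOKENS = 128_000
BOOK_PAGES = 250

# Implied ratio of words to tokens (assumed constant across window sizes).
words_per_token = EARLY_LIMIT_WORDS / EARLY_WINDOW_TOKENS

# Approximate word capacity of a 128,000-token window.
large_window_words = LARGE_WINDOW_TOKENS * words_per_token

# Implied page density if that capacity equals a 250-page book.
words_per_page = large_window_words / BOOK_PAGES

print(words_per_token, large_window_words, words_per_page)
```

With these inputs the ratio works out to 0.75 words per token, about 96,000 words for the 128,000-token window, and 384 words per page, which is consistent with the book comparison.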