Prompting the granite-13b-chat-v2 foundation model from IBM
Experiment with inferencing the IBM granite-13b-chat-v2 foundation model in watsonx.ai to see how this versatile foundation model can help you accomplish various tasks.
The granite-13b-chat-v2 foundation model is a 13 billion parameter large language model that is designed to help business users get work done. The IBM Granite series of foundation models are trained on enterprise-specialized datasets, which means the models are familiar with the specialized language and jargon from various industries and can more readily generate content that is grounded in relevant industry knowledge.
For more information about the granite-13b-chat-v2 foundation model, such as version number and instruction-tuning details, see the model card.
The granite-13b-chat-v2 foundation model is optimized for multiple use cases. Because the model can be used for different tasks, adjust the model parameters and system prompt to get the best results for each one.
Conversing with Granite Chat
To get the best results when chatting with the granite-13b-chat-v2 foundation model, first follow these recommendations and then experiment to get the results that you want.
The following table lists the recommended model parameters for prompting the granite-13b-chat-v2 foundation model for a conversational task.
| Parameter | Recommended value or range | Explanation |
| --- | --- | --- |
| Decoding | Sampling | Sampling decoding generates more creative text, which helps to add interest and personality to responses from the chatbot. However, it can also lead to unpredictable output. You can control the degree of creativity with the next set of model parameters. |
| Top P, Top K, Temperature | Top P: 0.85, Top K: 50, Temperature: 0.7 | These sampling decoding parameters all work together. The model selects a subset of tokens from which to choose the token to include in the output. The subset includes the 50 most-probable tokens (Top K) or the tokens that, when their probability scores are summed, reach a total score of 0.85 (Top P). The relatively low temperature value of 0.7 amplifies the difference in token scores. As a result, the tokens that make the cut are typically the most probable tokens. To increase the creativity and diversity of responses, increase the temperature value. If the model hallucinates, lower the temperature value. |
| Repetition penalty | 1.05 | Set the penalty to this low value to prevent the chatbot from sounding robotic by repeating words or phrases. |
| Random seed | – | Specify a value for this setting only if you are testing something and want to remove randomness as a factor from the test. For example, if you want to change the temperature to see how that affects the output, submit the same input repeatedly and change only the temperature value each time. Also specify a number, such as 5, as the random seed each time to eliminate random token choices from also affecting the model output. The number itself doesn't matter, as long as you specify the same number each time. |
| Max tokens | 900 | The maximum context window length for the granite-13b-chat-v2 foundation model, which includes both input and output tokens, is 8192. For more information about tokens, see Tokens and tokenization. With each follow-up question, the conversation history is included as part of the model prompt. The granite-13b-chat-v2 foundation model can typically sustain a conversation for up to 5 turns or until the input reaches 4,000 tokens in length. |
For more information about the model parameters, see Model parameters for prompting.
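If you set these values programmatically rather than in the Prompt Lab, the recommendations in table 1 map to a simple parameters object. The following sketch uses plain Python with key names that mirror the table; the exact parameter names expected by your client library or the watsonx.ai API may differ, so map them before use.

```python
# Recommended generation parameters for chatting with granite-13b-chat-v2,
# mirroring table 1. The key names follow the table, not any specific SDK;
# translate them to your client library's parameter names before use.
chat_params = {
    "decoding_method": "sample",  # sampling decoding for more creative replies
    "top_p": 0.85,                # keep tokens whose summed probability reaches 0.85
    "top_k": 50,                  # consider at most the 50 most-probable tokens
    "temperature": 0.7,           # relatively low, so probable tokens dominate
    "repetition_penalty": 1.05,   # slight penalty to avoid robotic repetition
    "max_new_tokens": 900,        # response budget; the context window is 8192 total
}

def with_random_seed(params, seed=5):
    """Return a copy of the parameters with a fixed random seed.

    Use this only when testing, to remove sampling randomness as a factor.
    The seed value itself doesn't matter, as long as it stays the same
    across test runs.
    """
    return {**params, "random_seed": seed}
```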
To prompt the granite-13b-chat-v2 foundation model for a chat task, try these steps:
- From the Prompt Lab in chat mode, choose the granite-13b-chat-v2 foundation model.

Chat mode has default prompt parameter values that are optimized for conversational exchanges, including a higher Max tokens value.
- From the Model parameters panel, apply the recommended model parameter values from table 1.
- Add instructions to the system prompt.

For example, the following instruction text was used to train the model, and therefore is familiar to the model.
You are Granite Chat, an AI language model developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior. You always respond to greetings (for example, hi, hello, g'day, morning, afternoon, evening, night, what's up, nice to meet you, sup) with "Hello! I am Granite Chat, created by IBM. How can I help you today?". Please do not say anything else and do not start a conversation.
- Start a conversation.
The optimal structure for a prompt that is used for a chat task follows. If you submit prompts from the Prompt Lab in freeform mode, use this prompt format.
<|system|>
You are Granite Chat, an AI language model developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior. You always respond to greetings (for example, hi, hello, g'day, morning, afternoon, evening, night, what's up, nice to meet you, sup) with "Hello! I am Granite Chat, created by IBM. How can I help you today?". Please do not say anything else and do not start a conversation.
<|user|>
{PROMPT}
<|assistant|>
- Ask follow-up questions to keep the conversation going.
The optimal structure for a prompt that is used for a chat with multiple dialog turns follows. If you submit prompts from the Prompt Lab in freeform mode, use this prompt format.
<|system|>
You are Granite Chat, an AI language model developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior. You always respond to greetings (for example, hi, hello, g'day, morning, afternoon, evening, night, what's up, nice to meet you, sup) with "Hello! I am Granite Chat, created by IBM. How can I help you today?". Please do not say anything else and do not start a conversation.
<|user|>
{ROUND1_PROMPT}
<|assistant|>
{MODEL_RESPONSE}
<|user|>
{ROUND2_PROMPT}
<|assistant|>
For another sample prompt that illustrates the chat use case, see Sample: Converse with granite-13b-chat-v2.
For more information about using chat mode in Prompt Lab, see Prompt Lab.
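If you submit prompts through an API instead of the Prompt Lab, you assemble the tagged chat format yourself. The following sketch is a minimal, hypothetical helper (not part of any IBM SDK) that builds a granite-13b-chat-v2 prompt from a system prompt and alternating user and assistant turns; the shortened system prompt here is illustrative.

```python
# Illustrative default; substitute the full recommended system prompt.
SYSTEM_PROMPT = (
    "You are Granite Chat, an AI language model developed by IBM. "
    "You are a cautious assistant. You carefully follow instructions. "
    "You are helpful and harmless and you follow ethical guidelines "
    "and promote positive behavior."
)

def build_chat_prompt(user_turns, assistant_turns, system_prompt=SYSTEM_PROMPT):
    """Assemble a granite-13b-chat-v2 chat prompt with special tags.

    user_turns must contain exactly one more entry than assistant_turns;
    the prompt ends with an open <|assistant|> tag so the model answers next.
    """
    if len(user_turns) != len(assistant_turns) + 1:
        raise ValueError("expected one more user turn than assistant turns")
    parts = [f"<|system|>\n{system_prompt}"]
    for i, user in enumerate(user_turns):
        parts.append(f"<|user|>\n{user}")
        if i < len(assistant_turns):
            parts.append(f"<|assistant|>\n{assistant_turns[i]}")
    parts.append("<|assistant|>")
    return "\n".join(parts)
```

For a single-turn prompt, pass one user turn and no assistant turns; for follow-up questions, include each previous model response so the conversation history stays in the prompt.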
Tips for prompting Granite Chat for conversational tasks
- In follow-up questions, avoid pronouns. For example, ask “What does the variable represent?” instead of “What does it represent?”
- If you want the foundation model to generate a response in table format, explicitly ask in the prompt for the model to return a markdown table.
Returning factual answers with the RAG pattern
To guide the granite-13b-chat-v2 foundation model to return factual answers, use the retrieval-augmented generation pattern. Retrieval-augmented generation grounds the input that you submit to the model with factual information about the topic to be discussed. For more information, see Retrieval-augmented generation (RAG).
When you want to return factual answers from the granite-13b-chat-v2 foundation model, follow these recommendations.
The following table lists the recommended model parameters for prompting the granite-13b-chat-v2 foundation model for a retrieval-augmented generation task.
Parameter | Recommended value or range | Explanation |
---|---|---|
Decoding | Greedy | Greedy decoding chooses tokens from only the most-probable options, which is best when you want factual answers. |
Repetition penalty | 1 | Use the lowest value. Repetition is acceptable when the goal is factual answers. |
Max tokens | 500 | The model can answer the question as completely as possible. Remember that the maximum context window length for the granite-13b-chat-v2 foundation model, which includes both input and output tokens, is 8192. Keep your input, including the document that you add to ground the prompt, within that limit. For more information about tokens, see Tokens and tokenization. |
Stopping criteria | <|endoftext|> | A helpful feature of the granite-13b-chat-v2 foundation model is the inclusion of a special token named <|endoftext|> at the end of each response. When some generative models return a response in fewer tokens than the maximum number allowed, they can repeat patterns from the input. Because this model marks the end of its response with <|endoftext|>, you can use that token as a reliable stop sequence to prevent such repetition. |
For more information about the model parameters, see Model parameters for prompting.
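As with the chat parameters, the table 2 recommendations can be captured programmatically. The sketch below uses plain Python with key names that mirror the table (not any specific SDK), and adds a small helper for clients that return the <|endoftext|> stop sequence verbatim in the generated text.

```python
# Recommended generation parameters for RAG with granite-13b-chat-v2,
# mirroring table 2. The key names follow the table, not any specific SDK.
rag_params = {
    "decoding_method": "greedy",     # most-probable tokens only: best for facts
    "repetition_penalty": 1.0,       # lowest value; repetition is acceptable
    "max_new_tokens": 500,           # keep input + output within the 8192 limit
    "stop_sequences": ["<|endoftext|>"],  # the model emits this token when done
}

def strip_stop_sequence(text, stop="<|endoftext|>"):
    """Trim a response at the stop sequence, if the client returns it."""
    return text.split(stop, 1)[0].rstrip()
```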
To prompt the granite-13b-chat-v2 foundation model for a retrieval-augmented generation task, try these steps:
- Find reliable resources with factual information about the topic that you want the model to discuss and that you have permission to use. Copy an excerpt of the document or documents to a text editor or other tool where you can access it later.
For example, the resource might be product information from your own company website or product documentation.
- From the Prompt Lab, open freeform mode so that you can structure your prompts. Choose the granite-13b-chat-v2 foundation model.
- From the Model parameters panel, set the recommended model parameters from table 2.
- In your prompt, clearly define the system prompt, user input, and where the model's output should go.

For example, the following prompt structure and text were used to train the model, and therefore are familiar to the model.
<|system|>
You are Granite Chat, an AI language model developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.
<|user|>
You are an AI language model designed to function as a specialized Retrieval Augmented Generation (RAG) assistant. When generating responses, prioritize correctness, meaning ensure that your response is correct given the context and user query, and that it is grounded in the context. Furthermore, make sure that the response is supported by the given document or context. When the question cannot be answered using the context or document, output the following response: '{RESPONSE_MESSAGE}' Always make sure that your response is relevant to the question. If an explanation is needed, first provide the explanation or reasoning, and then give the final answer.
Answer length: {ANSWER_LENGTH}
[Document]
{DOCUMENT1_TITLE}
{DOCUMENT1_CONTENT}
[End]
[Document]
{DOCUMENT2_TITLE}
{DOCUMENT2_CONTENT}
[End]
[Document]
{DOCUMENT3_TITLE}
{DOCUMENT3_CONTENT}
[End]
{QUERY}
<|assistant|>
Note: The start and end of the document content are denoted by the special tags [Document] and [End]. Use a similar syntax if you want to add special tags that identify content types or subsection headers in your prompts. When the granite-13b-chat-v2 foundation model was created, it was trained to handle the following special tags: <|system|>, <|user|>, and <|assistant|>. Do not use the same <|tagname|> syntax for your custom tags or you might confuse the model.

- If you copy this prompt template, after you paste it into the Prompt Lab editor, replace the placeholder variables.
Table 2a: RAG template placeholder variables

| Placeholder variable | Description | Examples |
| --- | --- | --- |
| {RESPONSE_MESSAGE} | Sentence (or a short variable that can be replaced with a sentence by a post-processing step) that indicates that there is no valid answer to the question. | “I don't know.” |
| {ANSWER_LENGTH} | Optional. Defines the expected response length for the answer. Options include (from shortest to longest answers): single word, concise, narrative. | |
| {DOCUMENTn_TITLE} | Title of the document from which the excerpt with factual information is taken. You can include content from more than one document. | Product Brochure |
| {DOCUMENTn_CONTENT} | Text excerpt with the factual information that you want the model to be able to discuss knowledgeably. | Text from a marketing brochure, product documentation, company website, or other trusted resource. |
| {QUERY} | Question to be answered factually. | A question about the topic that is discussed in the document. |

Tip: Alternatively, you can define and use a prompt variable for the document so that the prompt can be reused and the content can be replaced dynamically each time. For more information, see Building reusable prompts.
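The placeholder substitution can also be scripted. The following sketch is a hypothetical helper (not part of any IBM SDK) that fills the RAG template from a query and a list of (title, content) document pairs, wrapping each document in the [Document] and [End] tags that the model was trained on.

```python
def build_rag_prompt(query, documents,
                     response_message="I don't know.",
                     answer_length="concise"):
    """Fill the granite-13b-chat-v2 RAG prompt template.

    documents is a list of (title, content) tuples; each one is wrapped in
    the [Document] ... [End] tags from the template. response_message and
    answer_length correspond to the {RESPONSE_MESSAGE} and {ANSWER_LENGTH}
    placeholders.
    """
    system = (
        "You are Granite Chat, an AI language model developed by IBM. "
        "You are a cautious assistant. You carefully follow instructions. "
        "You are helpful and harmless and you follow ethical guidelines "
        "and promote positive behavior."
    )
    instruction = (
        "You are an AI language model designed to function as a specialized "
        "Retrieval Augmented Generation (RAG) assistant. When generating "
        "responses, prioritize correctness, meaning ensure that your "
        "response is correct given the context and user query, and that it "
        "is grounded in the context. Furthermore, make sure that the "
        "response is supported by the given document or context. When the "
        "question cannot be answered using the context or document, output "
        f"the following response: '{response_message}' Always make sure "
        "that your response is relevant to the question. If an explanation "
        "is needed, first provide the explanation or reasoning, and then "
        "give the final answer.\n"
        f"Answer length: {answer_length}"
    )
    docs = "\n".join(
        f"[Document]\n{title}\n{content}\n[End]" for title, content in documents
    )
    return (
        f"<|system|>\n{system}\n<|user|>\n{instruction}\n{docs}\n{query}\n"
        "<|assistant|>"
    )
```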
Retrieval-augmented generation prompt example 1
The following prompt uses the granite-13b-chat-v2 foundation model to answer questions about prompt tuning.
Note: The document content is taken from the Methods for tuning foundation models topic.
<|system|>
You are Granite Chat, an AI language model developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.
<|user|>
You are an AI language model designed to function as a specialized Retrieval Augmented Generation (RAG) assistant. When generating responses, prioritize correctness, meaning ensure that your response is correct given the context and user query, and that it is grounded in the context.
Furthermore, make sure that the response is supported by the given document or context. When the question cannot be answered using the context or document, output the following response: 'I don't know.' Always make sure that your response is relevant to the question. If an explanation is needed, first provide the explanation or reasoning, and then give the final answer.
Answer length: concise
[Document]
How prompt-tuning works
Foundation models are sensitive to the input that you give them. Your input, or how you prompt the model, can introduce context that the model will use to tailor its generated output. Prompt engineering to find the right prompt often works well. However, it can be time-consuming, error-prone, and its effectiveness can be restricted by the context window length that is allowed by the underlying model.
Prompt-tuning a model in the Tuning Studio applies machine learning to the task of prompt engineering. Instead of adding words to the input itself, prompt-tuning is a method for finding a sequence of values that, when added as a prefix to the input text, improve the model's ability to generate the output you want. This sequence of values is called a prompt vector.
Normally, words in the prompt are vectorized by the model. Vectorization is the process of converting text to tokens, and then to numbers defined by the model's tokenizer to identify the tokens. Lastly, the token IDs are encoded, meaning they are converted into a vector representation, which is the input format that is expected by the embedding layer of the model. Prompt-tuning bypasses the model's text-vectorization process and instead crafts a prompt vector directly. This changeable prompt vector is concatenated to the vectorized input text and the two are passed as one input to the embedding layer of the model. Values from this crafted prompt vector affect the word embedding weights that are set by the model and influence the words that the model chooses to add to the output.
To find the best values for the prompt vector, you run a tuning experiment. You demonstrate the type of output that you want for a corresponding input by providing the model with input and output example pairs in training data. With each training run of the experiment, the generated output is compared to the training data output. Based on what it learns from differences between the two, the experiment adjusts the values in the prompt vector. After many runs through the training data, the model finds the prompt vector that works best.
[End]
How does prompt tuning work?
<|assistant|>
The granite-13b-chat-v2 foundation model generates the following output when this prompt is submitted from the Prompt Lab. The output is an accurate summary of how prompt tuning works that is based on information from the document content that was provided with the input:
“Prompt tuning is a method for improving a model's ability to generate output by finding a sequence of values, called a prompt vector, that, when added as a prefix to the input text, influences the model's word embedding weights. This is done by running a tuning experiment, where input and output example pairs are provided to the model, and the model's generated output is compared to the training data output. The experiment then adjusts the values in the prompt vector based on the differences between the two. Over many runs through the training data, the model finds the prompt vector that works best.”
Retrieval-augmented generation prompt example 2
The following prompt uses the granite-13b-chat-v2 foundation model to answer questions based on information from four documents.
<|system|>
You are Granite Chat, an AI language model developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.
<|user|>
You are an AI language model designed to function as a specialized Retrieval Augmented Generation (RAG) assistant. When generating responses, prioritize correctness, meaning ensure that your response is correct given the context and user query, and that it is grounded in the context.
Furthermore, make sure that the response is supported by the given document or context. When the question cannot be answered using the context or document, output the following response: 'I don't know.' Always make sure that your response is relevant to the question. If an explanation is needed, first provide the explanation or reasoning, and then give the final answer.
Answer length: concise
[Document]
New York
New York, often called New York City or simply NYC, is the most populous city in the United States, located at the southern tip of New York State on one of the world's largest natural harbors. The city comprises five boroughs, each of which is coextensive with a respective county. It is a global city and a cultural, financial, high-tech, entertainment, and media center with a significant influence on commerce, health care, scientific output, life sciences, research, technology, education, politics, tourism, dining, art, fashion, and sports. Home to the headquarters of the United Nations, New York is an important center for international diplomacy and is sometimes described as the world's most important city and the capital of the world.
With an estimated population in 2022 of 8,335,897 distributed over 300.46 square miles, the city is the most densely populated major city in the United States. New York has more than double the population of Los Angeles, the nation's second-most populous city. New York is the geographical and demographic center of both the Northeast megalopolis and the New York metropolitan area, the largest metropolitan area in the U.S. by both population and urban area. With more than 20.1 million people in its metropolitan statistical area and 23.5 million in its combined statistical area as of 2020, New York City is one of the world's most populous megacities. The city and its metropolitan area are the premier gateway for legal immigration to the United States. As many as 800 languages are spoken in New York, making it the most linguistically diverse city in the world. In 2021, the city was home to nearly 3.1 million residents born outside the U.S., the largest foreign-born population of any city in the world.
New York City traces its origins to Fort Amsterdam and a trading post founded on the southern tip of Manhattan Island by Dutch colonists in approximately 1624. The settlement was named New Amsterdam in 1626 and was chartered as a city in 1653. The city came under English control in 1664 and was renamed New York after King Charles II granted the lands to his brother, the Duke of York. The city was temporarily regained by the Dutch in July 1673 and was renamed New Orange; however, the city has been named New York since November 1674. New York City was the capital of the United States from 1785 until 1790. The modern city was formed by the 1898 consolidation of its five boroughs: Manhattan, Brooklyn, Queens, The Bronx, and Staten Island, and has been the largest U.S. city ever since.
Anchored by Wall Street in the Financial District of Lower Manhattan, New York City has been called both the world's premier financial and fintech center and the most economically powerful city in the world. As of 2022, the New York metropolitan area is the largest metropolitan economy in the world with a gross metropolitan product of over US$2.16 trillion. If the New York metropolitan area were its own country, it would have the tenth-largest economy in the world. The city is home to the world's two largest stock exchanges by market capitalization of their listed companies: the New York Stock Exchange and Nasdaq. New York City is an established safe haven for global investors. As of 2023, New York City is the most expensive city in the world for expatriates to live. New York City is home to the highest number of billionaires, individuals of ultra-high net worth, and millionaires of any city in the world.
[End]
[Document]
Los Angeles
Los Angeles (Spanish: Los Ángeles, lit. 'The Angels'), often referred to by its initials L.A., is the most populous city in the U.S. state of California and in the Western United States. With roughly 3.9 million residents within the city limits as of 2020, Los Angeles is the second-most populous city in the United States, behind only New York City; it is also the commercial, financial and cultural center of Southern California. Los Angeles has a Mediterranean climate and an ethnically and culturally diverse population, and it is the principal city of a metropolitan area of 13.2 million people. Greater Los Angeles, which includes the Los Angeles and Riverside–San Bernardino metropolitan areas, is a sprawling metropolis of over 18 million residents.
The majority of the city proper lies in a basin in Southern California adjacent to the Pacific Ocean in the west and extending partly through the Santa Monica Mountains and north into the San Fernando Valley, with the city bordering the San Gabriel Valley to its east. It covers about 469 square miles, and is the county seat of Los Angeles County, which is the most populous county in the United States with an estimated 9.86 million residents as of 2022. It is the fourth-most visited city in the U.S. with over 2.7 million visitors as of 2022.
The area that became Los Angeles was originally inhabited by the indigenous Tongva people and later claimed by Juan Rodríguez Cabrillo for Spain in 1542. The city was founded on September 4, 1781, under Spanish governor Felipe de Neve, on the village of Yaanga. It became a part of Mexico in 1821 following the Mexican War of Independence. In 1848, at the end of the Mexican–American War, Los Angeles and the rest of California were purchased as part of the Treaty of Guadalupe Hidalgo and became part of the United States. Los Angeles was incorporated as a municipality on April 4, 1850, five months before California achieved statehood. The discovery of oil in the 1890s brought rapid growth to the city. The city was further expanded with the completion of the Los Angeles Aqueduct in 1913, which delivers water from Eastern California.
Los Angeles has a diverse economy with a broad range of industries. Despite a post-COVID-19 pandemic exodus of entertainment production and talent, Los Angeles is still best known as the home of the Hollywood film industry, the world's largest by revenue; the city was an important site in the history of film. It also has one of the busiest container ports in the Americas. In 2018, the Los Angeles metropolitan area had a gross metropolitan product of over $1.0 trillion, making it the city with the third-largest GDP in the world, after New York and Tokyo. Los Angeles hosted the Summer Olympics in 1932 and 1984, and will also host in 2028. More recently, statewide droughts in California have strained both the city's and Los Angeles County's water security.
[End]
[Document]
Chicago
Chicago is the most populous city in the U.S. state of Illinois and the third-most populous in the United States after New York City and Los Angeles. With a population of 2,746,388 in the 2020 census, it is also the most populous city in the Midwest. As the seat of Cook County, the second-most populous county in the U.S., Chicago is the center of the Chicago metropolitan area.
Located on the shore of Lake Michigan, Chicago was incorporated as a city in 1837 near a portage between the Great Lakes and the Mississippi River watershed. It grew rapidly in the mid-19th century. In 1871, the Great Chicago Fire destroyed several square miles and left more than 100,000 homeless, but Chicago's population continued to grow. Chicago made noted contributions to urban planning and architecture, such as the Chicago School, the development of the City Beautiful Movement, and the steel-framed skyscraper.
Chicago is an international hub for finance, culture, commerce, industry, education, technology, telecommunications, and transportation. It has the largest and most diverse derivatives market in the world, generating 20% of all volume in commodities and financial futures alone. O'Hare International Airport is routinely ranked among the world's top six busiest airports by passenger traffic, and the region is also the nation's railroad hub. The Chicago area has one of the highest gross domestic products (GDP) in the world, generating $689 billion in 2018. Chicago's economy is diverse, with no single industry employing more than 14% of the workforce.
Chicago is a major tourist destination. Chicago's culture has contributed much to the visual arts, literature, film, theater, comedy (especially improvisational comedy), food, dance, and music (particularly jazz, blues, soul, hip-hop, gospel, and electronic dance music, including house music). Chicago is home to the Chicago Symphony Orchestra and the Lyric Opera of Chicago, while the Art Institute of Chicago provides an influential visual arts museum and art school. The Chicago area also hosts the University of Chicago, Northwestern University, and the University of Illinois Chicago, among other institutions of learning. Chicago has professional sports teams in each of the major professional leagues, including two Major League Baseball teams.
[End]
[Document]
Houston
Houston is the most populous city in the U.S. state of Texas and in the Southern United States. Houston is located in Southeast Texas near Galveston Bay and the Gulf of Mexico; it is the seat and largest city of Harris County and the principal city of the Greater Houston metropolitan area, which is the fifth-most populous metropolitan statistical area in the United States and the second-most populous in Texas after Dallas–Fort Worth. With a population of 2,302,878 in 2022, Houston is the fourth-most populous city in the United States after New York City, Los Angeles, and Chicago, and the seventh-most populous city in North America. Houston is the southeast anchor of the greater megaregion known as the Texas Triangle.
Comprising a land area of 640.4 square miles, Houston is the ninth-most expansive city in the United States (including consolidated city-counties). It is the largest city in the United States by total area whose government is not consolidated with a county, parish, or borough. Though primarily in Harris County, small portions of the city extend into Fort Bend and Montgomery counties, bordering other principal communities of Greater Houston such as Sugar Land and The Woodlands.
Houston was founded by land investors on August 30, 1836, at the confluence of Buffalo Bayou and White Oak Bayou (a point now known as Allen's Landing) and incorporated as a city on June 5, 1837. The city is named after former General Sam Houston, who was president of the Republic of Texas and had won Texas's independence from Mexico at the Battle of San Jacinto 25 miles east of Allen's Landing. After briefly serving as the capital of the Texas Republic in the late 1830s, Houston grew steadily into a regional trading center for the remainder of the 19th century.
The arrival of the 20th century brought a convergence of economic factors that fueled rapid growth in Houston, including a burgeoning port and railroad industry, the decline of Galveston as Texas's primary port following a devastating 1900 hurricane, the subsequent construction of the Houston Ship Channel, and the Texas oil boom. In the mid-20th century, Houston's economy diversified, as it became home to the Texas Medical Center—the world's largest concentration of healthcare and research institutions—and NASA's Johnson Space Center, home to the Mission Control Center.
Since the late 19th century Houston's economy has had a broad industrial base, in energy, manufacturing, aeronautics, and transportation. Leading in healthcare sectors and building oilfield equipment, Houston has the second-most Fortune 500 headquarters of any U.S. municipality within its city limits (after New York City). The Port of Houston ranks first in the United States in international waterborne tonnage handled and second in total cargo tonnage handled.
Nicknamed the "Bayou City", "Space City", "H-Town", and "the 713", Houston has become a global city, with strengths in culture, medicine, and research. The city has a population from various ethnic and religious backgrounds and a large and growing international community. Houston is the most diverse metropolitan area in Texas and has been described as the most racially and ethnically diverse major city in the U.S. It is home to many cultural institutions and exhibits, which attract more than seven million visitors a year to the Museum District. The Museum District has nineteen museums, galleries, and community spaces. Houston has an active visual and performing arts scene in the Theater District and offers year-round resident companies in all major performing arts.
[End]
What should I do when I visit Chicago?
<|assistant|>
The model responses might vary, but they include information that comes directly from the Chicago document that was included in the prompt.
“The Art Institute of Chicago, while providing an influential visual arts museum and art school, is just one of the many cultural institutions and exhibits in Chicago that attract millions of visitors each year. The city has a diverse range of attractions, including professional sports teams in each of the major professional leagues, such as two Major League Baseball teams. You might also want to explore the city's rich history, architecture, and culinary scene.”
NLP tasks
You can use the granite-13b-chat-v2 foundation model for natural language processing tasks such as classification and extraction. When the granite-13b-chat-v2 model was created, it was trained to perform these types of natural language processing tasks.
During the model's training, the prompts that were used to train the model followed a specific format per task type. When you use the model to perform one of these tasks, mimic the established format in the prompts that you submit.
Classification
To use the granite-13b-chat-v2 foundation model to classify information, follow these recommendations.
The following table lists the recommended model parameters for prompting the granite-13b-chat-v2 foundation model for a classification task.
Parameter | Recommended value or range | Explanation |
---|---|---|
Decoding | Greedy | Greedy decoding chooses tokens from only the most-probable options, which is best when you want to classify text. |
Repetition penalty | 1 | Use the lowest value. Repetition is expected. |
Max tokens | varies | Use a value that covers the number of tokens in your longest class label, such as 5 or 10. Limiting the tokens encourages the model to return only the appropriate class label and nothing else. |
Stopping criteria | Add each supported class label as a stop sequence. | Adding the classes as stop sequences forces the model to stop generating text after a class is assigned to the input. |
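The recommended settings in the table can be captured as a reusable parameter set. The following is a minimal Python sketch; the dictionary key names are illustrative assumptions for the sketch, not official watsonx.ai API field names:

```python
# Illustrative parameter set for a classification prompt.
# Key names are assumptions, not official watsonx.ai API fields.
CLASS_LABELS = ["Positive", "Negative"]

classification_params = {
    "decoding_method": "greedy",     # choose only the most probable token
    "repetition_penalty": 1.0,       # lowest value; repetition is expected
    "max_new_tokens": 10,            # covers the longest class label
    "stop_sequences": CLASS_LABELS,  # stop after a class label is emitted
}
```

Keeping the class labels in one list means the same values can drive both the stop sequences and any output validation you do afterward.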
To prompt the granite-13b-chat-v2 foundation model for a classification task, try these steps:
-
Identify the classes or classification labels that you want the model to assign to the input. Be sure to list these class labels in the instruction segment of your prompt.
For example, if you want to classify customer product reviews as positive or negative, you might define two class labels: `Positive` and `Negative`.
-
Collect two or three representative examples of the type of input text that you want the model to classify.
-
Work with the granite-13b-chat-v2 foundation model from the Prompt Lab in freeform mode so that you can structure your prompts.
-
From the Model parameters panel, set the recommended model parameters from table 3.
-
In your prompt, clearly identify the system prompt, user input, and where the model's output should go.
For example, the following prompt structure was used when the granite-13b-chat-v2 foundation model was trained to classify text:
<|system|>
You are Granite Chat, an AI language model developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.
<|user|>
{INSTRUCTION}
Your response should only include the answer. Do not provide any further explanation.
Here are some examples, complete the last one:
{INPUT_LABEL}:
{ICL_INPUT_1}
{OUTPUT_LABEL}:
{ICL_OUTPUT_1}
{INPUT_LABEL}:
{ICL_INPUT_2}
{OUTPUT_LABEL}:
{ICL_OUTPUT_2}
{INPUT_LABEL}:
{TEST_INPUT}
{OUTPUT_LABEL}:
<|assistant|>
You can use a similar structure to leverage the model's training. Simply replace the placeholder variables in the prompt template.
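Because the placeholders use brace notation, the substitution can be done with Python's built-in `str.format`. The following is a minimal sketch that fills the classification template with the feedback example values from this page (no SDK required):

```python
# The trained prompt structure, with the placeholder variables intact.
TEMPLATE = """<|system|>
You are Granite Chat, an AI language model developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.
<|user|>
{INSTRUCTION}
Your response should only include the answer. Do not provide any further explanation.
Here are some examples, complete the last one:
{INPUT_LABEL}:
{ICL_INPUT_1}
{OUTPUT_LABEL}:
{ICL_OUTPUT_1}
{INPUT_LABEL}:
{ICL_INPUT_2}
{OUTPUT_LABEL}:
{ICL_OUTPUT_2}
{INPUT_LABEL}:
{TEST_INPUT}
{OUTPUT_LABEL}:
<|assistant|>"""

# Repeated placeholders such as {INPUT_LABEL} are filled everywhere they occur.
prompt = TEMPLATE.format(
    INSTRUCTION="For each feedback, specify whether the content is Positive or Negative.",
    INPUT_LABEL="Feedback",
    OUTPUT_LABEL="Class",
    ICL_INPUT_1="Carol, the service rep was so helpful. She answered all of my questions.",
    ICL_OUTPUT_1="Positive",
    ICL_INPUT_2="The service representative did not listen to a word I said.",
    ICL_OUTPUT_2="Negative",
    TEST_INPUT="Carlo was so helpful and pleasant.",
)
```

The resulting string is the complete prompt text that you would submit to the model.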
Table 3a: Classification template placeholder variables
Placeholder variable | Description | Examples |
---|---|---|
`{INSTRUCTION}` | Description of the task. Include a list of the classes that you want the model to assign to the input. | For each product review, indicate whether the review is Positive or Negative. |
`{INPUT_LABEL}` | Short label for the text to be classified. | `Input`, `Customer review`, `Feedback`, `Comment` |
`{OUTPUT_LABEL}` | Short label that represents the classification value. | `Class` |
`{ICL_INPUT_N}` | Optional. Examples of input text to be classified. Add examples when you want to use a few-shot prompt to support in-context learning. | The service representative did not listen to a word I said. It was a waste of my time. |
`{ICL_OUTPUT_N}` | Example outputs with class labels assigned to the corresponding input text examples. | `Positive`, `Negative` |
Classification prompt example
The following prompt classifies feedback that customers share about support center personnel.
<|system|>
You are Granite Chat, an AI language model developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.
<|user|>
For each feedback, specify whether the content is Positive or Negative. Your response should only include the answer. Do not provide any further explanation.
Here are some examples, complete the last one:
Feedback:
Carol, the service rep was so helpful. She answered all of my questions and explained things beautifully.
Class:
Positive
Feedback:
The service representative did not listen to a word I said. It was a waste of my time.
Class:
Negative
Feedback:
Carlo was so helpful and pleasant. He was able to solve a problem that I've been having with my software for weeks now.
Class:
<|assistant|>
The output that is generated by the granite-13b-chat-v2 foundation model when this prompt is submitted is `Positive`.
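Because the class labels double as stop sequences, the raw completion can carry leading or trailing whitespace. A small helper (illustrative only, not part of any IBM SDK) can map the raw output back to a known label and flag anything unexpected:

```python
def normalize_label(raw_output, labels):
    """Map a raw model completion to one of the expected class labels.

    Returns None when the output matches no label, so the caller can
    route the response for review instead of guessing a class.
    """
    text = raw_output.strip()
    for label in labels:
        if text.lower().startswith(label.lower()):
            return label
    return None
```

For example, `normalize_label(" Positive\n", ["Positive", "Negative"])` returns `"Positive"`, while an off-label completion returns `None`.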
Extraction
To use the granite-13b-chat-v2 foundation model to extract information, follow these recommendations.
The following table lists the recommended model parameters for prompting the granite-13b-chat-v2 foundation model for an extraction task.
Parameter | Recommended value or range | Explanation |
---|---|---|
Decoding | Greedy | Greedy decoding chooses tokens from only the most-probable options, which is best when you want to extract text. |
Max tokens | varies | Use a value that covers the number of tokens in the longest mention of the information type that you want to extract, such as 5 or 10. Limiting the tokens encourages the model to return only the extracted value and nothing else. |
Stopping criteria | Add a stop sequence, such as two newline characters. | A stop sequence forces the model to stop generating text after it returns the extracted value. |
To prompt the granite-13b-chat-v2 foundation model for an extraction task, try these steps:
-
Identify the information types that you want the model to extract from the input. Be sure to list these information type labels in the instruction segment of your prompt.
For example, if you want to extract key pieces of information from a company's US Securities and Exchange Commission 10-K form, you might identify an information type such as `Line Of Credit Facility Maximum Borrowing Capacity`.
-
Collect two or three representative examples of input text with the type of information that you want the model to extract.
-
Work with the granite-13b-chat-v2 foundation model from the Prompt Lab in freeform mode so that you can structure your prompts.
-
From the Model parameters panel, set the recommended model parameters from table 4.
-
Clearly identify the system prompt, user input, and where the model's output should go.
For example, the following prompt structure was used when the granite-13b-chat-v2 foundation model was trained to extract information from text:
<|system|>
You are Granite Chat, an AI language model developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.
<|user|>
{INSTRUCTION}
Your response should only include the answer. Do not provide any further explanation.
Here are some examples, complete the last one:
{INPUT_LABEL}:
{ICL_INPUT_1}
{OUTPUT_LABEL}:
{ICL_OUTPUT_1}
{INPUT_LABEL}:
{ICL_INPUT_2}
{OUTPUT_LABEL}:
{ICL_OUTPUT_2}
{INPUT_LABEL}:
{TEST_INPUT}
{OUTPUT_LABEL}:
<|assistant|>
You can use a similar structure to leverage the model's training. Simply replace the placeholder variables in the prompt template.
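When the number of in-context examples varies, the repeated example block can be assembled programmatically instead of filled from a fixed template. The following is a minimal Python sketch; the function and variable names are illustrative:

```python
def build_extraction_prompt(instruction, input_label, output_label,
                            examples, test_input):
    """Assemble a few-shot extraction prompt in the trained format.

    `examples` is a list of (input_text, extracted_value) pairs.
    """
    system = (
        "<|system|>\n"
        "You are Granite Chat, an AI language model developed by IBM. "
        "You are a cautious assistant. You carefully follow instructions. "
        "You are helpful and harmless and you follow ethical guidelines "
        "and promote positive behavior.\n"
    )
    lines = [
        "<|user|>",
        instruction,
        "Your response should only include the answer. "
        "Do not provide any further explanation.",
        "Here are some examples, complete the last one:",
    ]
    # One labeled input/output pair per in-context example.
    for text, value in examples:
        lines += [f"{input_label}:", text, f"{output_label}:", value]
    # The test input ends with an empty output slot for the model to fill.
    lines += [f"{input_label}:", test_input, f"{output_label}:", "<|assistant|>"]
    return system + "\n".join(lines)


prompt = build_extraction_prompt(
    instruction=("Extract the Line Of Credit Facility Maximum Borrowing "
                 "Capacity from the 10K sentences."),
    input_label="10K Sentence",
    output_label="Line Of Credit Facility Maximum Borrowing Capacity",
    examples=[
        ("The credit agreement also provides that up to $500 million in "
         "commitments may be used for letters of credit.", "$500M"),
    ],
    test_input=("In March 2020, we upsized the Credit Agreement by $100 "
                "million, which matures July 2023, to $2.525 billion."),
)
```

This keeps the prompt structure identical to the trained format while letting you add or drop examples without editing template text.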
Table 4a: Extraction template placeholder variables
Placeholder variable | Description |
---|---|
`{INSTRUCTION}` | Description of the task. Include a list of the information types that you want the model to extract from the input. |
`{INPUT_LABEL}` | Short label for the text that contains the information to extract. |
`{OUTPUT_LABEL}` | Short label that represents the extracted value. |
`{ICL_INPUT_N}` | Optional. Examples of input text with information types to be extracted. Add examples when you want to use a few-shot prompt to support in-context learning. |
`{ICL_OUTPUT_N}` | Example outputs with information types extracted from the corresponding inputs. |
Extraction prompt example
The following prompt extracts the Line Of Credit Facility Maximum Borrowing Capacity value from a company's SEC 10-K form.
<|system|>
You are Granite Chat, an AI language model developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.
<|user|>
Extract the Line Of Credit Facility Maximum Borrowing Capacity from the 10K sentences.
Your response should only include the answer. Do not provide any further explanation.
Here are some examples, complete the last one:
10K Sentence:
The credit agreement also provides that up to $500 million in commitments may be used for letters of credit.
Line Of Credit Facility Maximum Borrowing Capacity:
$500M
10K Sentence:
In March 2020, we upsized the Credit Agreement by $100 million, which matures July 2023, to $2.525 billion.
Line Of Credit Facility Maximum Borrowing Capacity:
$2.525B
10K Sentence:
We prepared our impairment test as of October 1, 2022 and determined that the fair values of each of our reporting units exceeded net book value by more than 50%. Among our reporting units, the narrowest difference between the calculated fair value and net book value was in our Principal Markets segment's Canada reporting unit, whose calculated fair value exceeded its net book value by 53%. Future developments related to macroeconomic factors, including increases to the discount rate used, or changes to other inputs and assumptions, including revenue growth, could reduce the fair value of this and/or other reporting units and lead to impairment. There were no goodwill impairment losses recorded for the nine months ended December 31, 2022. Cumulatively, the Company has recorded $469 million in goodwill impairment charges within its former EMEA ($293 million) and current United States ($176 million) reporting units. Revolving Credit Agreement In October 2021, we entered into a $3.15 billion multi-currency revolving credit agreement (the "Revolving Credit Agreement") for our future liquidity needs. The Revolving Credit Agreement expires, unless extended, in October 2026. Interest rates on borrowings under the Revolving Credit Agreement are based on prevailing market interest rates, plus a margin, as further described in the Revolving Credit Agreement. The total expense recorded by the Company for the Revolving Credit Agreement was not material in any of the periods presented. We may voluntarily prepay borrowings under the Revolving Credit Agreement without premium or penalty, subject to customary "breakage" costs. The Revolving Credit Agreement includes certain customary mandatory prepayment provisions. Interest on Debt Interest expense for the three and nine months ended December 31, 2022 was $27 million and $65 million, compared to $18 million and $50 million for the three and nine months ended December 31, 2021. 
Most of the interest for the pre-Separation period presented in the historical Consolidated Income Statement reflects the allocation of interest expense associated with debt issued by IBM from which a portion of the proceeds benefited Kyndryl.
Line Of Credit Facility Maximum Borrowing Capacity:
<|assistant|>
The output that is generated by the granite-13b-chat-v2 foundation model when this prompt is submitted is `$3.15B`.
Learn more
To learn more about the granite-13b-chat-v2 foundation model, read the following resources:
Parent topic: IBM foundation models