IBM Cloud Pak® for Data Version 4.8 will reach end of support (EOS) on 31 July, 2025. For more information, see the Discontinuance of service announcement for IBM Cloud Pak for Data Version 4.X.
Upgrade to IBM Software Hub Version 5.1 before IBM Cloud Pak for Data Version 4.8 reaches end of support. For more information, see Upgrading from IBM Cloud Pak for Data Version 4.8 to IBM Software Hub Version 5.1.
Supported foundation models available with watsonx.ai
A collection of open source and IBM foundation models are deployed in IBM watsonx.ai. You can prompt the foundation models in the Prompt Lab or programmatically.
The following models are available to be deployed in watsonx.ai:
- granite-13b-chat-v2
- granite-13b-chat-v1
- granite-13b-instruct-v2
- granite-13b-instruct-v1
- granite-8b-japanese
- granite-20b-multilingual
- codellama-34b-instruct
- elyza-japanese-llama-2-7b-instruct
- flan-t5-xl-3b
- flan-t5-xxl-11b
- flan-ul2-20b
- gpt-neox-20b
- jais-13b-chat
- llama-2-13b-chat
- llama-2-70b-chat
- llama2-13b-dpo-v7
- mixtral-8x7b-instruct-v01-q
- mpt-7b-instruct2
- mt0-xxl-13b
- starcoder-15.5b
To understand how the model provider, instruction tuning, token limits, and other factors can affect which model you choose, see Choosing a model.
IBM foundation models
The following table lists the supported foundation models that IBM provides for inferencing. The foundation models must be deployed in your cluster by an administrator to be available for use. All IBM models are instruction-tuned.
| Model name | Provider | IBM indemnification | Maximum tokens Context (input + output) |
Supported tasks | More information |
|---|---|---|---|---|---|
| granite-13b-chat-v2 | IBM | Yes | 8192 | • classification • extraction • generation • question answering • summarization |
• Model card • Website • Research paper |
| granite-13b-chat-v1 (Deprecated in 4.8.4) | IBM | Yes | 8192 | • classification • extraction • generation • question answering • summarization |
• Model card • Website • Research paper |
| granite-13b-instruct-v2 | IBM | Yes | 8192 | • classification • extraction • generation • question answering • summarization |
• Model card • Website • Research paper |
| granite-13b-instruct-v1 (Deprecated in 4.8.4) | IBM | Yes | 8192 | • classification • extraction • generation • question answering • summarization |
• Model card • Website • Research paper |
| granite-8b-japanese | IBM | Yes | 8192 | • classification • extraction • generation • question answering • summarization |
• Model card • Website • Research paper |
| granite-20b-multilingual | IBM | Yes | 8192 | • classification • extraction • generation • question answering • summarization |
• Model card • Website • Research paper |
Third-party foundation models
The following table lists the supported foundation models that third parties provide through Hugging Face. The foundation models must be deployed in your cluster by an administrator to be available for use. IBM indemnification does not apply to any third-party models.
| Model name | Provider | Maximum tokens Context (input + output) |
Supported tasks | More information |
|---|---|---|---|---|
| codellama-34b-instruct | Code Llama | 4096 | • code | • Model card • Meta AI Blog |
| elyza-japanese-llama-2-7b-instruct | ELYZA, Inc | 4096 | • classification • extraction • generation • question answering • retrieval augmented generation • summarization • translation |
• Model card • Blog on note.com |
| flan-t5-xl-3b | 4096 | • classification • extraction • generation • question answering • retrieval augmented generation • summarization |
• Model card • Research paper • Can be tuned in Tuning Studio |
|
| flan-t5-xxl-11b | 4096 | • classification • extraction • generation • question answering • retrieval augmented generation • summarization |
• Model card • Research paper |
|
| flan-ul2-20b | 4096 | • classification • extraction • generation • question answering • retrieval augmented generation • summarization |
• Model card • UL2 research paper • Flan research paper |
|
| gpt-neox-20b (Deprecated in 4.8.4) | EleutherAI | 8192 | • classification • generation • summarization |
• Model card • Research paper |
| jais-13b-chat | Inception, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), and Cerebras Systems | 2048 | • classification • extraction • generation • question answering • retrieval augmented generation • summarization • translation |
• Model card • Research paper |
| llama-2-13b-chat | Meta | 4096 | • classification • code • extraction • generation • question answering • retrieval augmented generation • summarization |
• Model card • Research paper |
| llama-2-70b-chat | Meta | 4096 | • classification • code • extraction • generation • question answering • retrieval augmented generation • summarization |
• Model card • Research paper |
| llama2-13b-dpo-v7 | Meta | 4096 | • classification • code • extraction • generation • question answering • retrieval augmented generation • summarization |
• Model card • Research paper (DPO) |
| mixtral-8x7b-instruct-v01-q | Mistral AI and IBM | 32,768 | • classification • code • extraction • generation • retrieval_augmented_generation • summarization • translation |
• Model card • Research paper |
| mpt-7b-instruct2 (Deprecated in 4.8.4) | Mosaic ML and IBM | 2048 | • classification • extraction • generation • summarization |
• Model card • Website |
| mt0-xxl-13b | BigScience | 4096 | • classification • generation • question answering • summarization |
• Model card • Research paper |
| starcoder-15.5b (Deprecated in 4.8.4) | BigCode | 8192 | • code | • Model card • Research paper |
Custom foundation models
In addition to working with foundation models that are curated by IBM, you can upload and deploy your own foundation models. After the custom models are deployed and registered with watsonx.ai, you can create prompts that inference the custom models from the Prompt Lab.
To learn more about how to upload, register, and deploy a custom foundation model, see Deploying a custom foundation model.
Foundation model details
The available foundation models support a range of use cases for both natural languages and programming languages. To see the types of tasks that these models can do, review and try the sample prompts.
codellama-34b-instruct
A programmatic code generation model that is based on Llama 2 from Meta. Code Llama is fine-tuned for generating and discussing code.
When you inference this model from the Prompt Lab, disable AI guardrails.
This model was introduced with the 4.8.4 release.
Usage: Use Code Llama to create prompts that generate code based on natural language inputs, explain code, or that complete and debug code.
Size: 34 billion parameters
Token limits
- Context window length (input + output): 4096
Supported natural languages: English
Supported programming languages: The codellama-34b-instruct-hf foundation model supports many programming languages, including Python, C++, Java, PHP, Typescript (Javascript), C#, Bash, and more.
Instruction tuning information: The instruction fine-tuned version was fed natural language instruction input and the expected output to guide the model to generate helpful and safe answers in natural language.
Model architecture: Decoder
License: License
Learn more
elyza-japanese-llama-2-7b-instruct
The elyza-japanese-llama-2-7b-instruct model is provided by ELYZA, Inc on Hugging Face. The elyza-japanese-llama-2-7b-instruct foundation model is a version of the Llama 2 model from Meta that is trained to understand and generate Japanese text. The model is fine-tuned for solving various tasks that follow user instructions and for participating in a dialog.
When you inference this model from the Prompt Lab, disable AI guardrails.
This model was introduced with the 4.8.3 release.
Usage: General use with zero- or few-shot prompts. Works well for classification and extraction in Japanese and for translation between English and Japanese. Performs best when prompted in Japanese.
Try it out
Size: 7 billion parameters
Token limits: Context window length (input + output): 4096
Supported natural languages: Japanese, English
Instruction tuning information: For Japanese language training, Japanese text from many sources were used, including Wikipedia and the Open Super-large Crawled ALMAnaCH coRpus (a multilingual corpus that is generated by classifying and filtering language in the Common Crawl corpus). The model was fine-tuned on a dataset that was created by ELYZA. The ELYZA Tasks 100 dataset contains 100 diverse and complex tasks that were created manually and evaluated by humans. The ELYZA Tasks 100 dataset is publicly available from HuggingFace.
Model architecture: Decoder
License: License
Learn more
flan-t5-xl-3b
The flan-t5-xl-3b model is provided by Google on Hugging Face. This model is based on the pretrained text-to-text transfer transformer (T5) model and uses instruction fine-tuning methods to achieve better zero- and few-shot performance. The model is also fine-tuned with chain-of-thought data to improve its ability to perform reasoning tasks.
This model was introduced with the 4.8.1 release.
Usage: General use with zero- or few-shot prompts.
Try it out: Sample prompts
Size: 3 billion parameters
Token limits
- Context window length (input + output): 4096
Supported natural languages: Multilingual
Instruction tuning information: The model was fine-tuned on tasks that involve multiple-step reasoning from chain-of-thought data in addition to traditional natural language processing tasks. Details about the training data sets used are published.
Model architecture: Encoder-decoder
License: Apache 2.0 license
Learn more
flan-t5-xxl-11b
The flan-t5-xxl-11b model is provided by Google on Hugging Face. This model is based on the pretrained text-to-text transfer transformer (T5) model and uses instruction fine-tuning methods to achieve better zero- and few-shot performance. The model is also fine-tuned with chain-of-thought data to improve its ability to perform reasoning tasks.
Usage: General use with zero- or few-shot prompts.
Try it out
- Sample prompts
- Sample notebook: Use watsonx, and Google flan-t5-xxl to analyze car rental customer satisfaction from text
- Use watsonx, and Google flan-t5-xxl to analyze sentiments of legal documents
- Sample notebook: Use watsonx and LangChain to make a series of calls to a language model Review the terms of use before you use samples from GitHub.
Size: 11 billion parameters
Token limits
- Context window length (input + output): 4096
Supported natural languages: English, German, French
Instruction tuning information: The model was fine-tuned on tasks that involve multiple-step reasoning from chain-of-thought data in addition to traditional natural language processing tasks. Details about the training data sets used are published.
Model architecture: Encoder-decoder
License: Apache 2.0 license
Learn more
flan-ul2-20b
The flan-ul2-20b model is provided by Google on Hugging Face. This model was trained by using the Unifying Language Learning Paradigms (UL2). The model is optimized for language generation, language understanding, text classification, question answering, common sense reasoning, long text reasoning, structured-knowledge grounding, and information retrieval, in-context learning, zero-shot prompting, and one-shot prompting.
Usage: General use with zero- or few-shot prompts.
Try it out
-
Sample notebook: Use watsonx, Elasticsearch, and LangChain to answer questions (RAG)
-
Sample notebook: Use watsonx, and Elasticsearch Python SDK to answer questions (RAG) : Sample notebook: Use watsonx and LangChain to make a series of calls to a language model
Review the terms of use before using samples from GitHub.
Size: 20 billion parameters
Token limits
- Context window length (input + output): 4096
Supported natural languages: English
Instruction tuning information: The flan-ul2-20b model is pretrained on the colossal, cleaned version of Common Crawl's web crawl corpus. The model is fine-tuned with multiple pretraining objectives to optimize it for various natural language processing tasks. Details about the training data sets used are published.
Model architecture: Encoder-decoder
License: Apache 2.0 license
Learn more
gpt-neox-20b (Deprecated)
This model was deprecated in the 4.8.4 release. For more information, see Foundation model lifecycle.
The gpt-neox-20b model is provided by EleutherAI on Hugging Face. This model is an autoregressive language model that is trained on diverse English-language texts to support general-purpose use cases. GPT-NeoX-20B has not been fine-tuned for downstream tasks.
Usage: Works best with few-shot prompts. Accepts special characters, which can be used for generating structured output. The data set used for training contains profanity and offensive text. Be sure to curate any output from the model before using it in an application.
Try it out
-
Use watsonx, and eleutherai gpt-neox-20b to summarize legal Contracts documents
Review the terms of use before using samples from GitHub.
Size: 20 billion parameters
Token limits
- Context window length (input + output): 8192
Supported natural languages: English
Data used during training: The gpt-neox-20b model was trained on the Pile. For more information about the Pile, see The Pile: An 800GB Dataset of Diverse Text for Language Modeling. The Pile was not deduplicated before it was used for training.
Model architecture: Decoder
License: Apache 2.0 license
Learn more
granite-13b-chat-v2
The granite-13b-chat-v2 model is provided by IBM. This model is optimized for dialog use cases and works well with virtual agent and chat applications.
A modification to this model was introduced with the 4.8.4 release.
This model was introduced with the 4.8.1 release.
Usage: Generates dialog output like a chatbot. Uses a model-specific prompt format. Includes a keyword in its output that can be used as a stop sequence to produce succinct answers.
Try it out: Sample prompt
Size: 13 billion parameters
Token limits: Context window length (input + output): 8192
Supported natural languages: English
Instruction tuning information: The Granite family of models is trained on enterprise-relevant data sets from five domains: internet, academic, code, legal, and finance. Data used to train the models first undergoes IBM data governance reviews and is filtered of text that is flagged for hate, abuse, or profanity by the IBM-developed HAP filter. IBM shares information about the training methods and data sets used.
Model architecture: Decoder
License
- Terms of use
- For more information about contractual protections related to IBM indemnification, see the IBM Client Relationship Agreement and IBM watsonx.ai service description.
Learn more
granite-13b-chat-v1 (Deprecated)
This model was deprecated in the 4.8.4 release. For more information, see Foundation model lifecycle.
The granite-13b-chat-v1 model is provided by IBM. This model is optimized for dialog use cases and works well with virtual agent and chat applications.
Usage: Generates dialog output like a chatbot. Uses a model-specific prompt format. Includes a keyword in its output that can be used as a stop sequence to produce succinct answers.
Try it out: Sample prompt
Size: 13 billion parameters
Token limits: Context window length (input + output): 8192
Supported natural languages: English
Instruction tuning information: The Granite family of models is trained on enterprise-relevant data sets from five domains: internet, academic, code, legal, and finance. Data used to train the models first undergoes IBM data governance reviews and is filtered of text that is flagged for hate, abuse, or profanity by the IBM-developed HAP filter. IBM shares information about the training methods and data sets used.
Model architecture: Decoder
License: Terms of use
Learn more
granite-13b-instruct-v2
The granite-13b-instruct-v2 model is provided by IBM. This model was trained with high-quality finance data, and is a top-performing model on finance tasks. Financial tasks evaluated include: providing sentiment scores for stock and earnings call transcripts, classifying news headlines, extracting credit risk assessments, summarizing financial long-form text, and answering financial or insurance-related questions.
This model was introduced with the 4.8.1 release.
Usage: Supports extraction, summarization, and classification tasks. Generates useful output for finance-related tasks. Uses a model-specific prompt format. Accepts special characters, which can be used for generating structured output.
Try it out
- Sample 3b: Generate a numbered list on a particular theme
- Sample 4c: Answer a question based on a document
- Sample 4d: Answer general knowledge questions
Size: 13 billion parameters
Token limits: Context window length (input + output): 8192
Supported natural languages: English
Instruction tuning information: The Granite family of models is trained on enterprise-relevant data sets from five domains: internet, academic, code, legal, and finance. Data used to train the models first undergoes IBM data governance reviews and is filtered of text that is flagged for hate, abuse, or profanity by the IBM-developed HAP filter. IBM shares information about the training methods and data sets used.
Model architecture: Decoder
License
- Terms of use
- For more information about contractual protections related to IBM indemnification, see the IBM Client Relationship Agreement and IBM watsonx.ai service description.
Learn more
granite-13b-instruct-v1 (Deprecated)
This model was deprecated in the 4.8.4 release. For more information, see Foundation model lifecycle.
The granite-13b-instruct-v1 model is provided by IBM. This model was trained with high-quality finance data, and is a top-performing model on finance tasks. Financial tasks evaluated include: providing sentiment scores for stock and earnings call transcripts, classifying news headlines, extracting credit risk assessments, summarizing financial long-form text, and answering financial or insurance-related questions.
Usage: Supports extraction, summarization, and classification tasks. Generates useful output for finance-related tasks. Uses a model-specific prompt format. Accepts special characters, which can be used for generating structured output.
Try it out
- Sample 3b: Generate a numbered list on a particular theme
- Sample 4d: Answer general knowledge questions
Size: 13 billion parameters
Token limits: Context window length (input + output): 8192
Supported natural languages: English
Instruction tuning information: The Granite family of models is trained on enterprise-relevant data sets from five domains: internet, academic, code, legal, and finance. Data used to train the models first undergoes IBM data governance reviews and is filtered of text that is flagged for hate, abuse, or profanity by the IBM-developed HAP filter. IBM shares information about the training methods and data sets used.
Model architecture: Decoder
License: Terms of use
Learn more
granite-8b-japanese
The granite-8b-japanese model is provided by IBM. The granite-8b-japanese foundation model is based on the IBM Granite Instruct foundation model and is trained to understand and generate Japanese text.
This model was introduced with the 4.8.4 release.
Usage: Useful for general purpose tasks in the Japanese language, such as classification, extraction, question-answering, and for language translation between Japanese and English.
Try it out
- Sample 4e: Answer a question based on a document
- Sample 7d: Converse in a dialog
- Sample 8c: Translate text
Size: 8 billion parameters
Token limits: Context window length (input + output): 8192
Supported natural languages: English, Japanese
Instruction tuning information: The Granite family of models is trained on enterprise-relevant data sets from five domains: internet, academic, code, legal, and finance. The granite-8b-japanese model was pretrained on 1 trillion tokens of English and 0.5 trillion tokens of Japanese text.
Model architecture: Decoder
License
- Terms of use
- For more information about contractual protections related to IBM indemnification, see the IBM Client Relationship Agreement and IBM watsonx.ai service description.
Learn more
granite-20b-multilingual
A foundation model from the IBM Granite family. The granite-20b-multilingual foundation model is based on the IBM Granite Instruct foundation model and is trained to understand and generate text in English, German, Spanish, French, and Portuguese.
This model was introduced with the 4.8.4 release.
Usage: English, German, Spanish, French, and Portuguese closed-domain question answering, summarization, generation, extraction, and classification.
Size: 13 billion parameters
Token limits: Context window length (input + output): 8192
Supported natural languages: English, German, Spanish, French, and Portuguese
Instruction tuning information: The Granite family of models is trained on enterprise-relevant data sets from five domains: internet, academic, code, legal, and finance. Data used to train the models first undergoes IBM data governance reviews and is filtered of text that is flagged for hate, abuse, or profanity by the IBM-developed HAP filter. IBM shares information about the training methods and data sets used.
Model architecture: Decoder
License: Terms of use
- For more information about contractual protections related to IBM indemnification, see the IBM Client Relationship Agreement and IBM watsonx.ai service description.
Learn more
jais-13b-chat
The jais-13b-chat foundation model is a bilingual large language model for Arabic and English that is fine-tuned to support conversational tasks.
This model was introduced with the 4.8.5 release.
Usage: Supports Q&A, summarization, classification, generation, extraction, and translation in Arabic.
Try it out
Size: 13 billion parameters
Token limits: Context window length (input + output): 2048
Supported natural languages: Arabic (Modern Standard Arabic) and English
Instruction tuning information: Jais-13b-chat is based on the Jais-13b model, which is a foundation model that is trained on 116 billion Arabic tokens and 279 billion English tokens. Jais-13b-chat is fine-tuned with a curated set of 4 million Arabic and 6 million English prompt-and-response pairs.
Model architecture: Decoder
License: Apache 2.0
Learn more
Llama 2 Chat
The Llama 2 Chat model is provided by Meta on Hugging Face. The fine-tuned model is useful for chat generation. The model is pretrained with publicly available online data and fine-tuned using reinforcement learning from human feedback.
You can choose to use the 13 billion parameter or 70 billion parameter version of the model.
Starting with the 4.8.1 release, you can choose to use the 13 billion parameter or 70 billion parameter version of the model.
Usage: Generates dialog output like a chatbot. Uses a model-specific prompt format.
Try it out
- Sample prompt
- Sample notebook: Use watsonx and Meta llama-2-70b-chat to answer questions about an article Review the terms of use before using the sample from GitHub.
Available sizes
- 13 billion parameters
- 70 billion parameters
Token limits
- Context window length (input + output): 4096
Supported natural languages: English
Instruction tuning information: Llama 2 was pretrained on 2 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction data sets and more than one million new examples that were annotated by humans.
Model architecture: Decoder-only
License: License
Learn more
llama2-13b-dpo-v7
The llama2-13b-dpo-v7 foundation model is provided by Minds & Company. The llama2-13b-dpo-v7 foundation model is a version of llama2-13b foundation model from Meta that is instruction-tuned and fine-tuned by using the direct preference optimzation method to handle Korean.
This model was introduced with the 4.8.5 release.
Usage: Suitable for many tasks, including classification, extraction, summarization, code creation and conversion, question-answering, generation, and retreival-augmented generation in Korean.
Try it out
Size: 13.2 billion parameters
Token limits: Context window length (input + output): 4096
Supported natural languages: English, Korean
Instruction tuning information: Direct preference optimzation (DPO) is an alternative to reinforcement learning from human feedback. With reinforcement learning from human feedback, responses must be sampled from a language model and an intermediate step of training a reward model is required. The direct preference optimzation uses a binary method of reinforcement learning where the model chooses the best of two answers based on preference data.
Model architecture: Decoder-only
License: License
Learn more
mixtral-8x7b-instruct-v01-q
The mixtral-8x7b-instruct-v01-q model is provided by IBM. The mixtral-8x7b-instruct-v01-q foundation model is a quantized version of the Mixtral 8x7B Instruct foundation model from Mistral AI.
The underlying Mixtral 8x7B foundation model is a sparse mixture-of-experts network that groups the model parameters, and then for each token chooses a subset of groups (referred to as experts) to process the token. As a result, each token has access to 47 billion parameters, but only uses 13 billion active parameters for inferencing, which reduces costs and latency.
This model was introduced with the 4.8.4 release.
Usage: Suitable for many tasks, including classification, summarization, generation, code creation and conversion, and language translation. Due to the model's unusually large context window, use the max tokens parameter to specify a token limit when prompting the model.
Try it out: Sample prompts
Size: 8 x 7 billion parameters
Token limits
- Context window length (input + output): 32,768
- Note: The maximum new tokens, which means the tokens generated by the foundation model, is limited to 4096.
Supported natural languages: English, French, German, Italian, Spanish
Instruction tuning information: The Mixtral foundation model is pretrained on internet data. The Mixtral 8x7B Instruct foundation model is fine-tuned to follow instructions.
The IBM-tuned model uses the AutoGPTQ (Post-Training Quantization for Generative Pre-Trained Transformers) method to compress the model weight values from 16-bit floating point data types to 4-bit integer data types during data transfer. The weights decompress at computation time. Compressing the weights to transfer data reduces the GPU memory and GPU compute engine size requirements of the model.
Model architecture: Decoder-only
License: Apache 2.0 license
Learn more
mpt-7b-instruct2 (Deprecated)
This model was deprecated in the 4.8.4 release. For more information, see Foundation model lifecycle.
The mpt-7b-instruct2 model is provided by MosaicML and tuned by IBM. This model is a fine-tuned version of the base MosaicML Pretrained Transformer (MPT) model that was trained to handle long inputs. This version of the model was optimized by IBM for following short-form instructions.
Usage: General use with zero- or few-shot prompts.
Try it out: Sample prompts
Size: 7 billion parameters
Token limits
- Context window length (input + output): 2048
Supported natural languages: English
Instruction tuning information: The dataset that was used to train this model is a combination of the Dolly dataset from Databrick and a filtered subset of the Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback training data from Anthropic. During filtering, parts of dialog exchanges that contain instruction-following steps were extracted to be used as samples.
Model architecture: Encoder-decoder
License: Apache 2.0 license
Learn more
mt0-xxl-13b
The mt0-xxl-13b model is provided by BigScience on Hugging Face. The model is optimized to support language generation and translation tasks with English, languages other than English, and multilingual prompts.
Usage: General use with zero- or few-shot prompts. For translation tasks, include a period to indicate the end of the text you want translated or the model might continue the sentence rather than translate it.
Try it out
Size: 13 billion parameters
Supported natural languages: Multilingual
Token limits
- Context window length (input + output): 4096
Supported natural languages: The model is pretrained on multilingual data in 108 languages and fine-tuned with multilingual data in 46 languages to perform multilingual tasks.
Instruction tuning information: BigScience publishes details about its code and data sets.
Model architecture: Encoder-decoder
License: Apache 2.0 license
Learn more
starcoder-15.5b (Deprecated)
This model is deprecated. For more information, see Foundation model lifecycle.
The starcoder-15.5b model is provided by BigCode on Hugging Face. This model can generate code and convert code from one programming language to another. The model is meant to be used by developers to boost their productivity.
Usage: Code generation and code conversion
Try it out
-
Sample notebook: Use watsonx and BigCode starcoder-15.5b to generate code based on instruction
Review the terms of use before using the sample from GitHub.
Size: 15.5 billion parameters
Token limits: Context window length (input + output): 8192
Supported programming languages: Over 80 programming languages, with an emphasis on Python.
Data used during training: This model was trained on over 80 programming languages from GitHub. A filter was applied to exclude from the training data any licensed code or code that is marked with opt-out requests. Nevertheless, the model's output might include code from its training data that requires attribution. The model was not instruction-tuned. Submitting input with only an instruction and no examples might result in poor model output.
Model architecture: Decoder
License: License
Learn more
Any deprecated foundation models are highlighted with a warning icon . For more information about deprecation, including foundation model withdrawal
dates, see Foundation model lifecycle.
Learn more
Parent topic: Developing generative AI solutions