IBM granite-13b-instruct-v2 model card

Granite 13 Billion V2.0 Instruct Model (granite.13b.v2.instruct) Details

IBM Generative AI Large Language Foundation Models are Enterprise-level English-language models trained with a large volume of data that has been subjected to intensive pre-processing and careful analysis. The Granite 13 Billion V2.0 Instruct (granite.13b.v2.instruct) model is the instruction-tuned variant initialized from the pre-trained granite.13b.v2 model.

Model                   | Checkpoint Name                            | Pre-training Data Seen | MMLU (5-shot)
granite.13b.v2.instruct | granite.13b.2500b.chat (17 November 2023)  | 2.5T Tokens            | 45.3
  • Person or organization developing the model:

    • Granite (13B) Instruct was developed by IBM Research for its watsonx.ai initiative.
  • Model release date and version:

    • Granite (13B) Instruct version 2.0 was released on 30 November 2023.
  • Model type:

    • Decoder-only causal language model based on the GPT BigCode architecture (see Appendix A - Model Configuration).
  • Information about training algorithms, parameters, fairness constraints or other applied approaches, and features:

    • The model was trained with Megatron-LM using 4-way tensor parallelism, 4-way pipeline parallelism, and the Megatron distributed optimizer (see the parallelism sketch after this list).
    • Cluster: CCC
    • GPUs: 256x A100 80GB
    • Interconnect: 200 gigabit InfiniBand
    • Dataset streamed over GPFS
  • Paper or other resource for more information:

  • License:

    • Available only through IBM products and offerings.
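
The figures above imply the following decomposition of the training cluster. This is a back-of-the-envelope sketch based only on the numbers listed; the data-parallel degree is inferred rather than stated:

 # Rough decomposition of the 256-GPU cluster under the reported layout.
 # The data-parallel degree is inferred, not stated in this model card.
 total_gpus = 256          # 256x A100 80GB
 tensor_parallel = 4       # 4-way tensor parallelism
 pipeline_parallel = 4     # 4-way pipeline parallelism

 gpus_per_model_replica = tensor_parallel * pipeline_parallel
 data_parallel_replicas = total_gpus // gpus_per_model_replica

 print(gpus_per_model_replica)    # 16 GPUs hold one copy of the model
 print(data_parallel_replicas)    # 16 inferred data-parallel replicas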

Version Release notes:

The granite.13b.v2.instruct model was tuned in a very similar manner to granite.13b.v1.instruct, so prompts that previously performed well with v1 are expected to perform reasonably well with v2 (minor prompt optimizations may still be needed). Aside from benefiting from the updated base model (granite.13b.v2), the only other major change for the v2 instruct variant was the addition of specific alignment steps to improve the model's robustness to whitespace.

Intended Use [PENDING FINAL EVALUATION]

  • Primary intended uses:

    • English-language classification, extraction, and summarization.
  • Primary intended users:

    • The primary users are IBM Enterprise clients looking to bolster their portfolios with Enterprise-level generative AI models.
  • Out-of-scope use cases:

    • The granite.13b models are not designed, tested, or supported for code use cases of any kind.

Factors

  • Relevant factors: The models work with proper English text. All training datasets were cleansed of any type of tagging (e.g., HTML), and all media was removed as well; a minimal illustration of this kind of tag stripping follows this list.
  • Evaluation factors: Evaluation datasets must be proper English and are limited to text only.
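
The sketch below illustrates the kind of tag stripping described above using only Python's standard library; it is an illustration, not the actual IBM pre-processing pipeline, which is not public:

 # Illustrative only: strip HTML tags from raw text before further cleansing.
 from html.parser import HTMLParser

 class TagStripper(HTMLParser):
     """Collects only the text content of an HTML document."""
     def __init__(self):
         super().__init__()
         self.chunks = []

     def handle_data(self, data):
         self.chunks.append(data)

 def strip_tags(raw_html: str) -> str:
     parser = TagStripper()
     parser.feed(raw_html)
     return "".join(parser.chunks)

 print(strip_tags("<p>Granite is an <b>enterprise</b> model.</p>"))
 # -> "Granite is an enterprise model."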

Metrics

IBM has built a comprehensive test framework, FM eval, that is used throughout the model's life cycle. It can be used to evaluate both IBM's own models and models already trained by third parties, allowing models to be measured against a variety of benchmarks. The evaluation framework runs on an OpenShift cluster with GPU support and uses several AI model evaluation frameworks: EleutherAI's Language Model Evaluation Harness (lm-eval), Stanford's Holistic Evaluation of Language Models (HELM), the Beyond the Imitation Game Benchmark (BIG-bench), as well as IBM-internal datasets.
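
For reference, a 5-shot MMLU score like the one reported above could in principle be reproduced with EleutherAI's lm-eval harness. The sketch below assumes lm-eval version 0.4 or later and a locally available Hugging Face-format checkpoint; the checkpoint path is a placeholder, and the internal FM eval pipeline itself is not public:

 # Minimal sketch: 5-shot MMLU with EleutherAI's lm-eval harness (lm-eval >= 0.4 assumed).
 # "path/to/granite-13b-instruct-v2" is a placeholder for a locally available checkpoint.
 import lm_eval

 results = lm_eval.simple_evaluate(
     model="hf",
     model_args="pretrained=path/to/granite-13b-instruct-v2,dtype=bfloat16",
     tasks=["mmlu"],
     num_fewshot=5,
 )

 print(results["results"])  # per-task accuracy, including the MMLU aggregate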

Data, Limitations, and Recommendations

  • Data selection for training:
    • The Granite (13B) V2.0 model underwent extended training using tokens from the IBM Data Pile Version 0.4 on top of the original tokens from the IBM Data Pile Version 0.3. A breakdown of the sampling data used for pre-training the base model can be found in the granite.13b.v2 model card.

Data Composition and Sampling

  • On top of the 2.5 Trillion tokens from the base model (granite.13b.v2), this model underwent instruction tuning.
  • Tokenizer used:
    • GPT-NeoX 20B (see the tokenizer usage sketch after this list)
    • 2.0 Trillion Tokens
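
The GPT-NeoX 20B tokenizer is publicly available, so tokenization behaviour can be inspected independently of the model. The sketch below uses the public EleutherAI release on the Hugging Face Hub, which may differ in padding details from the copy used internally:

 # Inspect tokenization with the public GPT-NeoX 20B tokenizer.
 from transformers import AutoTokenizer

 tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
 ids = tokenizer("Granite is an enterprise language model.")["input_ids"]
 print(ids)                                    # token ids
 print(tokenizer.convert_ids_to_tokens(ids))   # corresponding subword pieces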

Dataset sampling for Granite (13B) V2.0 Instruct

The granite.13b.v2.instruct model variant was fine-tuned using supervised fine-tuning (SFT). It was initialized from the granite.13b.v2 base model and trained on mix700k, a custom mixture of internal and external datasets containing 700,000 instructions. The mixture was generated with datasets==2.13 with the seed set to 901, and it includes 100,000 prompts (instruct-v3) in Tulu's encoding_templates_w_input format.

The following datasets were used in the SFT mixture:

  • Dolly HHRLHF
  • gma_emnlp203-PromptEd (IBM created synthetic dataset)
  • ConvAI (IBM curated data mixture)
    • Creme Brule Wave 1
    • IBM data for Creme Brule (portion 1)
    • synth_asaf_askhr Dataset (IBM created synthetic dataset)
  • Creme Brule Wave 1
  • instructv3
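
The exact mix700k recipe is internal, but the sketch below shows how a mixture of this kind could be assembled with the Hugging Face datasets library pinned as noted above. The component choices, sampling sizes, and the instruct-v3 path are placeholders, not the actual recipe; only the 901 seed and the 100,000 instruct-v3 prompts are taken from this card:

 # Illustrative sketch only: assembling an SFT mixture with Hugging Face datasets.
 # The dataset choices, sizes, and paths below are placeholders; the actual
 # mix700k recipe and the IBM-internal datasets are not public.
 from datasets import load_dataset, concatenate_datasets

 SEED = 901  # seed reported for the mix700k generation

 dolly = load_dataset("mosaicml/dolly_hhrlhf", split="train")      # public Dolly HHRLHF release
 instruct_v3 = load_dataset("path/to/instruct-v3", split="train")  # placeholder path

 # Subsample each component to a target share, then shuffle into one mixture.
 dolly_part = dolly.shuffle(seed=SEED).select(range(50_000))            # placeholder share
 instruct_part = instruct_v3.shuffle(seed=SEED).select(range(100_000))  # 100k instruct-v3 prompts

 mixture = concatenate_datasets([dolly_part, instruct_part]).shuffle(seed=SEED)
 print(len(mixture))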

Quantitative Analyses

Ethical Considerations

The IBM AI Ethics Board is a central, cross-disciplinary body that defines the AI Ethics vision and strategy with the objective of supporting a culture of ethical, responsible, and trustworthy AI throughout the IBM organization. The Board's mission is to support a centralized governance, review, and decision-making process for IBM ethics policies, practices, communications, research, products, and services. The AI Ethics Board has sponsored the Ethics by Design (EbD) framework and methodology, which integrates tech ethics solutions into the technology development pipeline, including AI systems. EbD builds upon IBM's existing security and privacy practices, including Security and Privacy by Design. IBM products are expected to follow this methodology; EbD is embedded in IBM's governance structure through IBM's Tech Ethics by Design corporate directive and enabling processes. The EbD framework considers five pillars of trustworthiness: fairness, transparency, explainability, robustness, and privacy. EbD addresses ethical design at all phases of the AI life cycle, including data collection, model building and training, model validation, and model deployment.

Appendix A - Model Configuration

 {
   "activation_function": "gelu",
   "architectures": [
     "GPTBigCodeForCausalLM"
   ],
   "attention_softmax_in_fp32": true,
   "attn_pdrop": 0.1,
   "bos_token_id": 0,
   "embd_pdrop": 0.1,
   "eos_token_id": 0,
   "initializer_range": 0.02,
   "layer_norm_epsilon": 1e-05,
   "model_type": "gpt_bigcode",
   "multi_query": true,
   "n_embd": 5632,
   "n_head": 44,
   "n_inner": 22528,
   "n_layer": 40,
   "n_positions": 8192,
   "pad_token_id": 0,
   "resid_pdrop": 0.1,
   "scale_attention_softmax_in_fp32": true,
   "scale_attn_weights": true,
   "summary_activation": null,
   "summary_first_dropout": 0.1,
   "summary_proj_to_labels": true,
   "summary_type": "cls_index",
   "summary_use_proj": true,
   "torch_dtype": "float32",
   "transformers_version": "4.28.1",
   "use_cache": true,
   "vocab_size": 50304
 }
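
As a sanity check, the configuration above is consistent with a model of roughly 13 billion parameters. The following back-of-the-envelope calculation counts only the weight matrices (biases and layer-norm parameters are ignored) and assumes multi-query attention with a single shared key/value head, as indicated by "multi_query": true:

 # Approximate parameter count from the configuration in Appendix A
 # (weight matrices only; biases and layer norms are ignored).
 n_embd, n_head, n_inner, n_layer = 5632, 44, 22528, 40
 n_positions, vocab_size = 8192, 50304
 head_dim = n_embd // n_head  # 128

 # Multi-query attention: full set of query heads, one shared key/value head.
 attn_in = n_embd * (n_embd + 2 * head_dim)  # fused query/key/value projection
 attn_out = n_embd * n_embd                  # attention output projection
 mlp = 2 * n_embd * n_inner                  # MLP up- and down-projections

 per_layer = attn_in + attn_out + mlp
 embeddings = vocab_size * n_embd + n_positions * n_embd

 total = n_layer * per_layer + embeddings
 print(f"~{total / 1e9:.1f}B parameters")    # ~13.1B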

Model-specific instruction format

System prompt

No special system prompt is required for this model.

Model specific format

No special format is required for this model. The model benefited from flan-templated data in its alignment step and therefore should perform well with flan-style prompt templates.
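
For illustration, a flan-style zero-shot classification prompt might look like the following; the template wording is an example, not a mandated format:

 # Example of a flan-style zero-shot classification prompt.
 # The template wording is illustrative; no specific format is required.
 review = "The battery died after two days and support never replied."

 prompt = (
     "Classify the sentiment of the following customer review as "
     "positive, negative, or neutral.\n\n"
     f"Review: {review}\n"
     "Sentiment:"
 )
 print(prompt)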

Model-specific configurations

At this time, no specific parameter settings or stopping sequences are known to be needed for this model.

Languages

English-Only

Tasks

Intended uses of this model cover Classification and Extraction use cases. This model can also potentially support Summarization use cases, particularly when shorter, concise summaries are desired.