IBM granite-code-z-xplain model card

Granite-3.1 8-Billion Parameter Code Z Explanation model (granite-code-z-xplain) details

IBM Generative AI Large Language Foundation Models are enterprise-level English-language models trained on a large volume of data that has been subjected to intensive pre-processing and careful analysis. The Granite-3.1 8-billion-parameter Code Z Explanation model (granite-code-z-xplain) was produced by extended pre-training (EPT) on top of the granite-3.1-8b-base model checkpoint. The resulting model was then merged with the granite-3.1-8b-instruct model using a parameter-merging technique.

The granite-3.1-8b-base model supports a 128K context length. To produce the granite-code-z-xplain model, this base model first underwent progressive extended pre-training in which the EPT training sequence length was increased from 8K to 128K in three steps, with roughly 6B tokens fed at each stage of the progression. The resulting model was then merged with the granite-3.1-8b-instruct model as described above.
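
The specific parameter-merging technique is not detailed here. Purely as illustration, the sketch below shows one common approach, linear (weighted-average) merging of corresponding weights; the local paths and the 50/50 mixing weight are assumptions, not published values.

import torch
from transformers import AutoModelForCausalLM

# Hypothetical local paths; neither checkpoint is distributed standalone.
ept = AutoModelForCausalLM.from_pretrained(
    "./granite-3.1-8b-ept", torch_dtype=torch.bfloat16
)
instruct = AutoModelForCausalLM.from_pretrained(
    "./granite-3.1-8b-instruct", torch_dtype=torch.bfloat16
)

alpha = 0.5  # assumed mixing weight; the real value is not published
instruct_state = instruct.state_dict()
merged_state = {}
for name, ept_param in ept.state_dict().items():
    # Both models share the Granite architecture, so tensors pair up one-to-one.
    merged_state[name] = alpha * ept_param + (1.0 - alpha) * instruct_state[name]

ept.load_state_dict(merged_state)
ept.save_pretrained("./granite-code-z-xplain-merged")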

  • Person or organization developing the model:

    • granite-code-z-xplain model was developed by IBM Research for IBM watsonx Code Assistant for Z.

  • Model release date and version:

    • granite-code-z-xplain model was released on March 10, 2025.

  • Model type:

    • Decoder-only causal language model based on the Granite architecture (GraniteForCausalLM); see Appendix A for the full configuration.

  • Paper or other resource for more information:

  • License:

    • Available only through IBM watsonx Code Assistant for Z.

Intended Use

  • Primary intended uses:

    • The primary intended use of granite-code-z-xplain is within IBM watsonx Code Assistant for Z to aid the explanation of code written in COBOL, JCL, and PL/I (a hypothetical invocation is sketched after this list).

  • Primary intended users:

    • The primary user is IBM watsonx Code Assistant for Z, which uses the granite-code-z-xplain model.

  • Out-of-scope use cases:

    • Any use case other than code explanation for IBM watsonx Code Assistant for Z.

    • Explanation of any code written in languages other than COBOL, JCL, and PL/I.
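
The model is available only through IBM watsonx Code Assistant for Z, so the invocation below is purely a hypothetical sketch: the local checkpoint path and the plain-text prompt format are assumptions used to illustrate the code-explanation task.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./granite-code-z-xplain"  # hypothetical local path
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="auto"
)

cobol = """\
       IDENTIFICATION DIVISION.
       PROGRAM-ID. HELLO.
       PROCEDURE DIVISION.
           DISPLAY 'HELLO, WORLD'.
           STOP RUN.
"""

prompt = f"Explain the following COBOL program:\n\n{cobol}\nExplanation:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
# Print only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))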

Factors

  • Relevant factors: The model works with properly formatted English text. All training datasets were cleansed of any type of tagging (e.g., HTML), and all embedded media were removed as well (a sketch of this kind of tag stripping follows this list).

  • Evaluation factors: Evaluation datasets must be properly formatted.
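
The cleansing pipeline itself is not published; as a minimal illustration of the tag removal described above, the sketch below strips HTML markup from raw text using only the Python standard library.

from html.parser import HTMLParser

class TagStripper(HTMLParser):
    """Accumulates only the text content of an HTML document, dropping all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def strip_tags(html_text: str) -> str:
    stripper = TagStripper()
    stripper.feed(html_text)
    return "".join(stripper.chunks)

print(strip_tags("<p>MOVE <b>WS-A</b> TO WS-B.</p>"))  # -> MOVE WS-A TO WS-B.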

Data, Limitations, and Recommendations

Data Composition and Sampling

The granite-code-z-xplain model was extended pre-trained on a mix of datasets comprising:

  • Knowledge data (books, manuals, sample code) as well as task-specific paired examples for the COBOL language.

  • Knowledge data (books, manuals, sample code) as well as task-specific paired examples for the JCL language.

  • Knowledge data (books, manuals, sample code) for the PL/I language.

  • Knowledge data (books, manuals) and task-specific paired examples for the REXX language.

  • Instruction tuning data.

The total token volume fed during each phase of progressive training was approximately 6B.
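
The intermediate sequence length of the three-step progression is not stated above; the sketch below encodes one plausible schedule as plain configuration data, with the 32K midpoint marked as an assumption and each stage holding the roughly 6B-token budget described above.

from dataclasses import dataclass

@dataclass
class EPTStage:
    seq_len: int       # training sequence length for this stage
    token_budget: int  # approximate tokens fed during this stage

SCHEDULE = [
    EPTStage(seq_len=8_192, token_budget=6_000_000_000),
    EPTStage(seq_len=32_768, token_budget=6_000_000_000),   # assumed midpoint
    EPTStage(seq_len=131_072, token_budget=6_000_000_000),
]

for i, stage in enumerate(SCHEDULE, start=1):
    sequences = stage.token_budget // stage.seq_len
    print(f"stage {i}: seq_len={stage.seq_len:,} -> ~{sequences:,} training sequences")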

Ethical Considerations

The IBM AI Ethics Board is a central, cross-disciplinary body that defines the AI Ethics vision and strategy with the objective of supporting a culture of ethical, responsible, and trustworthy AI throughout the IBM organization. The Board's mission is to support a centralized governance, review, and decision-making process for IBM ethics policies, practices, communications, research, products, and services.

The AI Ethics Board has sponsored the Ethics by Design (EbD) framework and methodology, which integrates tech ethics solutions into the technology development pipeline, including AI systems. EbD builds upon IBM's existing security and privacy practices, including Security and Privacy by Design, and is embedded in IBM's governance structure through IBM's Tech Ethics by Design corporate directive and enabling processes; IBM products are expected to follow this methodology. The EbD framework considers five pillars of trustworthiness: fairness, transparency, explainability, robustness, and privacy. EbD addresses ethical design at all phases of the AI life cycle, including data collection, model building and training, model validation, and model deployment.

Appendix A

Model Configuration

{
  "architectures": [
    "GraniteForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.1,
  "attention_multiplier": 0.0078125,
  "bos_token_id": 0,
  "embedding_multiplier": 12.0,
  "eos_token_id": 0,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 12800,
  "logits_scaling": 16.0,
  "max_position_embeddings": 131072,
  "mlp_bias": false,
  "model_type": "granite",
  "num_attention_heads": 32,
  "num_hidden_layers": 40,
  "num_key_value_heads": 8,
  "pad_token_id": 0,
  "residual_multiplier": 0.22,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000000.0,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.47.0",
  "use_cache": true,
  "vocab_size": 49155
}
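
As a sanity check that this configuration corresponds to an 8B-parameter model, the sketch below recomputes the approximate parameter count from the JSON fields, assuming standard Granite/Llama-style decoder layers: bias-free grouped-query attention, a SwiGLU MLP, RMSNorm weights, and tied input/output embeddings.

hidden = 4096                        # hidden_size
layers = 40                          # num_hidden_layers
heads, kv_heads = 32, 8              # num_attention_heads, num_key_value_heads
head_dim = hidden // heads           # 128
intermediate = 12800                 # intermediate_size
vocab = 49155                        # vocab_size

attn = 2 * hidden * hidden           # q_proj and o_proj (attention_bias is false)
attn += 2 * hidden * kv_heads * head_dim  # k_proj and v_proj (grouped-query)
mlp = 3 * hidden * intermediate      # gate_proj, up_proj, down_proj (SwiGLU)
norms = 2 * hidden                   # input and post-attention RMSNorm

per_layer = attn + mlp + norms
embeddings = vocab * hidden          # counted once: tie_word_embeddings is true
total = layers * per_layer + embeddings + hidden  # plus the final RMSNorm

print(f"{total:,} parameters (~{total / 1e9:.2f}B)")  # ~8.17B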