IBM granite-8b-japanese model card

Model Version (1.0.0): Released 2/29/2024

The Granite 8 Billion Japanese (granite-8b-japanese) model is an instruct variant initialized from the pre-trained Granite Base 8 Billion Japanese model. Pre-training covered 1.0T tokens of English, 0.5T tokens of Japanese, and 0.1T tokens of code. The model is designed to work with Japanese text. IBM Generative AI Large Language Foundation Models are enterprise-level multilingual models trained on large volumes of data that have been subjected to intensive pre-processing and careful analysis.

  • Person or organization developing the model:
    • granite-8b-japanese was developed by IBM Research.
  • Model release date and version:
    • granite-8b-japanese version 1.0 was released on 2/29/2024.
  • Model type:
  • Information about training algorithms, parameters, fairness constraints or other applied approaches, and features:
    • The model was trained with Megatron-LM using 4-way tensor parallelism, 4-way pipeline parallelism, and the Megatron distributed optimizer.
    • GPUs: 448x A100 80GB
    • Interconnect: 1600 gigabit InfiniBand
  • License:
    • Available only through IBM products and offerings. Contact IBM for licensing terms.

Intended Use

  • Primary intended uses:
    • granite-8b-japanese is intended for text generation, summarization, question answering, classification, and extraction in Japanese (a minimal usage sketch follows this list).
  • Primary intended users:
    • The primary users are IBM Enterprise clients looking to bolster their portfolios with Enterprise-level generative AI models.
  • Out-of-scope use cases:
    • granite-8b-japanese is not designed, tested, or supported for code use cases of any kind.
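
As an illustration of these use cases, the following is a minimal sketch of a Japanese generation call written against the Hugging Face transformers API. The checkpoint path and the prompt are placeholders, not an official distribution path or prompt format, since the model is available only through IBM products and offerings.

```python
# Minimal sketch of a Japanese generation call (here, a summarization-style
# instruction) using the Hugging Face transformers API. The checkpoint path is
# a placeholder; granite-8b-japanese is distributed only through IBM offerings.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/granite-8b-japanese"  # placeholder path
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# "Summarize the following text in one sentence:" followed by the input text.
prompt = "次の文章を一文で要約してください:\n<入力テキスト>"
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```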

Factors

  • Relevant factors: granite-8b-japanese works with Japanese text. All training datasets were cleansed of markup of any kind (e.g., HTML tags), and all media was removed as well.

Metrics

granite-8b-japanese was evaluated using the following eight well-known datasets from Stability-AI/lm-evaluation-harness:

  • JCommonsenseQA is a Japanese version of CommonsenseQA (Talmor+, 2019), which is a multiple-choice question answering dataset that requires commonsense reasoning ability. It is built using crowdsourcing with seeds extracted from the knowledge base ConceptNet.

  • JNLI is a Japanese version of the NLI (Natural Language Inference) dataset. NLI is the task of recognizing the inference relation that a premise sentence bears to a hypothesis sentence. The inference relations are entailment (含意), contradiction (矛盾), and neutral (中立).

  • MARC-ja is a text classification dataset based on the Japanese portion of the Multilingual Amazon Reviews Corpus (MARC) (Keung+, 2020).

  • JSQuAD is a Japanese version of SQuAD (Rajpurkar+, 2016), one of the datasets of reading comprehension. Each instance in the dataset consists of a question regarding a given context (Wikipedia article) and its answer. JSQuAD is based on SQuAD 1.1 (there are no unanswerable questions). We used the Japanese Wikipedia dump as of 20211101.

  • Japanese Questions on Knowledge of Entity (JAQKET) is a Japanese open-domain question answering dataset where the answers are Wikipedia article titles.

  • XLSum-ja is a filtered Japanese subset of XLSum: as was done for PaLM 2, examples are filtered with ROUGE-2 based on 15-gram overlap.

  • XWinograd is a set of Winograd Schema sentence pairs. For example:

    • ボブはトムに尋ねた。トムはお金をいくらか貸してくれるかと。 (Bob asked Tom whether Tom would lend him some money.)
    • ボブはトムに尋ねた。ボブはお金をいくらか貸してくれるかと。 (Bob asked Tom whether Bob would lend him some money.)

    In this case the first sentence is correct, because it doesn't make sense for Bob to ask Tom how much money Bob himself would lend. The task is for the model to assign the higher log likelihood to the reasonable sentence. Because of the way the task is defined, it's always zero-shot with no prompt. While XWinograd is a multilingual task, this evaluation uses only the Japanese subset, which has 959 pairs. A minimal sketch of this scoring scheme appears after this list.

  • Multilingual Grade School Math (MGSM) is a set of 250 math word problems in Japanese; the task is to produce the correct integer solution to each problem.
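
The XWinograd scoring scheme described above can be made concrete with a short sketch: score each sentence of a pair by its total log-likelihood under a causal language model and choose the higher-scoring one. The checkpoint path below is a placeholder, and this is an illustration of the task setup, not the harness's exact implementation.

```python
# Sketch of XWinograd-style scoring: compare the total log-likelihood a causal
# LM assigns to each sentence of a pair and prefer the higher-scoring one.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/causal-lm"  # placeholder; substitute a real checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)
model.eval()

def sentence_log_likelihood(text: str) -> float:
    """Total log-probability the model assigns to the tokens of `text`."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids, the returned loss is the mean negative
        # log-likelihood per predicted token; multiply back to get the total.
        out = model(input_ids=ids, labels=ids)
    num_predicted = ids.shape[1] - 1
    return -out.loss.item() * num_predicted

pair = [
    "ボブはトムに尋ねた。トムはお金をいくらか貸してくれるかと。",
    "ボブはトムに尋ねた。ボブはお金をいくらか貸してくれるかと。",
]
scores = [sentence_log_likelihood(s) for s in pair]
print("model prefers sentence", scores.index(max(scores)) + 1)
```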

Zero-shot results

Task                     Version   Metric         Performance
jcommonsenseqa-1.1-0.3   1.1       acc            0.7078
jnli-1.3-0.3             1.3       balanced_acc   0.5032
marc_ja-1.1-0.3          1.1       balanced_acc   0.6442
jsquad-1.1-0.3           1.1       f1             59.3862
jaqket_v2-0.2-0.3        0.2       f1             60.3066
xlsum_ja-1.0-0.3         1         rouge2         7.2561
xwinograd_ja             1         acc            0.683
mgsm-1.0-0.3             1         acc            0.028

N-shot results

Task                     Version   Metric         Performance
jcommonsenseqa-1.1-0.3   1.1       acc            0.807
jnli-1.3-0.3             1.3       balanced_acc   0.5935
marc_ja-1.1-0.3          1.1       balanced_acc   0.9461
jsquad-1.1-0.3           1.1       f1             80.9671
jaqket_v2-0.2-0.3        0.2       f1             74.9605
xlsum_ja-1.0-0.3         1         rouge2         9.4874
xwinograd_ja             1         acc            0.683
mgsm-1.0-0.3             1         acc            0.116
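
The scores in both tables come from the Stability-AI fork of lm-evaluation-harness. The sketch below shows how such a run might be reproduced programmatically, assuming the fork keeps the upstream EleutherAI Python API (lm_eval.evaluator.simple_evaluate) and its hf-causal adapter; the checkpoint path and few-shot setting are placeholders.

```python
# Sketch of re-running the evaluations above, assuming the Stability-AI fork of
# lm-evaluation-harness exposes the upstream EleutherAI Python API.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",
    model_args="pretrained=path/to/granite-8b-japanese",  # placeholder path
    tasks=[
        "jcommonsenseqa-1.1-0.3",
        "jnli-1.3-0.3",
        "marc_ja-1.1-0.3",
        "jsquad-1.1-0.3",
        "jaqket_v2-0.2-0.3",
        "xlsum_ja-1.0-0.3",
        "xwinograd_ja",
        "mgsm-1.0-0.3",
    ],
    # 0 reproduces the zero-shot table; the n-shot table used nonzero
    # few-shot counts that are not specified in this card.
    num_fewshot=0,
)
print(results["results"])
```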

Data, Limitations, and Recommendations

  • Data selection for training:
    • granite-8b-japanese was pre-trained on 1.0T tokens of English, 0.5T tokens of Japanese, and 0.1T tokens of code.