IBM granite-8b-japanese model card
Model Version (1.0.0): Released 2/29/2024
The Granite 8 Billion Japanese (granite-8b-japanese) model is an instruct variant initialized from the pre-trained Granite Base 8 Billion Japanese model. Pre-training covered 1.0T tokens of English, 0.5T tokens of Japanese, and 0.1T tokens of code. The model is designed to work with Japanese text. IBM Generative AI Large Language Foundation Models are enterprise-level multilingual models trained on large volumes of data that have been subjected to intensive pre-processing and careful analysis.
- Person or organization developing the model:
granite-8b-japanese was developed by IBM Research.
- Model release date and version:
granite-8b-japanese version 1.0 was released on 2/29/2024.
- Model type:
granite-8b-japanese is a decoder-only transformer model.
- The following features were used in the design of the model:
- Decoder-only model
- Group-Query Attention
- IBM Japanese/English Trained Tokenizer
- 4096 context length
- Rotary Position Embedding (RoPE)
- SwiGLU Activations
- Root Mean Square Layer Normalization
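Two of the design choices above, RMSNorm and SwiGLU, can be illustrated with a minimal NumPy sketch. This is an illustrative implementation with toy shapes and random weights, not IBM's actual model code:

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    """Root Mean Square Layer Normalization: divide by the RMS of the
    features (no mean subtraction), then apply a learned per-feature gain."""
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward block: SiLU(x @ W_gate) gates (x @ W_up),
    then the result is projected back down with W_down."""
    gate = x @ w_gate
    silu = gate * (1.0 / (1.0 + np.exp(-gate)))  # SiLU (swish) activation
    return (silu * (x @ w_up)) @ w_down

# Toy sizes for demonstration only
d_model, d_ff = 8, 16
rng = np.random.default_rng(0)
x = rng.normal(size=(2, d_model))
y = rms_norm(x, np.ones(d_model))
z = swiglu_ffn(y, rng.normal(size=(d_model, d_ff)),
               rng.normal(size=(d_model, d_ff)),
               rng.normal(size=(d_ff, d_model)))
print(z.shape)  # (2, 8)
```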
- Information about training algorithms, parameters, fairness constraints or other applied approaches, and features:
- Model was trained with Megatron-LM using 4-way tensor parallelism, 4-way pipeline parallelism, and the Megatron distributed optimizer.
- GPUs: 448x A100 80GB
- Interconnect: 1,600 gigabit InfiniBand
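The figures above imply a data-parallel dimension as well: 4 × 4 = 16 GPUs hold one model replica, so 448 GPUs accommodate 28 replicas. This is an inference from the stated numbers (assuming all GPUs participated in one job), not a documented configuration:

```python
tensor_parallel = 4
pipeline_parallel = 4
total_gpus = 448

# GPUs needed to hold one copy of the model (TP x PP)
gpus_per_replica = tensor_parallel * pipeline_parallel
# Remaining dimension is data parallelism
data_parallel = total_gpus // gpus_per_replica
print(gpus_per_replica, data_parallel)  # 16 28
```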
- License:
- Available only through IBM products and offerings. Contact IBM for licensing terms.
Intended Use
- Primary intended uses:
granite-8b-japanese is used for text generation, summarization, question answering, classification, and extraction in Japanese.
- Primary intended users:
- The primary users are IBM Enterprise clients looking to bolster their portfolios with Enterprise-level generative AI models.
- Out-of-scope use cases:
granite-8b-japanese is not designed, tested, or supported for code use cases of any kind.
Factors
- Relevant factors:
granite-8b-japanese works with Japanese text. All datasets have been cleansed of any type of tagging (e.g., HTML), and all media has been removed as well.
Metrics
granite-8b-japanese was evaluated using the following eight well-known datasets from Stability-AI/lm-evaluation-harness:
- JCommonsenseQA is a Japanese version of CommonsenseQA (Talmor+, 2019), a multiple-choice question answering dataset that requires commonsense reasoning ability. It is built using crowdsourcing with seeds extracted from the knowledge base ConceptNet.
- JNLI is a Japanese version of the NLI (Natural Language Inference) dataset. NLI is the task of recognizing the inference relation that a premise sentence has to a hypothesis sentence. The inference relations are entailment (含意), contradiction (矛盾), and neutral (中立).
- MARC-ja is a text classification dataset based on the Japanese portion of the Multilingual Amazon Reviews Corpus (MARC) (Keung+, 2020).
- JSQuAD is a Japanese version of SQuAD (Rajpurkar+, 2016), a reading comprehension dataset. Each instance consists of a question regarding a given context (a Wikipedia article) and its answer. JSQuAD is based on SQuAD 1.1 (there are no unanswerable questions) and uses the Japanese Wikipedia dump as of 2021-11-01.
- Japanese Questions on Knowledge of Entity (JAQKET) is a Japanese open-domain question answering dataset where the answers are Wikipedia article titles.
- XLSum-ja is a filtered Japanese subset of XLSum, with data filtered on 15-gram overlap using ROUGE-2, following the procedure used by PaLM 2.
- XWinograd is a set of Winograd Schema sentence pairs. For example:
- ボブはトムに尋ねた。トムはお金をいくらか貸してくれるかと。 ("Bob asked Tom whether he [Tom] would lend him some money.")
- ボブはトムに尋ねた。ボブはお金をいくらか貸してくれるかと。 ("Bob asked Tom whether he [Bob] would lend him some money.")
In this case the first sentence is correct, because it doesn't make sense for Bob to ask Tom how much money Bob himself will loan. The task is for the model to assign the higher log likelihood to the reasonable sentence. Because of the way the task is defined, it's always zero-shot with no prompt. While XWinograd is a multilingual task, this only uses the Japanese subset, which has 959 pairs.
- Multilingual Grade School Math (MGSM) is a set of 250 math word problems in Japanese; the task is to produce the correct integer solution to each problem.
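The XWinograd evaluation described above reduces to comparing sentence log-likelihoods and predicting the higher-scoring candidate. A minimal sketch of that selection logic, where `sentence_logprob` is a hypothetical stand-in for a real language model's scorer and the scores below are made up for illustration:

```python
def pick_reasonable(pair, sentence_logprob):
    """Return the index (0 or 1) of the sentence the model scores as more
    likely; the prediction is correct if that is the sensible sentence."""
    scores = [sentence_logprob(s) for s in pair]
    return max(range(2), key=scores.__getitem__)

# Hypothetical log-likelihoods a model might assign to the example pair
scores = {
    "ボブはトムに尋ねた。トムはお金をいくらか貸してくれるかと。": -42.0,  # sensible
    "ボブはトムに尋ねた。ボブはお金をいくらか貸してくれるかと。": -47.5,  # odd
}
pair = tuple(scores)
print(pick_reasonable(pair, scores.get))  # 0 -> the sensible sentence wins
```

Because only relative likelihoods matter, no prompt is needed, which is why the task is always zero-shot.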
Zero-shot results
| Task | Version | Metric | Performance |
|---|---|---|---|
| jcommonsenseqa-1.1-0.3 | 1.1 | acc | 0.7078 |
| jnli-1.3-0.3 | 1.3 | balanced_acc | 0.5032 |
| marc_ja-1.1-0.3 | 1.1 | balanced_acc | 0.6442 |
| jsquad-1.1-0.3 | 1.1 | f1 | 59.3862 |
| jaqket_v2-0.2-0.3 | 0.2 | f1 | 60.3066 |
| xlsum_ja-1.0-0.3 | 1 | rouge2 | 7.2561 |
| xwinograd_ja | 1 | acc | 0.683 |
| mgsm-1.0-0.3 | 1 | acc | 0.028 |
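Two of the tasks above (JNLI, MARC-ja) report balanced accuracy, the unweighted mean of per-class recall, which avoids rewarding majority-class guessing on skewed label distributions. A minimal sketch of the metric (illustrative, not the harness's exact implementation):

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall: each gold class contributes equally,
    regardless of how many examples it has."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += (t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

# Skewed labels: always predicting "entailment" gets 90% plain accuracy
# but only 50% balanced accuracy.
gold = ["entailment"] * 9 + ["contradiction"]
pred = ["entailment"] * 10
print(balanced_accuracy(gold, pred))  # 0.5
```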
N-shot results
| Task | Version | Metric | Performance |
|---|---|---|---|
| jcommonsenseqa-1.1-0.3 | 1.1 | acc | 0.807 |
| jnli-1.3-0.3 | 1.3 | balanced_acc | 0.5935 |
| marc_ja-1.1-0.3 | 1.1 | balanced_acc | 0.9461 |
| jsquad-1.1-0.3 | 1.1 | f1 | 80.9671 |
| jaqket_v2-0.2-0.3 | 0.2 | f1 | 74.9605 |
| xlsum_ja-1.0-0.3 | 1 | rouge2 | 9.4874 |
| xwinograd_ja | 1 | acc | 0.683 |
| mgsm-1.0-0.3 | 1 | acc | 0.116 |
Data, Limitations, and Recommendations
- Data selection for training:
- The granite-8b-japanese model underwent pre-training using 1.0T tokens of English, 0.5T tokens of Japanese, and 0.1T tokens of code.
- The