IBM granite-8b-japanese model card
Model Version (1.0.0): Released 2/29/2024
The Granite 8 Billion Japanese (granite-8b-japanese) model is an instruct variant initialized from the pre-trained Granite Base 8 Billion Japanese model. Pre-training covered 1.0T tokens of English, 0.5T tokens of Japanese, and 0.1T tokens of code. This model is designed to work with Japanese text. IBM Generative AI Large Language Foundation Models are enterprise-level multilingual models trained on large volumes of data that have been subjected to intensive pre-processing and careful analysis.
- Person or organization developing the model: granite-8b-japanese was developed by IBM Research.
- Model release date and version: granite-8b-japanese version 1.0 was released on 2/29/2024.
- Model type: granite-8b-japanese is a decoder-only transformer model. The following features were used in the design of the model (a reference sketch of the SwiGLU and RMSNorm blocks follows this list):
- Decoder-only model
- Grouped-Query Attention (GQA)
- IBM Japanese/English trained tokenizer
- 4096-token context length
- Rotary Position Embedding (RoPE)
- SwiGLU activations
- Root Mean Square Layer Normalization (RMSNorm)
- Information about training algorithms, parameters, fairness constraints or other applied approaches, and features:
- Model was trained with Megatron-LM using 4-way tensor parallelism, 4-way pipeline parallelism, and the Megatron distributed optimizer (an illustrative launch sketch also follows this list).
- GPUs: 448x A100 80GB
- Interconnect: 1600 gigabit InfiniBand
- License:
- Available only through IBM products and offerings. Contact IBM for licensing terms.
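The SwiGLU activation and Root Mean Square Layer Normalization listed under Model type are standard transformer components. The following is a minimal PyTorch sketch of each, included for reference only; the tensor dimensions are illustrative placeholders and this is not IBM's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root Mean Square Layer Normalization: rescales by the RMS of the
    features, with no mean subtraction and no bias."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

class SwiGLU(nn.Module):
    """SwiGLU feed-forward block: down(SiLU(gate(x)) * up(x))."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.up_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.down_proj = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))

# Illustrative sizes only; the card does not disclose the hidden dimensions.
x = torch.randn(2, 16, 1024)
y = SwiGLU(dim=1024, hidden_dim=2816)(RMSNorm(1024)(x))
```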
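As an illustration of how the parallel configuration above maps onto Megatron-LM launch arguments, the sketch below assembles a hypothetical launch command. The tensor/pipeline sizes, distributed optimizer, and sequence length come from this card; the per-node GPU count and every omitted flag (data paths, checkpoints, remaining hyperparameters) are placeholders, and this is not IBM's actual training script.

```python
# Hypothetical Megatron-LM launch reflecting the parallelism described above.
launch_cmd = [
    "torchrun", "--nproc_per_node=8",        # placeholder GPUs per node
    "pretrain_gpt.py",
    "--tensor-model-parallel-size", "4",     # 4-way tensor parallelism
    "--pipeline-model-parallel-size", "4",   # 4-way pipeline parallelism
    "--use-distributed-optimizer",           # Megatron distributed optimizer
    "--seq-length", "4096",                  # matches the 4096 context length
    "--max-position-embeddings", "4096",
]
print(" ".join(launch_cmd))
```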
Intended Use
- Primary intended uses: granite-8b-japanese is used for text generation, summarization, question answering, classification, and extraction in Japanese (a usage sketch follows this list).
- Primary intended users:
- The primary users are IBM enterprise clients looking to bolster their portfolios with enterprise-level generative AI models.
- Out-of-scope use cases: granite-8b-japanese is not designed, tested, or supported for code use cases of any kind.
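As a simple illustration of the primary intended uses, the sketch below loads the model with Hugging Face transformers and requests a Japanese summary. The model identifier is a hypothetical placeholder (the weights are available only through IBM products and offerings), and the prompt shown is not an officially documented template.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical path; obtain the weights through your IBM product or offering.
model_path = "path/to/granite-8b-japanese"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")

# Japanese summarization prompt: "Summarize the following text in one sentence."
prompt = "次の文章を一文で要約してください。\n\n" + "..."  # replace "..." with the source text
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Decode only the newly generated tokens.
new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```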
Factors
- Relevant factors: granite-8b-japanese works with Japanese text. All datasets have been cleansed of any type of tagging (e.g., HTML), and all media has been removed as well.
Metrics
granite-8b-japanese was evaluated using the following eight well-known datasets from Stability-AI/lm-evaluation-harness:
- JCommonsenseQA is a Japanese version of CommonsenseQA (Talmor+, 2019), which is a multiple-choice question answering dataset that requires commonsense reasoning ability. It is built using crowdsourcing with seeds extracted from the knowledge base ConceptNet.
- JNLI is a Japanese version of the NLI (Natural Language Inference) dataset. NLI is the task of recognizing the inference relation that a premise sentence has to a hypothesis sentence. The inference relations are entailment (含意), contradiction (矛盾), and neutral (中立).
- MARC-ja is a text classification dataset based on the Japanese portion of the Multilingual Amazon Reviews Corpus (MARC) (Keung+, 2020).
- JSQuAD is a Japanese version of SQuAD (Rajpurkar+, 2016), one of the datasets for reading comprehension. Each instance in the dataset consists of a question regarding a given context (a Wikipedia article) and its answer. JSQuAD is based on SQuAD 1.1 (there are no unanswerable questions) and uses the Japanese Wikipedia dump as of 2021-11-01.
- Japanese Questions on Knowledge of Entities (JAQKET) is a Japanese open-domain question answering dataset where the answers are Wikipedia article titles.
- XLSum-ja is a filtered Japanese subset of XLSum, filtered on ROUGE-2 and 15-gram overlap in the same way as PaLM 2.
- XWinograd is a set of Winograd Schema sentence pairs. For example:
- ボブはトムに尋ねた。トムはお金をいくらか貸してくれるかと。 (Bob asked Tom whether Tom would lend him some money.)
- ボブはトムに尋ねた。ボブはお金をいくらか貸してくれるかと。 (Bob asked Tom whether Bob would lend him some money.)
In this case the first sentence is correct, because it doesn't make sense for Bob to ask Tom how much money Bob himself will lend. The task is for the model to assign the higher log likelihood to the reasonable sentence (a scoring sketch follows this list). Because of the way the task is defined, it is always zero-shot with no prompt. While XWinograd is a multilingual task, only the Japanese subset is used here, which has 959 pairs.
- Multilingual Grade School Math (MGSM) is a set of 250 math word problems in Japanese; the task is to produce the correct integer solution to each problem.
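To make the XWinograd scoring rule concrete, here is a simplified sketch (not the harness's exact implementation) that compares the total log-likelihood the model assigns to each sentence of a pair; the model path is again a hypothetical placeholder.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/granite-8b-japanese"  # hypothetical placeholder
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")

def total_log_likelihood(sentence: str) -> float:
    """Sum of token log-probabilities the model assigns to the sentence."""
    enc = tokenizer(sentence, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    # out.loss is the mean cross-entropy over predicted tokens; undo the
    # averaging to recover the (negative) total log-likelihood.
    num_predicted = enc["input_ids"].shape[1] - 1
    return -out.loss.item() * num_predicted

# The Winograd pair from the example above; the first sentence is the sensible one.
pair = [
    "ボブはトムに尋ねた。トムはお金をいくらか貸してくれるかと。",
    "ボブはトムに尋ねた。ボブはお金をいくらか貸してくれるかと。",
]
scores = [total_log_likelihood(s) for s in pair]
print(scores, "correct" if scores[0] > scores[1] else "incorrect")
```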
Zero-shot results
| Task | Version | Metric | Performance |
|---|---|---|---|
| jcommonsenseqa-1.1-0.3 | 1.1 | acc | 0.7078 |
| jnli-1.3-0.3 | 1.3 | balanced_acc | 0.5032 |
| marc_ja-1.1-0.3 | 1.1 | balanced_acc | 0.6442 |
| jsquad-1.1-0.3 | 1.1 | f1 | 59.3862 |
| jaqket_v2-0.2-0.3 | 0.2 | f1 | 60.3066 |
| xlsum_ja-1.0-0.3 | 1 | rouge2 | 7.2561 |
| xwinograd_ja | 1 | acc | 0.683 |
| mgsm-1.0-0.3 | 1 | acc | 0.028 |
N-shot results
| Task | Version | Metric | Performance |
|---|---|---|---|
| jcommonsenseqa-1.1-0.3 | 1.1 | acc | 0.807 |
| jnli-1.3-0.3 | 1.3 | balanced_acc | 0.5935 |
| marc_ja-1.1-0.3 | 1.1 | balanced_acc | 0.9461 |
| jsquad-1.1-0.3 | 1.1 | f1 | 80.9671 |
| jaqket_v2-0.2-0.3 | 0.2 | f1 | 74.9605 |
| xlsum_ja-1.0-0.3 | 1 | rouge2 | 9.4874 |
| xwinograd_ja | 1 | acc | 0.683 |
| mgsm-1.0-0.3 | 1 | acc | 0.116 |
Data, Limitations, and Recommendations
- Data selection for training:
- The granite-8b-japanese model underwent pre-training using 1.0T tokens of English, 0.5T tokens of Japanese, and 0.1T tokens of code.
- The