Granite Speech
Granite Speech is a compact and efficient speech-language model, built on top of IBM's Granite language model and designed specifically for English automatic speech recognition (ASR).
Overview
The Granite Speech collection offers a robust speech model built by LoRA fine-tuning granite-3.3-8b-instruct. Our granite-speech-3.3-8b model is purpose-built for enterprise automatic speech recognition tasks, including English speech-to-text and speech translation from English to selected European languages (French, Spanish, Italian, German, and Portuguese) as well as Japanese and Mandarin. It is trained on a diverse set of publicly available data and on synthetically generated datasets tailored to the speech translation task. Granite Speech will be fully open sourced and made available under the Apache 2.0 license on Hugging Face. For tasks that involve only text input, we suggest using our Granite large language models, which are optimized for text-only processing.
Model card
Examples
Granite Speech with transformers
This is a simple example of how to use the granite-speech-3.3-8b model with transformers. First, make sure to install the latest version of transformers from source, along with the other required packages:
pip install https://github.com/huggingface/transformers/archive/main.zip torchaudio peft soundfile
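Optionally, before moving on to the main example, you can confirm that the packages from the install command are importable in the current environment. The snippet below is a minimal sketch and not part of the original example: the helper name `missing_packages` is hypothetical, and the module names it checks are assumed to match the packages installed above (with `torch` assumed to come in as a dependency).

```python
import importlib.util


def missing_packages(names):
    """Return the module names from `names` that cannot be found for import."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Module names assumed to correspond to the pip install command above.
required = ["torch", "torchaudio", "transformers", "peft", "soundfile"]

missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

If any package is reported missing, rerun the install command before continuing. With the packages in place, the main example follows.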
Then run the code:
import torch
import torchaudio
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
from huggingface_hub import hf_hub_download

device = "cuda" if torch.cuda.is_available() else "cpu"

model_name = "ibm-granite/granite-speech-3.3-8b"
speech_granite_processor = AutoProcessor.from_pretrained(