IBM Granite

Granite Speech

Table of contents

  1. Overview
  2. Model card
  3. Examples
    1. Granite Speech with transformers

Overview

The Granite Speech collection offers a robust speech model built by LoRA fine-tuning granite-3.3-8b-instruct. Our granite-speech-3.3-8b model is purpose-built for enterprise tasks centered on automatic speech recognition, including English speech-to-text and speech translation from English into European languages such as French, Spanish, Italian, German, and Portuguese, as well as Japanese and Mandarin. It was trained on a diverse set of publicly available data and on synthetically generated datasets tailored to the speech translation task. Granite Speech will be fully open sourced and made available under the Apache 2.0 license on Hugging Face. For tasks with text-only input, we suggest using our Granite large language models, which are optimized for text-only processing.

Model card

Examples

Granite Speech with transformers

This is a simple example of how to use the granite-speech-3.3-8b model with transformers. First, install the latest version of transformers from source, along with the other dependencies:

pip install https://github.com/huggingface/transformers/archive/main.zip torchaudio peft soundfile
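Before running the example, it can help to confirm that the packages from the pip command are actually visible to the interpreter you plan to use. The following is a minimal sketch using only the Python standard library; the helper name installed_versions and the REQUIRED list are illustrative and not part of the Granite documentation, and torch is included on the assumption that it is needed because the example imports it.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages named in the pip command above, plus torch, which the example imports.
# This list is an assumption for illustration; adjust it to your environment.
REQUIRED = ["transformers", "torchaudio", "peft", "soundfile", "torch"]


def installed_versions(packages):
    """Return a dict mapping each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(REQUIRED).items():
        print(f"{name}: {ver or 'MISSING'}")
```

Any package reported as MISSING should be installed before continuing.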

Then run the code:

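import torch
import torchaudio
from huggingface_hub import hf_hub_download
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

# Use the GPU when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
# Hugging Face Hub identifier of the Granite Speech checkpoint.
model_name = "ibm-granite/granite-speech-3.3-8b"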
speech_granite_processor = AutoProcessor.from_pretrained(