Overview
In this guide, we’ll use Ollama, an open-source tool that makes it easy to download and run AI models locally.

1. Install Ollama

The easiest way is to install the desktop app. You can also install it via Homebrew:
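On macOS, the Homebrew install looks like this (Linux users can instead use the install script from ollama.com):

```shell
# Install the Ollama CLI and server via Homebrew.
brew install ollama
```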
2. Start Ollama

Once installed, start the Ollama service:
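From a terminal, the service is started with:

```shell
# Start the Ollama server; by default it listens on localhost:11434.
# (The desktop app starts this service automatically.)
ollama serve
```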
3. Download Granite Models

Ollama supports a range of IBM Granite models. Larger models give better results but require more resources. To download Granite 4:
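A minimal sketch, assuming the model is published under the `granite4` name in the Ollama library (other size tags may also be available):

```shell
# Download the Granite 4 model weights from the Ollama library.
ollama pull granite4
```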
4. Run Granite

To start chatting with Granite, run the model by name (here, granite4):
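For example:

```shell
# Open an interactive chat session with the model.
# Type your prompt at the >>> prompt; use /bye to exit.
ollama run granite4
```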
5. Notes on Context Length
By default, Ollama runs models with a short context length to save memory.
For longer conversations, you can adjust it by setting:
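One way to do this (assuming a recent Ollama version that reads the `OLLAMA_CONTEXT_LENGTH` environment variable) is to set the context length before starting the server; the value of 32768 below is an illustrative choice:

```shell
# Raise the context window to 32K tokens for all models served
# by this Ollama instance, then start the server.
OLLAMA_CONTEXT_LENGTH=32768 ollama serve
```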
The largest supported context for Granite 4.0 models is 128K.

6. Using the API
You can also interact with Granite programmatically using Ollama’s OpenAI-compatible API:
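A minimal sketch using curl, assuming the server is running locally on the default port and the `granite4` model has been pulled:

```shell
# Send a chat request to Ollama's OpenAI-compatible endpoint.
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "granite4",
    "messages": [
      {"role": "user", "content": "What is IBM Granite?"}
    ]
  }'
```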
7. Adding Documents

You can add documents to your context using the special "document <title>" role:
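A sketch of what such a request might look like, assuming the document role is passed as an extra chat message; the model name, document title, and document text below are illustrative:

```shell
# Pass a document to Granite via the "document <title>" role,
# then ask a question grounded in that document.
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "granite4",
    "messages": [
      {"role": "document report.txt", "content": "Q3 revenue grew 12% year over year."},
      {"role": "user", "content": "Summarize the document."}
    ]
  }'
```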