The Spyre Accelerator, introduced with the IBM z17 and IBM LinuxONE 5, is purpose-built to handle large language models and generative and agentic AI workloads securely within enterprise systems.
It brings enterprise-scale AI inferencing directly to IBM Z, enabling large language models and agents to run entirely on the mainframe. This integration allows organizations to scale generative and agentic AI while preserving privacy, governance and reliability.
With watsonx Assistant for Z now leveraging the Spyre Accelerator, enterprises gain even greater flexibility to deploy large language models natively on IBM Z. This release introduces support for IBM Granite foundation models, starting with the Granite 3.3-8B-instruct model, tested and optimized to run on IBM Z with Spyre cards for z17 deployments.
At the same time, we continue to support Llama-based deployments for customers running their LLMs on x86 infrastructure, driving choice and adaptability across environments. This allows organizations to choose models that best align with their performance, compliance, and infrastructure strategy, while leveraging Spyre’s accelerated, on-platform execution for generative and agentic AI at scale.
When combined, watsonx Assistant for Z and Spyre enable:
- Secured, on-platform AI inferencing with data residency and encryption support
- Scalable performance for high-volume, concurrent AI requests
- Trusted automation designed for regulated, mission-critical workloads
Together, these advancements help watsonx Assistant for Z deliver faster, more secure results with reduced operational complexity, all within the trusted IBM Z environment.