Date: 2024-08-26

In 2021, IBM® introduced the IBM Telum® processor, featuring its first advanced on-processor chip AI accelerator for inferencing. The Telum processor’s ability to deliver business outcomes has been a key driver behind the success of the IBM z16™ mainframe program. As client needs evolve, IBM continues to innovate and push the envelope on emerging technologies.

Today at the Hot Chips 2024 conference in Palo Alto, California, IBM announced the next generation of enterprise computing for the AI era with the IBM Telum® II processor and a preview of the IBM Spyre™ Accelerator. Both are expected to be available in 2025.

Developed using Samsung 5nm technology, the new IBM Telum II processor will feature eight high-performance cores running at 5.5GHz. Telum II will include a 40% increase in on-chip cache capacity, with the virtual L3 and virtual L4 growing to 360MB and 2.88GB respectively. The processor integrates a new data processing unit (DPU) specialized for I/O acceleration and the next generation of on-chip AI acceleration. These hardware enhancements are designed to provide significant performance improvements for clients over previous generations.
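As a quick sanity check on the cache figures above, the short Python sketch below works through the arithmetic. It assumes decimal units (1 GB = 1000 MB) and that the quoted 40% increase applies uniformly to the virtual L3; neither assumption is stated in the announcement, and the variable names are illustrative only.

```python
# Figures quoted in the text. Decimal units (1 GB = 1000 MB) are an assumption.
VIRTUAL_L3_MB = 360
VIRTUAL_L4_GB = 2.88
CACHE_INCREASE = 0.40  # the "40% increase" in on-chip cache capacity

# How many virtual-L3-sized pools fit in the virtual L4.
l4_to_l3_ratio = round((VIRTUAL_L4_GB * 1000) / VIRTUAL_L3_MB, 2)

# Prior-generation virtual L3 implied by the 40% figure, assuming (hypothetically)
# that the increase applies uniformly to this cache level.
implied_prev_l3_mb = VIRTUAL_L3_MB / (1 + CACHE_INCREASE)

print(f"virtual L4 / virtual L3 = {l4_to_l3_ratio}")
print(f"implied previous virtual L3 ~= {implied_prev_l3_mb:.0f} MB")
```

Under these assumptions, the virtual L4 is eight times the size of the virtual L3, and the implied previous-generation virtual L3 comes out at roughly 257MB.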

Infusing AI into enterprise transactions has become essential for many of our clients’ workloads. For instance, our AI-driven fraud detection solutions are designed to save clients millions of dollars annually. With the introduction of the AI accelerator on the Telum processor, we’ve seen active adoption across our client base. Building on this success, we’ve significantly enhanced the AI accelerator on the Telum II processor.

The compute power of each accelerator is expected to improve by 4x, reaching 24 trillion operations per second (TOPS). But TOPS alone don’t tell the whole story: what matters is the accelerator’s architectural design together with the optimization of the AI ecosystem that sits on top of it. When it comes to AI acceleration in production enterprise workloads, a fit-for-purpose architecture matters. Telum II is engineered to enable model runtimes to sit side by side with the most demanding enterprise workloads while delivering high-throughput, low-latency inferencing. Additionally, support for INT8 as a data type has been added to enhance compute capacity and efficiency for applications where INT8 is preferred, thereby enabling the use of newer models.
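To illustrate why an INT8 data type can improve compute capacity and efficiency, here is a minimal sketch of symmetric per-tensor INT8 quantization in plain Python. It is a generic textbook scheme, not a description of how the Telum II accelerator or its software stack handles INT8; the function names and sample weights are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale factor.

    Generic symmetric quantization; an illustrative assumption, not IBM's method.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float values from the INT8 representation."""
    return [q * scale for q in quantized]


# Hypothetical model weights, chosen only for illustration.
weights = [0.02, -1.27, 0.5, 0.0]
q_weights, scale = quantize_int8(weights)
restored = dequantize_int8(q_weights, scale)

# Each restored value differs from the original by at most half a quantization step.
max_error = max(abs(r - w) for r, w in zip(restored, weights))
print(q_weights, scale, max_error)
```

Each quantized value fits in one byte rather than the four bytes of a 32-bit float, which is the basic source of the capacity and efficiency gain; the trade-off is a bounded rounding error, which is why INT8 suits some applications better than others.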

New compute primitives have also been incorporated to better support large language models within the accelerator. They are designed to support an increasingly broad range of AI models for comprehensive analysis of both structured and textual data.