Artificial intelligence (AI) is transforming industries, and businesses require infrastructure that can handle AI workloads both efficiently and securely.
IBM LinuxONE, powered by the IBM Telum® processor, integrates AI acceleration directly into the chip, enabling real-time inferencing of multiple AI models with minimal latency. This advanced capability—combined with predictive AI and large language models—allows businesses to analyze data where it resides, delivering faster and deeper insights for mission-critical applications such as advanced fraud detection, risk analysis, and medical imaging.
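The "analyze data where it resides" pattern above means scoring a model inline with each transaction rather than shipping data to a remote inference service. The following is a minimal, illustrative sketch of that flow: the feature names, the toy logistic `score` function, and the flagging threshold are all hypothetical stand-ins for a real fraud model compiled for the on-chip accelerator.

```python
# Illustrative sketch of in-transaction fraud scoring: the model runs
# inline with the transaction instead of via a remote inference call.
# The feature names, weights, and threshold are hypothetical.
import math

def score(features):
    # Toy logistic model; a real deployment would invoke a compiled
    # model on the Integrated Accelerator for AI instead.
    weights = {"amount": 0.004, "foreign": 1.5, "night": 0.8}
    z = -3.0 + sum(weights[k] * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def process_transaction(txn, threshold=0.5):
    # Score synchronously in the transaction path, then decide.
    risk = score(txn["features"])
    return "flag" if risk >= threshold else "approve"

txn = {"id": "t-1001",
       "features": {"amount": 1200.0, "foreign": 1.0, "night": 0.0}}
print(process_transaction(txn))  # prints "flag"
```

The design point is latency: because the decision is made in-process, no network round trip sits between the transaction and the fraud verdict.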
The IBM Spyre™ Accelerator card is a 75 W PCIe Gen 5 AI accelerator with 128 GB of LPDDR5 memory, optimized for generative AI and multimodal LLMs.8 Featuring 32 (+2) cores, each with a 2 MB scratchpad, and greater than 55% core utilization, Spyre scales by card and by drawer, enabling businesses to handle complex AI inferencing efficiently across enterprise applications.
Adding IBM Spyre Accelerator cards to IBM LinuxONE 5 enables additional use cases, including generative AI.
IBM is working with the IBM LinuxONE Ecosystem to help ISVs provide solutions for today’s AI, sustainability and cybersecurity challenges.
Explore two innovative solutions tailored to financial and healthcare institutions: Clari5 Enterprise Fraud Management on IBM LinuxONE 4 Express for real-time fraud prevention, and Exponential AI’s Enso Decision Intelligence Platform on LinuxONE for advanced AI solutions at scale.
1 DISCLAIMER: Performance results are based on IBM® internal tests running on IBM Systems Hardware of machine type 9175. The OLTP application and PostgreSQL were deployed on the IBM Systems Hardware. The Credit Card Fraud Detection (CCFD) ensemble AI setup consists of two models (LSTM, TabFormer). The comparison measured running the OLTP application on IBM Systems Hardware with an IBM Z Deep Learning Compiler (zDLC) compiled jar and IBM Z Accelerated for NVIDIA® Triton™ Inference Server locally, processing the AI inference operations on cores and the Integrated Accelerator for AI, versus running the OLTP application locally and processing remote AI inference operations on an x86 server running NVIDIA Triton Inference Server with the OpenVINO™ runtime backend on CPU (with AMX). Each scenario was driven from Apache JMeter™ 5.6.3 with 64 parallel users. IBM Systems Hardware configuration: 1 LPAR running Ubuntu 24.04 with 7 dedicated cores (SMT), 256 GB memory, and IBM FlashSystem® 9500 storage. The network adapters were dedicated for NETH on Linux. x86 server configuration: 1 x86 server running Ubuntu 24.04 with 28 Emerald Rapids Intel® Xeon® Gold CPUs @ 2.20 GHz with Hyper-Threading turned on, 1 TB memory, local SSDs, UEFI with maximum performance profile enabled, and CPU P-State Control and C-States disabled. Results may vary.
2, 3 DISCLAIMER: Performance result is extrapolated from IBM® internal tests running on IBM Systems Hardware of machine type 9175. The benchmark was executed with 1 thread performing local inference operations using an LSTM-based synthetic Credit Card Fraud Detection model to exploit the Integrated Accelerator for AI. A batch size of 160 was used. IBM Systems Hardware configuration: 1 LPAR running Red Hat® Enterprise Linux® 9.4 with 6 cores (SMT), 128 GB memory. Results may vary.
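The benchmark above scores inferences in fixed-size batches of 160, so each accelerator invocation amortizes per-call overhead across many transactions. A minimal, model-agnostic sketch of that batching step (the helper name and the request placeholder are hypothetical, not part of the IBM test harness):

```python
# Hypothetical batching helper: group inference requests into fixed-size
# batches so each accelerator call handles many requests at once.
def batches(items, batch_size=160):
    """Yield fixed-size batches; the final batch may be smaller."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

requests = list(range(500))  # stand-in for 500 pending inference requests
sizes = [len(b) for b in batches(requests)]
print(sizes)  # prints [160, 160, 160, 20]
```

Larger batches generally raise throughput at the cost of per-request latency, which is why a throughput benchmark fixes the batch size explicitly.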