Accelerating TensorFlow Inference on IBM z16

How to leverage the IBM-zDNN-Plugin for TensorFlow.

Artificial intelligence (AI) brings transformative capabilities that enterprise clients want to put to work. The ability to get new insights out of their data and applications represents a massive opportunity.

However, AI is also a complex and continuously developing space. With the exciting opportunities comes the need to invest resources in developing skills on the latest technologies and techniques in use across the industry. At its core, AI software is driven by a rich and diverse open-source ecosystem that supports multiple phases of the model lifecycle. This includes highly optimized training and inference capabilities that can accelerate time to value.

As we’ve worked with enterprise clients, it’s become clear that they recognize and embrace the use of open source in their AI projects and have developed advanced skills in popular frameworks like TensorFlow. To enable our clients to leverage these skills in IBM Z and IBM LinuxONE environments, IBM has focused on ensuring the most exciting and popular open-source AI is available on our systems with the same look and feel as other commonly used environments.

IBM is also focusing on ensuring models are seamlessly optimized for IBM Z and LinuxONE when deployed for production use. Through technologies like the Open Neural Network Exchange and the IBM Z Deep Learning Compiler, we provide simple portability and optimized inference that can leverage our newest capabilities, including the IBM z16 and LinuxONE on-chip AI accelerator (the IBM Integrated Accelerator for AI).

Recently, we announced the general availability of new capabilities that enable TensorFlow to directly leverage the on-chip AI inference accelerator featured in IBM z16 and LinuxONE Emperor 4.

What is the IBM-zDNN-Plugin for TensorFlow?

TensorFlow is one of the most popular AI frameworks in existence, with over 171K GitHub stars, 150K+ active contributors and over 87K GitHub forks. It is an open-source framework that supports the entire machine-learning lifecycle, from model development through deployment. TensorFlow also has a robust extended ecosystem that can help augment your AI projects.

A few weeks back, we introduced the ibm-zdnn-plugin for TensorFlow. We have optimized it not only to run on the IBM Z and LinuxONE platforms, but also to leverage IBM z16’s on-chip Integrated Accelerator for AI. As a result, customers can bring in TensorFlow models trained anywhere and seamlessly deploy them on the IBM Z platform, closer to where their business-critical applications run.

This enables real-time inferencing across a massive number of transactions with negligible latency. As one example among many, customers can screen all of their credit card transactions for fraud in real time and react quickly enough to prevent the fraud from happening in the first place.

On IBM zSystems and LinuxONE, TensorFlow has the same ‘look and feel’ as any other platform. Users can continue to build and train their TensorFlow models on the platform of their choice (x86, Cloud or IBM zSystems). TensorFlow models trained on other platforms are portable to IBM Z and LinuxONE with ease.

We have built on the TensorFlow community’s PluggableDevice architecture to develop an IBM Z-focused pluggable device that leverages the IBM Integrated Accelerator for AI on IBM z16.
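To make the PluggableDevice idea concrete, here is a minimal sketch of checking which devices TensorFlow reports once a plug-in is installed. `tf.config.list_physical_devices()` is a standard TensorFlow call; the device type string "NNPA" and the helper names are assumptions made for illustration, so check the plug-in documentation for the name it actually registers.

```python
from types import SimpleNamespace

# Assumption: the device type string the plug-in registers with TensorFlow.
ASSUMED_DEVICE_TYPE = "NNPA"


def devices_of_type(devices, device_type=ASSUMED_DEVICE_TYPE):
    """Return the devices whose device_type attribute matches device_type."""
    return [d for d in devices if getattr(d, "device_type", None) == device_type]


def visible_devices():
    """Ask TensorFlow for its physical devices; empty list if TensorFlow is absent."""
    try:
        import tensorflow as tf
    except ImportError:
        return []
    return tf.config.list_physical_devices()


# Hypothetical device list, shaped like what TensorFlow returns, so the
# filtering logic can be tried on a machine without the accelerator.
sample_devices = [
    SimpleNamespace(name="/physical_device:CPU:0", device_type="CPU"),
    SimpleNamespace(name="/physical_device:NNPA:0", device_type="NNPA"),
]
sample_matches = devices_of_type(sample_devices)

if __name__ == "__main__":
    found = devices_of_type(visible_devices())
    if found:
        print("Accelerator device(s) reported:", [d.name for d in found])
    else:
        print("No device of type", ASSUMED_DEVICE_TYPE, "reported by TensorFlow.")
```

On a system without the plug-in or the accelerator, the script simply reports that no such device was found; TensorFlow then runs on the CPU as usual.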

How to get started

You can begin leveraging the power of IBM-zDNN-Plugin for TensorFlow with very little effort. Getting started is a simple process:

1. Build and train the TensorFlow model using the platform of your choice.
2. Install TensorFlow 2.9 and the IBM Z Deep Neural Network Library (zDNN):
   - Container images with pre-built and pre-installed TensorFlow core 2.9 have been made available on the IBM Z and LinuxONE Container Registry.
   - Others can build and install TensorFlow from source by following the steps here.
3. Install IBM-zDNN-Plugin from the Python Package Index (PyPI).
4. On an IBM z16 or LinuxONE Emperor 4 system, TensorFlow will transparently target the Integrated Accelerator for AI for several compute-intensive operations during inferencing, with no changes necessary to TensorFlow models.
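The steps above can be sketched as a simple export-and-load round trip. This is an illustration only: the tiny `TinyScorer` module, its fixed weights, and the helper names are hypothetical stand-ins for a real trained model, and the sketch uses the standard `tf.saved_model.save` and `tf.saved_model.load` calls. The point is that the loading and scoring code contains nothing platform-specific; which operations, if any, are routed to the accelerator is decided by TensorFlow and the plug-in at run time.

```python
import tempfile


def export_tiny_model(export_dir):
    """Stand-in for 'build and train on the platform of your choice'."""
    import tensorflow as tf

    class TinyScorer(tf.Module):
        def __init__(self):
            super().__init__()
            # Fixed weights keep the sketch deterministic; a real model is trained.
            self.w = tf.Variable(tf.ones([3, 1]), name="w")

        @tf.function(input_signature=[tf.TensorSpec([None, 3], tf.float32)])
        def __call__(self, x):
            return tf.sigmoid(tf.matmul(x, self.w))

    tf.saved_model.save(TinyScorer(), export_dir)


def score(export_dir, rows):
    """Load the exported model and run inference; no device-specific code."""
    import tensorflow as tf

    model = tf.saved_model.load(export_dir)
    return model(tf.constant(rows, dtype=tf.float32)).numpy()


def demo():
    """Run the round trip, or return None when TensorFlow is not installed."""
    try:
        import tensorflow  # noqa: F401
    except ImportError:
        return None
    with tempfile.TemporaryDirectory() as export_dir:
        export_tiny_model(export_dir)
        return score(export_dir, [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]])


scores = demo()
```

In practice the export step would run wherever the model was trained, and only the `score` step would run on the IBM Z or LinuxONE system.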

Our recent technical blog has further details and points to a simple example that you can leverage to guide you on getting started.

Author

Elpida Tzortzatos

IBM Fellow and CTO of z/OS and AI on IBM Z and LinuxONE