Announcing PowerAI Enterprise: Bringing data science into production

6 minute read | June 19, 2018

Today I am tremendously excited to announce IBM Cognitive Systems’ newest applied artificial intelligence (AI) offering, PowerAI Enterprise. This new platform extends all the capability we have been packing into our distribution of deep learning and machine learning frameworks, PowerAI, by adding tools which span the entire model development workflow. With these capabilities, customers are positioned to develop better models more quickly and, as their requirements grow, to efficiently scale and share data science infrastructure.

Start with the data

All machine learning and deep learning models train on large amounts of data. Fortunately (and unfortunately), organizations are swimming in data sitting in structured and unstructured forms, and beyond the data they have under their control, organizations also have access to data for free or for a fee from a variety of sources.

Often, little of this data is in the right place or format for training a new AI model. To date, we have found this to be a problem solved largely by manual methods: miles and miles of Python scripting, often run inside Spark clusters for speed of execution, along with a lot of orphan code.

To help shorten transformation time, PowerAI Enterprise integrates a structured, template-based approach to building and transforming data sets. It starts with common output formats (LMDB, TFRecords, images for vector output), and allows users to define the input format and structure of the raw data, along with the key characteristics of what is needed in the transform step.
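PowerAI Enterprise's transform templates are not a public API, so the names below (`TransformTemplate`, `to_records`) are hypothetical; this is only a minimal sketch of the idea of a declarative template that separates the input schema from the per-record transform.

```python
# Hypothetical sketch of a declarative transform template: the input schema
# and transform rules are data, not ad hoc scripting.
import csv
import io

class TransformTemplate:
    """Declares the input schema and the transform applied to each raw row."""
    def __init__(self, input_fields, label_field, feature_fields):
        self.input_fields = input_fields      # column order of the raw CSV
        self.label_field = label_field        # which column holds the label
        self.feature_fields = feature_fields  # which columns become features

    def to_records(self, raw_csv_text):
        """Turn raw CSV text into (features, label) training records."""
        reader = csv.DictReader(io.StringIO(raw_csv_text),
                                fieldnames=self.input_fields)
        records = []
        for row in reader:
            features = [float(row[f]) for f in self.feature_fields]
            records.append((features, row[self.label_field]))
        return records

template = TransformTemplate(
    input_fields=["id", "height", "weight", "species"],
    label_field="species",
    feature_fields=["height", "weight"],
)
records = template.to_records("1,10.5,2.3,cat\n2,40.0,20.1,dog\n")
```

The same template could be pointed at a different output writer (LMDB, TFRecords) without touching the schema definition, which is the point of separating the two.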

The data import tools in PowerAI Enterprise are aware of the size and the complexity of the data and the resources available to transform the data. For this reason, the integrated resource manager is able to intelligently manage the execution of the job: helping to optimize for either low cost (run across the fewest number of nodes/cores) or optimize for the fastest execution of the transform job (run across more nodes/cores).
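The real scheduler's policy and interface are not public; the sketch below only illustrates the cost-versus-speed trade-off described above, with made-up names and a naive linear throughput assumption.

```python
# Illustrative only: a toy planner for the low-cost vs. fastest-execution
# trade-off. Real resource managers weigh many more factors.
def plan_transform_job(data_gb, gb_per_node_hour, max_nodes, optimize_for):
    """Pick a node count: fewest nodes (low cost) or most nodes (fast finish)."""
    if optimize_for == "cost":
        nodes = 1                      # run on as few nodes as possible
    elif optimize_for == "speed":
        nodes = max_nodes              # fan out across the whole cluster
    else:
        raise ValueError("optimize_for must be 'cost' or 'speed'")
    hours = data_gb / (gb_per_node_hour * nodes)
    return {"nodes": nodes, "estimated_hours": hours}

fast = plan_transform_job(data_gb=800, gb_per_node_hour=50, max_nodes=16,
                          optimize_for="speed")
cheap = plan_transform_job(data_gb=800, gb_per_node_hour=50, max_nodes=16,
                           optimize_for="cost")
```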

Integrated into the data preparation step is a quality check function which is designed to allow a data engineer and a data scientist to check the clarity of the signal in the data, running a simplified model and sample training from within the data import tool. Although not as sophisticated as a fully-developed model, this “gut check” allows a data scientist to discover early on in the process whether there are obvious issues or deficiencies in the training data set before investing significant time in the model development phase.
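The simplified model PowerAI Enterprise actually runs is not documented here, but a "gut check" in the spirit described above can be sketched with a deliberately crude classifier: train it on a sample and see whether it beats chance. Near-chance accuracy suggests a weak signal in the data.

```python
# Illustrative gut check: a nearest-centroid classifier trained on a small
# sample. All names here are assumptions, not PowerAI Enterprise APIs.
def centroid(rows):
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(len(rows[0]))]

def gut_check(samples):
    """samples: list of (features, label). Returns training accuracy of a
    nearest-centroid classifier; near-chance accuracy hints at a weak signal."""
    by_label = {}
    for feats, label in samples:
        by_label.setdefault(label, []).append(feats)
    centroids = {label: centroid(rows) for label, rows in by_label.items()}

    def predict(feats):
        return min(centroids,
                   key=lambda lb: sum((f - c) ** 2
                                      for f, c in zip(feats, centroids[lb])))

    correct = sum(predict(f) == lb for f, lb in samples)
    return correct / len(samples)

# Clearly separated classes -> high accuracy; a noisy set would hover near 1/k.
acc = gut_check([([0.0, 0.1], "a"), ([0.2, 0.0], "a"),
                 ([5.0, 5.1], "b"), ([5.2, 4.9], "b")])
```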

Speed up the model development process

PowerAI Enterprise includes powerful model setup tools designed to address the earliest “dead end” training runs. Integrated hyperparameter optimization automates the process of characterizing new models by sampling from the training data set and instantiating multiple small training jobs across cluster resources (this means sometimes tens, sometimes thousands of jobs depending on the complexity of the model). The tool is designed to select the most promising combinations of hyperparameters to return to the data science teams. The outcome: fewer non-productive early runs and more time to focus on refining models for greater organizational value.
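The search strategy and job interface sketched below are illustrative, not PowerAI Enterprise's actual API: it shows the shape of the workflow described above, in which many small training jobs sample the hyperparameter space and only the most promising combinations are returned.

```python
# Sketch of hyperparameter search via many small jobs. The "training job"
# here is a stand-in function, not a real training run.
import random

def small_training_job(lr, batch_size):
    """Stand-in for a short training run on a data sample; returns a
    validation loss. Here the loss is a made-up smooth function."""
    return (lr - 0.01) ** 2 + 0.0001 * abs(batch_size - 64)

def search(space, n_jobs, top_k, seed=0):
    rng = random.Random(seed)
    trials = []
    for _ in range(n_jobs):
        params = {"lr": rng.choice(space["lr"]),
                  "batch_size": rng.choice(space["batch_size"])}
        trials.append((small_training_job(**params), params))
    trials.sort(key=lambda t: t[0])          # lower loss = more promising
    return [params for _, params in trials[:top_k]]

space = {"lr": [0.1, 0.01, 0.001], "batch_size": [16, 64, 256]}
best = search(space, n_jobs=30, top_k=3)
```

In practice each `small_training_job` would be dispatched to the cluster by the resource manager rather than run inline, which is what lets the job count scale into the thousands.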

Once you have selected hyperparameters, you can begin bringing together all of the different elements of a deep learning model training run.

This next phase in the development process is extremely iterative. Even with assistance in selecting the correct hyperparameters, ensuring that your data is clean and has a clear signal within it, and that you are able to operate at the appropriate level of scale, chances are you will still be repeating training runs. By instrumenting the training process, PowerAI Enterprise allows a data scientist to see real-time feedback on the training cycle.

PowerAI Enterprise provides the ability to visualize the current progress and status of your training job, including iteration, loss, accuracy, and histograms of the weights, activations, and gradients of the neural network.

With this feedback, data scientists and model developers are alerted when the training process begins to go awry. These early warnings can allow data scientists and model developers to stop training runs that will eventually go nowhere and adjust parameters.
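The thresholds and names below are assumptions, not part of the product; this is a minimal sketch of the kind of early-warning check an instrumented training loop can run on its loss stream to flag a run that is going nowhere.

```python
# Illustrative divergence monitor: flag a run whose loss went NaN or has
# stayed well above its best value for several consecutive steps.
import math

def diverging(losses, window=3, factor=2.0):
    """True if the loss stream went NaN, or if the last `window` values are
    all more than `factor`x the best loss seen so far."""
    if any(math.isnan(x) for x in losses):
        return True
    best = min(losses)
    tail = losses[-window:]
    return len(tail) == window and all(x > factor * best for x in tail)

healthy = [2.3, 1.9, 1.5, 1.2, 1.0, 0.9]
blown_up = [2.3, 1.9, 0.8, 2.1, 3.5, 7.0]
```

A monitor like this is what lets a data scientist stop a doomed run after a few bad steps instead of after a few wasted hours.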

These workflow tools run on top of IBM’s scalable, distributed deep learning platforms. They take the best of open source frameworks and augment them for both large model support and better cluster performance, both of which open up the potential to take applied artificial intelligence into areas and use cases which were not previously feasible.

Bringing all these capabilities together accelerates development for data scientists, and the combination of automating workflow and extending the capabilities of open source frameworks unlocks the hidden value in organizational data.

As Gurinder Grewal, Senior Director, Architecture at PayPal said at IBM’s Think conference: “How do you take all these technologies and marry them together to build end to end platform that we can hand over to a data scientist and the business policy owners so they can extract most value out of the data?  I think that’s what excites us most about things your company is working on in terms of the PowerAI platform…  I think that’s one of the things we actually really appreciate the openness of the platform, extracting the most value out of the compute power we have and the power from the data.”

A foundation for data science as a service

At the core of the platform is an enterprise-class management software system for running compute- and data-intensive distributed applications on a scalable, shared infrastructure.

IBM PowerAI Enterprise supports multiple users and lines of business with multi-tenancy and end-to-end security, including role-based access controls. Organizational leaders are looking to deploy AI infrastructure at scale. The combination of integrated security (including role-based access and encryption of workload and data), support for service-level agreements, and extremely scalable resource orchestration designed for very large compute infrastructure means that it is now possible to share data science environments across the organization.

One customer that has successfully navigated the new world of AI is Wells Fargo. They use deep learning models to comply with a critical financial validation process. Their data scientists build, enhance, and validate hundreds of models each day. Speed and scalability are critical as they deal with greater amounts of data and more complicated models. As Richard Liu, Quantitative Analytics manager at Wells Fargo, said at IBM Think, “Academically, people talk about fancy algorithms. But in real life, how efficiently the models run in distributed environments is critical.” Wells Fargo uses the IBM PowerAI Enterprise software platform for the speed and the resource scheduling and management functionality it provides. “IBM is a very good partner and we are very pleased with their solution,” adds Liu.

Each part of the platform is designed to remove both time and pain from the process of developing a new applied artificial intelligence service.  By automating highly repetitive and manual steps, time is saved for improving and refining models, which can lead to a higher-quality result.

We’ve introduced a lot of new functionality in IBM PowerAI Enterprise 1.1, and I’ll be sharing more detail on these new capabilities in future posts.  I also welcome your input as we continue to add new capabilities moving forward.

Thank you, and I’d love to hear from you about how PowerAI Enterprise can help bring AI into production.