In part 1 of this blog series, I set the stage with how artificial intelligence (AI) has transformed in recent years and outlined an approach to build an AI strategy.
Now, let’s look at data transformation, training techniques and subsequent steps in the AI journey.
Data transformation and training techniques for your AI strategy
Here are the next steps:
- Once your data is transformed and made available, the machine learning/deep learning (ML/DL) specialist must evaluate AI models based on the feature set in the data. The decision depends on the accuracy needed, the time available for training and validation cycles, hardware performance (CPU versus CPU+GPU) and the scale of your hardware.
- In the initial stages of validating suitable AI models, we recommend trying variations of the same model or using different models and running the training and validation in parallel. The ones with the lowest error rates can be shortlisted for further evaluation and beta deployment.
- The quick way to run training and validation in parallel is to work with a smaller data set that's representative of the full data. Use multiple hardware nodes, each running one job, or multiple GPUs on a single node, with each training job pinned to its own GPU.
- Some deep learning frameworks, such as TensorFlow, support distributed execution across multiple devices (GPUs) and multiple nodes, which you can use to scale training.
- Once the right model is identified, the key is a phased rollout: a few internal rounds of testing, followed by a beta version introduced to a select subset of your audience.
- In the beta phase, you may need to have a domain expert with ML/DL expertise validate the results before the user sees them.
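The parallel shortlisting step above can be sketched in a few lines. The candidate "models" and data here are hypothetical stand-ins built with only the Python standard library; in a real setup each worker would launch a full training job on its own GPU or node (for example via TensorFlow's distributed execution) rather than a thread.

```python
# Sketch: evaluate variations of the same model in parallel on a smaller,
# representative subset of the data, then shortlist the lowest-error ones.
# The candidate models (simple slope functions) are hypothetical stand-ins
# for real training jobs.
import random
from concurrent.futures import ThreadPoolExecutor

def sample_representative_subset(data, fraction=0.1, seed=42):
    """Draw a smaller data set that's representative of the full data."""
    rng = random.Random(seed)
    k = max(1, int(len(data) * fraction))
    return rng.sample(data, k)

def train_and_validate(args):
    """Stand-in for one training job: score a candidate, return its error."""
    name, predict, subset = args
    # Mean absolute error of the candidate's predictions on the subset.
    error = sum(abs(predict(x) - y) for x, y in subset) / len(subset)
    return name, error

# Hypothetical full data set: y is roughly 2*x plus noise.
random.seed(0)
full_data = [(x, 2 * x + random.uniform(-1, 1)) for x in range(1000)]
subset = sample_representative_subset(full_data)

# Variations of the same model (different slopes), one job per candidate.
candidates = [(f"slope={s}", (lambda s: lambda x: s * x)(s), subset)
              for s in (1.5, 2.0, 2.5)]

# Run the jobs in parallel; real jobs would each get a GPU or node.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(train_and_validate, candidates))

# Shortlist the candidates with the lowest error for further evaluation.
shortlist = sorted(results, key=lambda r: r[1])[:2]
print(shortlist)
```

Because the subset is small, each evaluation is cheap, so many model variations can be screened quickly before committing full hardware to the shortlisted ones.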
So, we still haven’t addressed the data transformation part—how does that happen, and are there tools to make life simpler?
With the notion of polyglot computing gaining popularity, programming platforms like Apache Spark have become an integral part of data retrieval and transformation, better known as the extract, transform and load (ETL) processes. Data from various sources can be loaded into a common data store like Hadoop, and Spark can be used to run ETL jobs on Hadoop. The advantage of this method over having Spark read the data at its source is that it keeps Spark resource consumption isolated and doesn't impact the source data.
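To make the ETL shape concrete, here is a minimal sketch in plain Python, with `sqlite3` standing in for the common data store. The input records and the `events` table are hypothetical; on a real cluster the same three stages would be Spark jobs reading from and writing to Hadoop.

```python
# Minimal extract-transform-load sketch. The CSV input and "events" table
# are hypothetical stand-ins; sqlite3 plays the role of the common data
# store that Spark-on-Hadoop would fill in production.
import csv
import io
import sqlite3

# Extract: read raw records from a source (here, an in-memory CSV).
raw_csv = io.StringIO("user,amount\nalice,10.5\nbob,-3\ncarol,7.25\n")
rows = list(csv.DictReader(raw_csv))

# Transform: clean and reshape the data (drop invalid amounts, add a field).
transformed = [
    {"user": r["user"], "amount": float(r["amount"]), "currency": "USD"}
    for r in rows
    if float(r["amount"]) > 0  # filter out bad records
]

# Load: write the cleaned records into the common data store.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (user TEXT, amount REAL, currency TEXT)")
db.executemany(
    "INSERT INTO events VALUES (:user, :amount, :currency)", transformed
)
db.commit()

loaded = db.execute("SELECT user, amount FROM events ORDER BY user").fetchall()
print(loaded)
```

The key design point survives the translation: transformation happens against the copy in the shared store, not against the live source, so ETL workloads stay isolated from the systems that produce the data.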
If you have an existing analytics platform on Hadoop, this method becomes easier, since a significant amount of your data will already be in Hadoop. Otherwise, most of the steps above have to be executed iteratively as you build up the data store.
The recipe for success in an AI journey depends on the quality of data, the results (accuracy) expected and the hardware available. The IBM PowerAI platform offers numerous features to ease the AI journey for clients:
- A ready-to-go build of key deep learning packages, as highlighted in figure 1
- Tools like AI Vision (in Tech Preview)
- Developer-friendly tools (in the works) for training and testing models that provide insights into model performance, along with suggestions on hyperparameter tuning and resource usage, so that developers can tune their models and resource consumption as they go
The idea is to let you focus on addressing your business needs while IBM PowerAI, and the toolkits IBM brings to the journey, make life easier.
Figure 1: PowerAI DL Packages
Bringing AI to the enterprise with ready-to-go packages and developer tools makes PowerAI a strong AI platform for enterprises to use. The underlying hardware supports NVIDIA Tesla P100 GPUs, which can be used by any library or platform with GPU support (such as some of the Spark machine learning libraries).
Venturing into a new sphere of technology might be intimidating, but there are IT professionals with AI expertise who are ready to help you along the way, so be brave! To quote from Transformers, “Fifty years from now, when you’re looking back at your life, don’t you want to be able to say you had the guts to get in the car?”
If you’re looking for support on your AI journey, contact IBM Systems Lab Services today. Read part one of this series here.