Build Machine Learning models with Minimal Effort with AutoML


Research and development in machine learning (ML) has been growing rapidly across domains such as computer vision, natural language processing, and speech transcription. ML is important in many enterprise applications across industries such as manufacturing, finance, healthcare, and marketing. As adoption grows, it becomes increasingly challenging to design or choose the right model or neural network architecture for each ML task and to identify a good set of hyper-parameters for training it. As a result, the process relies heavily on the expertise of data scientists. These challenges have motivated growing interest in techniques that automatically discover tailored network architectures for training without human intervention, an approach referred to as AutoML (see Figure 1).


Fig. 1: AutoML pipeline [1]

The AutoML system includes a meta-learning component, which leverages historical information to search for models faster and more effectively. The ML framework is responsible for automatically choosing data processor and feature preprocessor algorithms and for selecting the models. In Deep Learning (DL), this model selection is typically replaced by Neural Architecture Search (NAS). The ensemble module performs additional post-processing, such as combining multiple models.
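To make the ensemble step concrete, here is a minimal sketch of one common way to combine several models: majority voting over their predicted labels. This is an illustrative assumption, not necessarily the combination rule used by the pipeline in [1]; the function name `majority_vote` and the toy labels are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one list by majority vote.

    `predictions` holds one list of labels per model, all of equal length.
    Ties go to the label that appears first among the models' votes.
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")
    # zip(*predictions) yields, for each sample, the tuple of all model votes.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Hypothetical outputs of three models on four samples.
model_a = ["cat", "dog", "dog", "cat"]
model_b = ["cat", "cat", "dog", "dog"]
model_c = ["dog", "dog", "dog", "cat"]
combined = majority_vote([model_a, model_b, model_c])
```

Each model is wrong on a different sample here, so the combined prediction can be more reliable than any single model, which is the motivation for an ensemble module.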

As summarized in Figure 2, AutoML has recently been studied extensively using methods based on genetic algorithms, random search, Bayesian optimization, reinforcement learning, and continuous differentiable methods. The research blog [2] provides more details on Neural Architecture Search.


Fig. 2: AutoML methods summary
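Of the method families listed above, random search is the simplest to illustrate. The sketch below samples hyper-parameter configurations uniformly and keeps the best one. The search space, the parameter names `lr` and `dropout`, and the `toy_loss` objective are hypothetical stand-ins for a real training-and-validation run.

```python
import random


def random_search(objective, space, n_trials, seed=0):
    """Sample `n_trials` configurations uniformly; return the best (config, score)."""
    rng = random.Random(seed)
    best_config, best_score = None, float("inf")
    for _ in range(n_trials):
        # Draw every hyper-parameter independently from its (low, high) range.
        config = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        score = objective(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_loss(config):
    # Hypothetical stand-in for a validation loss, minimized at lr=0.1, dropout=0.3.
    return (config["lr"] - 0.1) ** 2 + (config["dropout"] - 0.3) ** 2


SPACE = {"lr": (0.0, 1.0), "dropout": (0.0, 0.9)}
best_config, best_score = random_search(toy_loss, SPACE, n_trials=200)
```

Random search uses nothing learned from earlier trials. The other families in the figure (Bayesian optimization, genetic algorithms, reinforcement learning) differ mainly in how they use past evaluations to choose the next configuration.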

Our initial AutoML efforts have focused on using Hyperband and Bayesian optimization for hyper-parameter search, and Hyperband, ENAS, and DARTS for Neural Architecture Search. This research has been done in the context of the IBM product Watson Machine Learning Accelerator (WML-A) and has been integrated with IBM Research services for NeuNetS and AutoAI.
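As a rough illustration of the idea behind Hyperband, the sketch below implements successive halving, the subroutine that Hyperband runs repeatedly with different starting budgets. It is a simplified sketch, not the WML-A implementation: the `toy_loss` function, the `lr` parameter, and the budget units are hypothetical.

```python
import random


def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Keep the best 1/eta of the configurations each round, giving the
    survivors eta times more budget, until a single configuration remains."""
    survivors = list(configs)
    budget = min_budget
    history = []  # (number of configurations evaluated, budget per configuration)
    while len(survivors) > 1:
        history.append((len(survivors), budget))
        ranked = sorted(survivors, key=lambda cfg: evaluate(cfg, budget))
        survivors = ranked[: max(1, len(ranked) // eta)]
        budget *= eta
    return survivors[0], history


def toy_loss(cfg, budget):
    # Hypothetical loss: improves with more budget, best at lr = 0.1.
    return (cfg["lr"] - 0.1) ** 2 + 1.0 / budget


rng = random.Random(0)
candidates = [{"lr": rng.uniform(0.0, 1.0)} for _ in range(27)]
winner, history = successive_halving(candidates, toy_loss)
```

With 27 candidates and `eta=3`, the rounds evaluate 27, 9, and 3 configurations at budgets 1, 3, and 9, so most of the compute goes to the promising candidates instead of being spread evenly.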

Most existing AutoML approaches are known to incur considerable overhead for model search. To improve efficiency, we present a transferable AutoML method that leverages previously trained models to speed up the search process for new tasks and datasets. Our approach involves a novel meta-feature extraction technique based on the performance of benchmark models, and a dynamic dataset clustering algorithm based on a Markov process and statistical hypothesis testing. As a result, multiple models can share a common structure with different learned parameters. We recently published some of this work at CVPR 2019 [3]. The workflow is shown in Figure 3.


Fig. 3: Pipeline of transferable AutoML
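The grouping idea can be sketched in a few lines. Each dataset is described by the accuracies that a fixed set of benchmark models reach on it, and a newly arrived dataset either joins the nearest existing group (and can reuse that group's searched model) or opens a new group (and needs a fresh search). Note that the fixed distance threshold below is a deliberately simple stand-in: the published method [3] uses a Markov process and statistical hypothesis tests instead. All names, accuracies, and the threshold value are hypothetical.

```python
import math

# Hypothetical benchmark models whose accuracies serve as meta-features.
BENCHMARKS = ["bench_small", "bench_medium", "bench_large"]


def meta_features(scores):
    """Order a dataset's benchmark accuracies into a fixed-length vector."""
    return [scores[name] for name in BENCHMARKS]


def centroid(vectors):
    return [sum(column) / len(column) for column in zip(*vectors)]


class DatasetGroups:
    def __init__(self, threshold):
        self.threshold = threshold  # assumed cut-off, not from the paper
        self.groups = []  # one dict per group: member names and their vectors

    def assign(self, name, vector):
        """Return (group_index, needs_search) for a newly arrived dataset."""
        nearest, nearest_dist = None, float("inf")
        for idx, group in enumerate(self.groups):
            dist = math.dist(vector, centroid(group["vectors"]))
            if dist < nearest_dist:
                nearest, nearest_dist = idx, dist
        if nearest is not None and nearest_dist <= self.threshold:
            self.groups[nearest]["names"].append(name)
            self.groups[nearest]["vectors"].append(vector)
            return nearest, False  # reuse the group's shared model
        self.groups.append({"names": [name], "vectors": [vector]})
        return len(self.groups) - 1, True  # no similar group: search afresh


# Hypothetical stream of datasets with made-up benchmark accuracies.
stream = [
    ("dataset_a", {"bench_small": 0.98, "bench_medium": 0.99, "bench_large": 0.99}),
    ("dataset_b", {"bench_small": 0.55, "bench_medium": 0.62, "bench_large": 0.70}),
    ("dataset_c", {"bench_small": 0.97, "bench_medium": 0.98, "bench_large": 0.99}),
    ("dataset_d", {"bench_small": 0.58, "bench_medium": 0.65, "bench_large": 0.71}),
]
groups = DatasetGroups(threshold=0.1)
log = [(name, *groups.assign(name, meta_features(scores))) for name, scores in stream]
searches_needed = sum(1 for _, _, needs_search in log if needs_search)
```

In this toy stream only two of the four datasets trigger a full search; the other two fall into existing groups. The saving comes at the cost of first running the benchmark models on every dataset, which is the overhead accounted for in the reported search times.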

Our method is more flexible than existing methods in three dimensions:

  • Search algorithms. Unlike multi-task solutions designed for Bayesian optimization, or transfer learning with AutoML based on reinforcement learning, our approach can be combined with most existing AutoML techniques out of the box.
  • Search mechanisms. Our transferable AutoML can be applied to different search schemes: search from scratch; search from predefined models (e.g., reusing the GoogLeNet architecture and the weights of its bottom layers to search an architecture for the higher layers); and transfer from basic cells (transferring the searched normal/reduction cells of source datasets to target datasets). This makes it more flexible in handling datasets under a limited time budget.
  • Online setting. Our method can be used in the online setting, in which datasets arrive sequentially and one needs to search models for newly arrived datasets efficiently.

We evaluated our technology under three search mechanisms, chosen according to the difficulty of the given datasets: search from scratch (Table 1), search from predefined models (Table 2), and transfer from basic cells (Table 3). The experimental results on image classification show a notable speedup (3x to 10x on average in overall search time across multiple datasets) with negligible loss in accuracy.

Table 1: Total search time (in days, including the overhead of running benchmark models), total classification relative error (TRE), and benchmark overhead on 7 datasets in the search-from-scratch setting.

Table 2: Total search time (in days, including the overhead of running benchmark models), total classification relative error (TRE), and benchmark overhead on 7 datasets in the search-from-predefined-models setting.

Table 3: Search time (in days, including overhead) and test accuracy in the transfer-from-basic-cells setting.

As next steps, we will explore end-to-end neural architecture search for object detection and 3D object detection; neural AutoML for tabular data, including vanilla tabular data and spatial-temporal tabular data; and new aspects of neural architecture search based on differentiable methods.
Some of our capabilities are available in WML-A, which can be found at the free trial link:


[1] M. Feurer, A. Klein, K. Eggensperger, et al. Efficient and robust automated machine learning. In NIPS, 2015.
[3] Chao Xue, Junchi Yan, Rong Yan, Yonggang Hu, et al. Transferable AutoML by Model Sharing Over Grouped Datasets. In CVPR, 2019.

Staff Member, IBM Research-China

Xi Xia

Research Staff Member

Zhihu Wang

IBM Research – China
