
Build Machine Learning Models with Minimal Effort Using AutoML


Research and development in machine learning (ML) across domains such as computer vision, natural language processing, and speech transcription has been growing rapidly. ML is now central to many enterprise applications in industries such as manufacturing, finance, healthcare, and marketing. With this growing adoption, it is increasingly challenging to design or choose a model or neural network architecture for each ML task and to identify a good set of hyper-parameters for training it. As a result, the process relies heavily on the expertise of data scientists. These challenges have motivated growing interest in techniques that automatically discover tailored network architectures and training configurations without human intervention, referred to as AutoML and illustrated in Figure 1.

Figure 1: AutoML pipeline [1]

The AutoML system includes a meta-learning component, which leverages historical information to search for models faster and more effectively. The ML framework is responsible for automatically choosing data-preprocessing and feature-preprocessing algorithms and for selecting models. In deep learning (DL), this model selection step is typically replaced by Neural Architecture Search (NAS). The ensemble module performs additional post-processing, such as combining multiple models together.
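To make these components concrete, here is a minimal, self-contained sketch of such a search loop using scikit-learn. It is an illustrative toy, not the WML-A implementation: the candidate preprocessors, models, and hyper-parameter ranges are assumptions chosen for brevity.

```python
# Toy AutoML loop: jointly pick a data preprocessor, a feature preprocessor, a
# model, and its hyper-parameters by random search, then ensemble the best
# pipelines.  Illustrative sketch only, not the WML-A / auto-sklearn system.
import random
from sklearn.datasets import load_digits
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)
rng = random.Random(0)

def sample_pipeline():
    scaler = rng.choice([StandardScaler(), MinMaxScaler()])                 # data preprocessor
    features = rng.choice([PCA(n_components=rng.choice([16, 32])),
                           "passthrough"])                                  # feature preprocessor
    model = rng.choice([                                                    # model selection
        LogisticRegression(C=10 ** rng.uniform(-2, 2), max_iter=2000),
        RandomForestClassifier(n_estimators=rng.choice([50, 200])),
    ])
    return Pipeline([("scale", scaler), ("feat", features), ("model", model)])

# Random search over the joint space; a meta-learner would instead warm-start
# this loop with configurations that worked well on similar past datasets.
trials = [(cross_val_score(p, X, y, cv=3).mean(), p)
          for p in (sample_pipeline() for _ in range(10))]
trials.sort(key=lambda t: t[0], reverse=True)

# Simple ensembling step: combine the top pipelines by majority vote.
ensemble = VotingClassifier([(f"p{i}", p) for i, (_, p) in enumerate(trials[:3])])
print("best single-pipeline CV accuracy:", round(trials[0][0], 3))
print("ensemble CV accuracy:", round(cross_val_score(ensemble, X, y, cv=3).mean(), 3))
```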

As summarized in Figure 2, there has been considerable recent research on AutoML using methods based on genetic algorithms, random search, Bayesian optimization, reinforcement learning, and continuous differentiable methods. The research blog [2] provides more details on Neural Architecture Search.

Figure 2: AutoML methods summary
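As one example from this family, continuous differentiable methods such as DARTS relax the discrete choice among candidate operations into a softmax-weighted mixture, so architecture weights can be learned by gradient descent alongside the network weights. The snippet below is a minimal PyTorch sketch of that single idea; the candidate operation set and channel count are arbitrary assumptions, not a full DARTS implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """Continuous relaxation of a discrete operation choice: the edge output is
    a softmax-weighted sum of all candidate operations, so the architecture
    parameters `alpha` can be optimized by gradient descent."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),   # 3x3 convolution
            nn.Conv2d(channels, channels, 5, padding=2),   # 5x5 convolution
            nn.MaxPool2d(3, stride=1, padding=1),          # max pooling
            nn.Identity(),                                 # skip connection
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))  # one weight per op

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

# After the search, the operation with the largest alpha is kept on each edge
# to form the final discrete cell.
mixed = MixedOp(channels=16)
x = torch.randn(2, 16, 8, 8)
print(mixed(x).shape)                    # torch.Size([2, 16, 8, 8])
print(F.softmax(mixed.alpha, dim=0))     # current architecture weights
```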

Our initial efforts on AutoML have focused on Hyperband and Bayesian optimization for hyper-parameter search, and on Hyperband, ENAS, and DARTS for Neural Architecture Search. This research has been done in the context of the IBM product Watson Machine Learning Accelerator (WML-A) and has been integrated with the IBM Research services NeuNetS and AutoAI.
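For hyper-parameter search, Hyperband builds on successive halving: evaluate many configurations on a small budget, repeatedly discard the worst fraction, and grow the budget for the survivors. Below is a minimal sketch of one such bracket; the `evaluate` function and the learning-rate search space are placeholders for illustration, not the WML-A interface.

```python
import numpy as np

def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """One bracket of successive halving (the building block of Hyperband):
    score every configuration on a small budget, keep the top 1/eta, multiply
    the budget by eta, and repeat until a single configuration survives."""
    budget = min_budget
    while len(configs) > 1:
        scores = [evaluate(cfg, budget) for cfg in configs]
        keep = max(1, len(configs) // eta)
        best_idx = np.argsort(scores)[::-1][:keep]     # highest scores survive
        configs = [configs[i] for i in best_idx]
        budget *= eta
    return configs[0]

# Toy usage: configurations are learning rates and `evaluate` stands in for
# validation accuracy after training for `budget` epochs.
rng = np.random.default_rng(0)
candidates = [10 ** rng.uniform(-4, -1) for _ in range(9)]
best_lr = successive_halving(
    candidates,
    evaluate=lambda lr, budget: -abs(np.log10(lr) + 2.5) + 0.01 * budget,
)
print("selected learning rate:", best_lr)
```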

It is well known that most existing AutoML approaches incur considerable overhead during model search. To improve efficiency, we present a transferable AutoML method that leverages previously trained models to speed up the search for new tasks and datasets. Our approach combines a novel meta-feature extraction technique based on the performance of benchmark models with a dynamic dataset clustering algorithm based on a Markov process and statistical hypothesis testing; as a result, multiple models can share a common structure with different learned parameters. We recently published this work at CVPR 2019 [3], and the workflow is shown in Figure 3.

Figure 3: Pipeline of transferable AutoML
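The first stage of this pipeline can be read as follows: each dataset is described by the scores of a small, fixed set of benchmark models, and datasets whose score vectors are close enough are grouped so they can share a searched architecture. The sketch below conveys only that intuition; it replaces the paper's Markov-process clustering and hypothesis tests with a simple nearest-centroid rule, and the names `benchmark_models`, `evaluate`, and the distance threshold are illustrative assumptions.

```python
import numpy as np

def meta_features(dataset, benchmark_models, evaluate):
    """Describe a dataset by how a fixed set of cheap benchmark models score on
    it; datasets that 'look alike' to these models become candidates for
    sharing a previously searched architecture."""
    return np.array([evaluate(model, dataset) for model in benchmark_models])

def assign_group(vec, group_centroids, threshold=0.05):
    """Simplified stand-in for the paper's dynamic clustering: join the closest
    existing group if it is close enough, otherwise signal that a new group
    (and hence a fresh architecture search) is needed."""
    if not group_centroids:
        return None
    dists = [np.linalg.norm(vec - c) for c in group_centroids]
    best = int(np.argmin(dists))
    return best if dists[best] < threshold else None
```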

Our method is more flexible than existing approaches along three dimensions:

  • Search algorithms. Unlike multi-task solutions designed for Bayesian optimization, or transfer learning with AutoML based on reinforcement learning, our approach can be combined with most existing AutoML techniques out of the box.
  • Search mechanisms. Our transferable AutoML can be applied to different search schemes: search from scratch; search from predefined models (e.g., reuse the GoogLeNet architecture and the weights of its bottom layers, and search an architecture for the higher layers); and transfer from basic cells (transfer the searched normal/reduction cells of source datasets to target datasets). This makes it more flexible for handling datasets under a limited time budget.
  • Online setting. Our method can be applied in the online setting, where datasets arrive sequentially and a model must be searched efficiently for each newly arriving dataset; a minimal sketch follows this list.
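Putting the pieces together for the online setting, the decision for each newly arriving dataset is simply reuse-or-search. The driver below reuses the illustrative `meta_features` and `assign_group` helpers from the previous sketch, and `search_architecture` is a hypothetical stand-in for a full NAS run.

```python
# Online setting (sketch): datasets arrive one at a time; each either joins an
# existing group and reuses that group's searched architecture, or triggers a
# new, more expensive architecture search and starts a new group.
groups, shared_models = [], []

def handle_new_dataset(dataset, benchmark_models, evaluate, search_architecture):
    vec = meta_features(dataset, benchmark_models, evaluate)   # from the sketch above
    gid = assign_group(vec, groups)
    if gid is not None:                        # close enough to an existing group
        return shared_models[gid]              # reuse the shared architecture
    groups.append(vec)                         # otherwise register a new group...
    shared_models.append(search_architecture(dataset))  # ...and run a fresh search
    return shared_models[-1]
```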

We evaluated our technology in three search mechanisms, chosen according to the difficulty of the given datasets: search from scratch (Table 1), search from predefined models (Table 2), and transfer from basic cells (Table 3). The experimental results on image classification show a notable speedup (3x-10x on average in overall search time across multiple datasets) with negligible loss in accuracy.

Table 1: Total search time (in days, including the overhead of running benchmark models), total classification relative error (TRE), and benchmark overhead on 7 datasets, in the search-from-scratch setting.

Table 2: Total search time (in days, including the overhead of running benchmark models), total classification relative error (TRE), and benchmark overhead on 7 datasets, in the search-from-predefined-models setting.

Table 3: Search time (in days, including overhead) and test accuracy, in the transfer-from-basic-cells setting.

As next steps, we will explore end-to-end neural architecture search for object detection and 3D object detection; neural AutoML for tabular data, including vanilla tabular data and spatial-temporal tabular data; and new aspects of differentiable-method-based neural architecture search.
Some of these capabilities are available in WML-A, which offers a free trial.

References:

[1] M. Feurer, A. Klein, K. Eggensperger, et al. Efficient and Robust Automated Machine Learning. In NIPS, 2015.
[2] https://www.ibm.com/blogs/research/2019/09/neural-architecture-search-survey/
[3] Chao Xue, Junchi Yan, Rong Yan, Yonggang Hu, et al. Transferable AutoML by Model Sharing Over Grouped Datasets. In CVPR, 2019.

Staff Member, IBM Research-China

Xi Xia

Research Staff Member

Zhihu Wang

IBM Research – China
