AI

Build machine learning models with minimal effort using AutoML


The research and development of machine learning (ML) across domains such as computer vision, natural language processing, and speech transcription has been growing rapidly. ML is important in many enterprise applications across industries such as manufacturing, finance, healthcare, and marketing. With the growing adoption of ML, it is increasingly challenging to design or choose the specific model or neural network architecture for each ML task and to identify a good set of hyper-parameters for training the ML model. As a result, the process relies significantly on the advanced expertise of data scientists. These challenges have motivated growing interest in techniques that automatically discover tailored network architectures for training without human intervention, which is referred to as AutoML, as shown in Figure 1.

Figure 1: AutoML pipeline [1]

The AutoML system includes a meta-learning component, which leverages historical information to search for models faster and more effectively. The ML framework is responsible for automatically choosing data-preprocessing and feature-preprocessing algorithms and for selecting the models. In deep learning (DL), this model selection is typically replaced by Neural Architecture Search (NAS). The ensemble module performs additional post-processing, such as combining multiple models together.
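To make these stages concrete, below is a minimal sketch of such a pipeline using scikit-learn. The preprocessing steps, candidate models, and dataset are illustrative placeholders, not the components used by WML-A or any particular AutoML product.

```python
# Minimal illustration of the AutoML pipeline stages described above
# (data preprocessing, feature preprocessing, model selection, ensembling).
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)

candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "rf": RandomForestClassifier(n_estimators=200),
}

scored = []
for name, model in candidates.items():
    pipe = Pipeline([
        ("scale", StandardScaler()),          # data preprocessing
        ("features", PCA(n_components=32)),   # feature preprocessing
        ("model", model),                     # model-selection candidate
    ])
    score = cross_val_score(pipe, X, y, cv=3).mean()
    scored.append((score, name, pipe))

# Ensemble stage: combine the best candidates (here, a simple voting ensemble).
scored.sort(key=lambda t: t[0], reverse=True)
ensemble = VotingClassifier([(name, pipe) for _, name, pipe in scored[:2]])
print("best single model: %s, cv accuracy: %.3f" % (scored[0][1], scored[0][0]))
```

A real AutoML system would search over many more preprocessors, models, and hyper-parameters; the loop above only shows where each pipeline stage fits.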

As summarized in Figure 2, there has been considerable recent research on AutoML using methods based on genetic algorithms, random search, Bayesian optimization, reinforcement learning, and continuous differentiable methods. The research blog [2] provides more details on Neural Architecture Search.

Figure 2: AutoML methods summary

Our initial efforts on AutoML have focused on using Hyperband and Bayesian optimization for hyper-parameter search, and Hyperband, ENAS, and DARTS for Neural Architecture Search. This research has been done in the context of the IBM product Watson Machine Learning Accelerator (WML-A) and has been integrated with the IBM Research services NeuNetS and AutoAI.
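As a rough illustration of the successive-halving idea behind Hyperband, the toy sketch below samples random hyper-parameter configurations, evaluates them under a small budget, and repeatedly keeps the best fraction while increasing the budget. The objective function is a stand-in for partial training and is not part of our system.

```python
import random

def train_and_score(config, budget):
    # Stand-in for partially training a model for `budget` units of compute
    # and returning a validation score; purely illustrative.
    return 1.0 - abs(config["lr"] - 0.01) + 0.001 * budget + random.gauss(0, 0.01)

def successive_halving(n_configs=27, min_budget=1, eta=3):
    # Sample random configurations, then repeatedly keep the top 1/eta of them
    # while multiplying the training budget by eta.
    configs = [{"lr": 10 ** random.uniform(-4, -1)} for _ in range(n_configs)]
    budget = min_budget
    while len(configs) > 1:
        scores = [(train_and_score(c, budget), c) for c in configs]
        scores.sort(key=lambda s: s[0], reverse=True)
        configs = [c for _, c in scores[: max(1, len(configs) // eta)]]
        budget *= eta
    return configs[0]

print(successive_halving())
```

Hyperband runs several such brackets with different trade-offs between the number of configurations and the starting budget; the single bracket above conveys the core mechanism.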

It is well known that most existing AutoML approaches require considerable overhead for model search. To improve efficiency, we present a transferable AutoML method that leverages previously trained models to speed up the search for new tasks and datasets. Our approach involves a novel meta-feature extraction technique based on the performance of benchmark models, and a dynamic dataset-clustering algorithm based on a Markov process and a statistical hypothesis test. We recently published some of this work at CVPR 2019 [3]. The workflow is shown in Figure 3. In this way, multiple models can share a common structure with different learned parameters.
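The following simplified sketch illustrates the meta-feature idea: a dataset is fingerprinted by the cross-validated scores of a few cheap benchmark models, and a new dataset is matched to the most similar previously seen dataset so that its searched model can be reused. The benchmark models and the nearest-neighbour matching are illustrative simplifications of the Markov-process-based clustering used in the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Cheap benchmark models whose scores serve as a dataset "fingerprint".
BENCHMARKS = [
    LogisticRegression(max_iter=500),
    DecisionTreeClassifier(max_depth=5),
    KNeighborsClassifier(n_neighbors=5),
]

def meta_features(X, y):
    # Meta-feature vector: cross-validated accuracy of each benchmark model.
    return np.array([cross_val_score(m, X, y, cv=3).mean() for m in BENCHMARKS])

def most_similar(new_fingerprint, known):
    # known: dict mapping dataset name -> previously computed fingerprint.
    # The paper groups datasets with a Markov-process-based hypothesis test;
    # here we simply pick the nearest fingerprint and would reuse its model.
    return min(known, key=lambda name: np.linalg.norm(known[name] - new_fingerprint))
```

In practice, the searched architecture associated with the matched group serves as the starting point for the new dataset, which is what yields the speedup over searching from scratch.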

Figure 3: Pipeline of transferable AutoML

Our method offers more flexibility than existing methods along three dimensions:

  • Search algorithms. Unlike multi-task solutions designed for Bayesian optimization, or transfer learning for AutoML based on reinforcement learning, our approach can easily be combined with most existing AutoML techniques in an out-of-the-box fashion.
  • Search mechanisms. Our transferable AutoML can be applied to different search schemes: search from scratch; search from predefined models (e.g., reuse the GoogleNet architecture and the weights of its bottom layers while searching for an architecture for the higher layers; a minimal sketch of this scheme follows the list); and transfer from basic cells (transfer the searched normal/reduction cells of source datasets to target datasets). This makes it more flexible for handling datasets under a limited time budget.
  • Online setting. Our method can be used in an online setting where datasets arrive sequentially and one needs to efficiently search for a model for each newly arrived dataset.
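
Below is a minimal PyTorch-style sketch of the second scheme, search from predefined models: the bottom layers are frozen and only a small set of candidate top layers is searched. The backbone, candidate heads, and toy data are placeholders for illustration, not the actual WML-A search space.

```python
import torch
import torch.nn as nn

# Placeholder "pretrained" backbone standing in for the reused bottom layers
# (e.g. GoogleNet's early layers); its weights are frozen during the search.
backbone = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
for p in backbone.parameters():
    p.requires_grad = False

# The architecture search is restricted to the higher layers: a few candidate
# heads are trained briefly and the best-performing one is kept.
candidate_heads = [
    nn.Linear(64, 10),
    nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10)),
]

x = torch.randn(16, 3, 32, 32)           # toy batch standing in for real data
y = torch.randint(0, 10, (16,))
best = None
for head in candidate_heads:
    opt = torch.optim.Adam(head.parameters(), lr=1e-3)
    for _ in range(20):                   # short proxy training, as in NAS practice
        loss = nn.functional.cross_entropy(head(backbone(x)), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    if best is None or loss.item() < best[0]:
        best = (loss.item(), head)
print("selected head:", best[1])
```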

We evaluated our technology in three search mechanisms: search from scratch (shown in Table 1), search from predefined models (shown in Table 2), and transfer from basic cells (shown in Table 3) according to the difficulties of the given datasets. The experimental results on image classification show notable speedup (3x-10x times on average in overall search time for multiple datasets) with negligible loss in accuracy.

Table 1: Total search time (in days, including the overhead of running benchmark models), total classification relative error (TRE), and benchmark overhead on 7 datasets, in the search-from-scratch setting.

Table 2: Total search time (in days, including the overhead of running benchmark models), total classification relative error (TRE), and benchmark overhead on 7 datasets, in the search-from-predefined-models setting.

Table 3: Search time (in days, including overhead) and test accuracy, in the transfer-from-basic-cells setting.

As a next step, we will explore end-to-end neural architecture search for object detection and 3D object detection; neural AutoML for tabular data, including vanilla tabular data and spatio-temporal tabular data; and a new aspect of differentiable-method-based neural architecture search.
Some of these capabilities are available in WML-A, which can be explored through the free trial link:

References:

[1] M. Feurer, A. Klein, K. Eggensperger, et al. Efficient and Robust Automated Machine Learning. In NIPS, 2015.
[2] https://www.ibm.com/blogs/research/2019/09/neural-architecture-search-survey/
[3] Chao Xue, Junchi Yan, Rong Yan, Yonggang Hu, et al. Transferable AutoML by Model Sharing Over Grouped Datasets. In CVPR, 2019.

Xi Xia, Research Staff Member, IBM Research – China

Zhihu Wang, IBM Research – China
