Making Sense of Neural Architecture Search

It is no surprise that the massive success of deep learning in solving complicated tasks has created a growing demand for automated deep learning. Deep learning is highly effective, but designing a deep learning algorithm still takes a tremendous amount of human effort (Figure 1).

For instance, given the task of classifying images with a deep learning model, a practitioner might need to check off a sizeable to-do list:

  1. figure out a data augmentation policy
  2. find a suitable architecture for the model
  3. find hyperparameter settings for the training algorithm, such as the learning rate or regularization parameters
  4. often, address auxiliary practical objectives such as model compression for deployment.

The field of automated deep learning is concerned with automating this process: finding suitable preprocessing techniques and architecture designs, along with the training routines and configurations required to obtain a well-performing deep learning model. This drive for automation has led to a lot of interesting research. Recently, IBM launched NeuNetS, a service that automatically synthesizes deep neural network models for various business applications. Its algorithm portfolio includes state-of-the-art methods such as NCEvolve and TAPAS.

Figure 1 Deep Learning Pipeline

Neural architecture search

Neural architecture search (NAS), the primary focus of our survey, is only one component of the automation pipeline: it aims to find suitable architectures for a deep learning model. The search is itself computationally intensive and has received overwhelming interest from the deep learning community.

Consequently, there has been an upsurge in the development of neural architecture search methods. This leaves the field with many competing choices, little to no consolidation of the developed methods, and no guidelines to help a practitioner choose an appropriate one. We address this gap in our survey with a thorough analysis of the existing landscape. We provide a formalism that unifies the vast pool of existing methods, and we critically examine the different approaches. This highlights the benefits of the different components that contribute to the design and success of neural architecture search, and along the way sheds light on some misconceptions in current trends in architecture search.

With the formalism as the backbone, we are able to unify and study the numerous methods used for architecture search. We decompose a NAS method into its two essential components: a search space and an optimizer (the search algorithm). We further categorize the search spaces used across the literature into two kinds, the global and the cell-based search space.

The global search space does not enforce repetitive architecture patterns and therefore admits large degrees of freedom in the arrangement of operations (e.g. convolutions or pooling). The simplest example of a global search space is the chain-structured search space shown in Figure 2: different operations are applied sequentially to the input images, and the network finally outputs the predictions. More complex global search spaces consider non-sequential structures or enforce the use of a template, which limits the number of options.

In the cell-based search space, a network is constructed by repeating a structure called a cell in a prespecified arrangement determined by a template. A cell is a combination of operations, and this combination is maintained across the network, as shown in Figure 2. Cell-based search spaces have the advantage that discovered architectures can be easily transferred across datasets.
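
The difference between the two kinds of search space can be sketched in a few lines of Python. This is a minimal illustration, not the formalism from the survey: the operation names, the depth of 12 and the cell size of 3 are assumptions chosen for the example, and a cell is simplified to a short sequence of operations.

```python
import random

# Hypothetical operation set; the post only names convolutions and pooling.
OPERATIONS = ["conv3x3", "conv5x5", "max_pool", "avg_pool"]


def sample_chain_architecture(depth, rng):
    """Global, chain-structured space: every layer is chosen independently."""
    return [rng.choice(OPERATIONS) for _ in range(depth)]


def sample_cell(num_ops, rng):
    """A cell is a small combination of operations (simplified to a sequence)."""
    return tuple(rng.choice(OPERATIONS) for _ in range(num_ops))


def build_cell_based_architecture(cell, num_repeats):
    """Cell-based space: a fixed template repeats the same cell."""
    return [cell for _ in range(num_repeats)]


def chain_space_size(depth):
    """Number of distinct chain-structured networks of the given depth."""
    return len(OPERATIONS) ** depth


def cell_space_size(num_ops):
    """Number of distinct cells; the template adds no further choices."""
    return len(OPERATIONS) ** num_ops


rng = random.Random(0)
chain = sample_chain_architecture(depth=12, rng=rng)
cell = sample_cell(num_ops=3, rng=rng)
network = build_cell_based_architecture(cell, num_repeats=4)

print(chain)
print(network)
print(chain_space_size(12), cell_space_size(3))
```

Both networks contain 12 operations, but in this toy setting the chain-structured space has 4^12 = 16,777,216 members while the cell-based space has only 4^3 = 64, because the search only has to decide the contents of one cell.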

Figure 2 Global vs. cell-based search space

Researchers have explored a wide range of optimization paradigms for devising novel NAS methods, including reinforcement learning and evolutionary algorithms. The former learn a policy that creates networks so as to yield high-performing models (Figure 3), while the latter maintain a pool of candidates and modify them with the objective of improving performance (Figure 4).
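
As a rough illustration of the evolutionary paradigm only, the sketch below keeps a pool of chain-structured candidates, mutates a selected parent and replaces the weakest member. It is not any specific published method: the operation set, the tournament selection rule and especially `toy_fitness` are assumptions, with `toy_fitness` standing in for the validation performance that a real search would obtain by training each network.

```python
import random

# Hypothetical operation set, chosen for illustration.
OPERATIONS = ["conv3x3", "conv5x5", "max_pool", "avg_pool"]


def toy_fitness(architecture):
    """Hypothetical stand-in for validation accuracy after training.

    It simply rewards convolutions; a real NAS run would train the network.
    """
    return sum(op.startswith("conv") for op in architecture) / len(architecture)


def mutate(architecture, rng):
    """Replace the operation at one random position."""
    child = list(architecture)
    position = rng.randrange(len(child))
    child[position] = rng.choice(OPERATIONS)
    return child


def evolutionary_search(depth, population_size, generations, rng):
    """Evolve a pool of candidates and return the best one found."""
    population = [
        [rng.choice(OPERATIONS) for _ in range(depth)]
        for _ in range(population_size)
    ]
    for _ in range(generations):
        # Tournament selection: the better of two random candidates is the parent.
        parent = max(rng.sample(population, 2), key=toy_fitness)
        child = mutate(parent, rng)
        # Replace the worst candidate if the child is at least as good.
        worst = min(population, key=toy_fitness)
        if toy_fitness(child) >= toy_fitness(worst):
            population[population.index(worst)] = child
    return max(population, key=toy_fitness)


best = evolutionary_search(depth=6, population_size=8, generations=200,
                           rng=random.Random(1))
print(best, toy_fitness(best))
```

Because a candidate is only ever replaced by one that scores at least as well as the current worst, the best score in the pool never decreases from one generation to the next.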

Most NAS approaches require training the sampled architectures during the search process. Training neural networks is time-consuming, which makes the overall search highly inefficient. This has opened avenues for interesting research, and a broad range of strategies have been suggested to tackle this inefficiency. We provide in-depth coverage of some of the prominent methods, such as one-shot learning, surrogate model-based approaches and learning curve prediction. One-shot approaches are particularly interesting: they train a single model that encompasses all the samples encountered during the search process, which allows for parameter sharing (Figure 5). Similarly, proxy approaches save a significant amount of actual training time: surrogate model-based methods attempt to predict the performance of an architecture, while learning-curve prediction methods look ahead to the final performance of a partially trained one.
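
The parameter sharing behind one-shot approaches can be illustrated with a toy example. The sketch below is an assumption-laden simplification: a single float stands in for each weight tensor, the "training step" is a made-up update rule, and the names `shared_weights`, `weights_for` and `train_step` are hypothetical. It only shows that architectures sampled from the same one-shot model reuse the same parameters wherever they pick the same operation at the same layer.

```python
import random

# Hypothetical operation set and depth, chosen for illustration.
OPERATIONS = ["conv3x3", "conv5x5", "max_pool"]
DEPTH = 4

# One-shot model: one parameter slot per (layer, operation) pair.
# A one-element list holding a float stands in for a real weight tensor.
rng = random.Random(0)
shared_weights = {
    (layer, op): [rng.uniform(-1.0, 1.0)]
    for layer in range(DEPTH)
    for op in OPERATIONS
}


def weights_for(architecture):
    """Look up the shared parameters used by one sampled architecture."""
    return [shared_weights[(layer, op)] for layer, op in enumerate(architecture)]


def train_step(architecture, learning_rate=0.1):
    """Toy update: nudge only the parameters on the sampled path toward 1.0."""
    for weight in weights_for(architecture):
        weight[0] += learning_rate * (1.0 - weight[0])


arch_a = ["conv3x3", "max_pool", "conv5x5", "conv3x3"]
arch_b = ["conv3x3", "conv5x5", "conv5x5", "max_pool"]

before = weights_for(arch_b)[0][0]
train_step(arch_a)  # training arch_a also moves arch_b's first-layer weight
after = weights_for(arch_b)[0][0]
print(before, after)
```

The two architectures agree at layers 0 and 2, so a training step on `arch_a` changes the parameters that `arch_b` would use there, while the parameters at layers 1 and 3 stay separate.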

High performance is unequivocally a desired goal for a model. However, deep learning models are applied in diverse scenarios ranging from mobile hardware to GPU clusters, so some practical applications must also account for constraints such as limited memory and prediction time. For these cases, novel NAS algorithms have been proposed that handle multiple objectives simultaneously. Moreover, the design principles used for developing architecture search methods have also benefited the automation of other aspects of deep learning, specifically data augmentation policies and activation functions.
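
One common way to reason about several objectives at once is to keep only the candidates that are not dominated by any other, i.e. the Pareto front. The sketch below assumes two objectives, error and prediction latency, both to be minimized. The candidate names and numbers are made up for illustration, and the post does not say that the surveyed methods use exactly this formulation.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    error: float        # lower is better
    latency_ms: float   # lower is better


def dominates(a, b):
    """a dominates b if it is no worse in both objectives and better in one."""
    no_worse = a.error <= b.error and a.latency_ms <= b.latency_ms
    better = a.error < b.error or a.latency_ms < b.latency_ms
    return no_worse and better


def pareto_front(candidates):
    """Keep every candidate that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Made-up numbers for illustration only.
candidates = [
    Candidate("large", error=0.05, latency_ms=40.0),
    Candidate("medium", error=0.07, latency_ms=15.0),
    Candidate("small", error=0.12, latency_ms=5.0),
    Candidate("wasteful", error=0.10, latency_ms=30.0),
]
front = pareto_front(candidates)
print([c.name for c in front])
```

Here `wasteful` drops out because `medium` is both more accurate and faster, while the other three remain as different trade-offs between accuracy and speed.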

We provide in-depth coverage of all the aforementioned modelling paradigms in the survey. NAS is an exciting step towards automation but, in the grand scheme, we have barely scratched the surface. Although there have been independent, successful attempts at automating the different components of the deep learning pipeline, their joint optimization, or end-to-end automation, remains an open challenge.

Martin Wistuba, Ambrish Rawat, Tejaswini Pedapati: A Survey on Neural Architecture Search. arXiv preprint arXiv:1905.01392, 2019

Ambrish Rawat

Research Staff Member, IBM Research - Ireland

Tejaswini Pedapati

Software Research Developer, IBM Research

Martin Wistuba

Research Staff Member, IBM Research-Ireland
