Making Sense of Neural Architecture Search


It is no surprise that, following the massive success of deep learning in solving complicated tasks, demand for automated deep learning is growing. Deep learning is highly effective, but designing a deep learning algorithm still takes a tremendous amount of human effort (Figure 1).

For instance, given the task of classifying images with a deep learning model, a practitioner might need to check off a sizeable to-do list:

  1. figure out a data augmentation policy
  2. find a suitable architecture for the model
  3. find the hyperparameter settings for the training algorithm, such as the learning rate or regularization parameters
  4. often address auxiliary practical objectives, such as model compression for deployment.

The field of automated deep learning is concerned with automating this process by finding suitable preprocessing techniques and architecture designs, along with the training routines and configurations required to obtain a well-performing deep learning model. This drive for automation has led to a lot of interesting research. IBM recently launched NeuNetS, a service that automatically synthesizes deep neural network models for various business applications. Its algorithm portfolio includes state-of-the-art methods such as NCEvolve and TAPAS.

Figure 1 Deep Learning Pipeline

Neural architecture search

Neural architecture search (NAS), the primary focus of our survey, is only one component of the automation pipeline: it aims to find suitable architectures for a deep learning model. The search is itself computationally intensive and has received overwhelming interest from the deep learning community.

Consequently, there has been an upsurge in the development of neural architecture search methods. This has left the field with many competing choices, little to no consolidation of the developed methods, and no guidelines to help a practitioner choose an appropriate one. We address this gap in our survey with a thorough analysis of the existing landscape. We provide a formalism that unifies the vast pool of existing methods and critically examine the different approaches. This highlights the benefits of the different components that contribute to the design and success of neural architecture search, and along the way sheds light on some misconceptions in current trends in architecture search.

With the formalism as the backbone, we are able to unify and study the numerous methods used for architecture search. We decompose a NAS method into its two essential components: a search space and an optimizer (the search algorithm). We further categorize the search spaces used across the literature into two kinds, the global and the cell-based search space.

The global search space does not enforce repetitive architecture patterns and therefore admits large degrees of freedom in the arrangement of operations (e.g. convolutions or pooling). The simplest example is the chain-structured search space shown in Figure 2: different operations are applied sequentially to the input images, and the network finally outputs the predictions. More complex global search spaces consider non-sequential structures or enforce the use of a template, which limits the number of options.

In the cell-based search space, a network is constructed by repeating a structure called a cell in a prespecified arrangement determined by a template. A cell is a combination of operations, and this combination is maintained across the network, as shown in Figure 2. Cell-based search spaces have the advantage that discovered architectures can be easily transferred across datasets.

Figure 2 Global vs. cell-based search space
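To make the difference between the two kinds of search space concrete, here is a minimal Python sketch. The operation names, the function names and the sizes (6 layers, 3 operations per cell, 4 repeats) are assumptions chosen for illustration and are not taken from the survey; real search spaces also encode connections between operations, which this toy version leaves out.

```python
import random

# Hypothetical operation set; real search spaces define their own.
OPERATIONS = ["conv3x3", "conv5x5", "max_pool", "avg_pool"]


def sample_chain_architecture(num_layers, rng):
    """Global, chain-structured space: every layer is chosen independently."""
    return [rng.choice(OPERATIONS) for _ in range(num_layers)]


def sample_cell(num_ops, rng):
    """A cell is a small combination of operations."""
    return [rng.choice(OPERATIONS) for _ in range(num_ops)]


def build_from_cell(cell, num_repeats):
    """Cell-based space: a fixed template repeats the same cell."""
    return [list(cell) for _ in range(num_repeats)]


rng = random.Random(0)
chain = sample_chain_architecture(num_layers=6, rng=rng)
cell = sample_cell(num_ops=3, rng=rng)
network = build_from_cell(cell, num_repeats=4)

# Number of distinct architectures each toy space admits.
chain_space_size = len(OPERATIONS) ** 6  # one free choice per layer
cell_space_size = len(OPERATIONS) ** 3   # only the cell is searched
```

In this toy setting the cell-based network contains twelve operations but only the three inside the cell are searched, which is why its space is much smaller than the chain-structured one with six free layers.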

Researchers in the community have explored a wide range of optimization paradigms, including reinforcement learning and evolutionary algorithms, to devise novel NAS methods. The former learn a policy that creates networks so as to yield high-performing models (Figure 3); the latter explore a pool of candidates and modify them with the objective of improving performance (Figure 4).
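The evolutionary idea of keeping a pool of candidates and modifying them can be sketched in a few lines of Python. This is a generic illustration, not any specific method from the survey: `toy_fitness` is a made-up stand-in for the validation accuracy that a real method would obtain by training each network, and the operation set, tournament selection and replace-the-weakest rule are assumptions.

```python
import random

# Hypothetical operation set for a chain-structured architecture.
OPERATIONS = ["conv3x3", "conv5x5", "max_pool", "avg_pool"]


def toy_fitness(architecture):
    """Made-up score (fraction of conv layers); a real method trains the network."""
    return sum(1.0 for op in architecture if op.startswith("conv")) / len(architecture)


def mutate(architecture, rng):
    """Replace the operation at one random position."""
    child = list(architecture)
    position = rng.randrange(len(child))
    child[position] = rng.choice(OPERATIONS)
    return child


def evolutionary_search(num_layers=6, population_size=8, generations=30, seed=0):
    rng = random.Random(seed)
    population = [
        [rng.choice(OPERATIONS) for _ in range(num_layers)]
        for _ in range(population_size)
    ]
    initial_best = max(toy_fitness(a) for a in population)
    for _ in range(generations):
        # Tournament selection: the fitter of two random candidates becomes the parent.
        parent = max(rng.sample(population, 2), key=toy_fitness)
        child = mutate(parent, rng)
        # The child replaces the weakest member, so the pool size stays fixed.
        weakest = min(range(population_size), key=lambda i: toy_fitness(population[i]))
        population[weakest] = child
    best = max(population, key=toy_fitness)
    return best, toy_fitness(best), initial_best


best_architecture, best_fitness, initial_best = evolutionary_search()
```

Because only the weakest member is ever replaced, the best score in the pool cannot decrease from one generation to the next. In a real search, each call to the fitness function hides a full training run, which is what makes the search expensive.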

Most NAS approaches require training the sampled architectures during the search process. Training neural networks is time-consuming, which makes the overall search highly inefficient. This has opened avenues for interesting research, and a broad range of strategies has been suggested to tackle this inefficiency. We provide in-depth coverage of some of the prominent methods, such as one-shot learning, surrogate model-based approaches and learning curve prediction. One-shot approaches are particularly interesting: they train a single model that encompasses all the samples encountered during the search process, which allows for parameter sharing (Figure 5). Similarly, proxy approaches save a significant amount of actual training time: surrogate model-based methods attempt to predict performance, while learning curve prediction methods do a look-ahead for the performance.
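The parameter sharing behind one-shot approaches can be illustrated with a small Python sketch. It only demonstrates the bookkeeping: the `SharedWeights` class, the operation set and the placeholder "gradient step" are all hypothetical, and no actual network is trained.

```python
import random

# Hypothetical operation set and depth for the toy search space.
OPERATIONS = ["conv3x3", "conv5x5", "max_pool"]
NUM_LAYERS = 4


class SharedWeights:
    """One-shot model stand-in: one parameter entry per (layer, operation)."""

    def __init__(self):
        self.store = {}

    def get(self, layer, op):
        # Parameters are created on first use and then reused by every
        # later architecture that picks the same operation at the same layer.
        key = (layer, op)
        if key not in self.store:
            self.store[key] = {"value": 0.0, "updates": 0}
        return self.store[key]


def train_step(architecture, shared):
    """Placeholder update that touches only the parameters the architecture uses."""
    for layer, op in enumerate(architecture):
        params = shared.get(layer, op)
        params["value"] += 0.1  # stands in for a gradient step
        params["updates"] += 1


rng = random.Random(0)
shared = SharedWeights()
num_samples = 50
for _ in range(num_samples):
    architecture = [rng.choice(OPERATIONS) for _ in range(NUM_LAYERS)]
    train_step(architecture, shared)

num_parameter_sets = len(shared.store)
total_updates = sum(p["updates"] for p in shared.store.values())
```

Training the 50 sampled architectures independently would need one parameter set per layer per architecture, whereas the shared store never holds more than `NUM_LAYERS * len(OPERATIONS)` entries, and every update made for one architecture is inherited by the others that use the same entry.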

Although high performance is unequivocally a desired goal for a model, deep learning models are applied in diverse scenarios ranging from mobile hardware to GPU clusters, so other constraints such as limits on memory and prediction time need to be accounted for in some practical applications. For these cases, novel NAS algorithms have been proposed that handle multiple objectives simultaneously. Moreover, design principles used for developing architecture search methods have also benefited the automation of other aspects of deep learning, specifically data augmentation policies and activation functions.
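One common way to reason about several objectives at once is to keep only the candidates that no other candidate beats on every objective, the so-called Pareto front. The sketch below assumes two objectives to minimize, error and prediction latency; the candidate names and numbers are invented for illustration and are not measurements, and the survey may describe other ways of combining objectives.

```python
def dominates(a, b):
    """a dominates b if it is no worse on both objectives and strictly better on one."""
    no_worse = a["error"] <= b["error"] and a["latency_ms"] <= b["latency_ms"]
    strictly_better = a["error"] < b["error"] or a["latency_ms"] < b["latency_ms"]
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep every candidate that is not dominated by any other candidate."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Invented candidate architectures; both objectives are to be minimized.
candidates = [
    {"name": "A", "error": 0.05, "latency_ms": 40.0},
    {"name": "B", "error": 0.07, "latency_ms": 12.0},
    {"name": "C", "error": 0.08, "latency_ms": 30.0},  # worse than B on both objectives
    {"name": "D", "error": 0.12, "latency_ms": 5.0},
]
front = pareto_front(candidates)
front_names = [c["name"] for c in front]  # ["A", "B", "D"]
```

Candidates A, B and D each represent a different trade-off between accuracy and speed, so the choice among them depends on the deployment target, for example mobile hardware versus a GPU cluster.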

We provide in-depth coverage of all the aforementioned modelling paradigms in the survey. NAS is an exciting step towards automation but, in the grand scheme, we have barely scratched the surface. Although there have been independent, successful attempts at automating the different components of the deep learning pipeline, their joint optimization, or end-to-end automation, remains an open challenge.

Martin Wistuba, Ambrish Rawat, Tejaswini Pedapati: A Survey on Neural Architecture Search. arXiv preprint arXiv:1905.01392, 2019

Research Staff Member, IBM Research - Ireland

Tejaswini Pedapati

Software Research Developer, IBM Research

Martin Wistuba

Research Staff Member, IBM Research - Ireland
