
Using AI to Design Deep Learning Architectures

Selecting the best architecture for a deep learning model is typically a time-consuming process that requires expert input, but AI can streamline it. I am developing an evolutionary algorithm for architecture selection that is up to 50,000 times faster than other methods, with only a small increase in error rate.

Deep learning models are applied in many IBM Watson products and services and can perform challenging tasks such as visual recognition, text to speech and vice versa, playing board games and much more. These models emulate the workings of the human brain, and, like the brain, their architecture is crucial to their function.

At IBM, engineers and scientists select the best architecture for a deep learning model from a large set of possible candidates. Today this is a time-consuming manual process; however, using a more powerful automated AI solution to select the neural network can save time and enable non-experts to apply deep learning faster. My evolutionary algorithm is designed to reduce the search time for the right deep learning architecture to just hours, making the optimization of deep learning network architecture affordable for everyone.

An evolutionary algorithm for deep learning networks

My proposed method treats a convolutional neural network architecture as a sequence of neuro-cells, then applies a series of mutations to find a structure that improves the performance of the neural network for a given dataset and machine learning task. This approach substantially shortens network training time. The mutations alter the structure of the network but do not change the network’s predictions, and can include adding layers, adding new connections, or widening kernels or layers.

Function-preserving mutation in optimizing deep learning networks

Figure 1. Example of a function-preserving mutation. The architecture on the right has a mutation but gives the same prediction as the architecture on the left (represented by the same colors).
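
To make the idea of a function-preserving mutation concrete, here is a minimal sketch in plain Python. It is not the implementation from the paper: it assumes a small fully connected network with ReLU activations instead of convolutional neuro-cells, and the helper names (forward, add_identity_layer, widen_layer) are hypothetical. It shows two mutations, adding a layer and widening a layer, that change the structure while leaving the output unchanged up to floating-point rounding.

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(w, v):
    # w is a list of rows, one row per output unit
    return [sum(wi * xi for wi, xi in zip(row, v)) for row in w]

def forward(layers, x):
    """Apply each weight matrix, with ReLU after every layer except the last."""
    h = x
    for i, w in enumerate(layers):
        h = matvec(w, h)
        if i < len(layers) - 1:
            h = relu(h)
    return h

def add_identity_layer(layers, index):
    """Insert an identity layer after hidden layer `index`.

    Hidden activations are non-negative after ReLU, so relu(I h) == h
    and the deeper network computes the same function.
    """
    width = len(layers[index])
    identity = [[1.0 if r == c else 0.0 for c in range(width)]
                for r in range(width)]
    return layers[:index + 1] + [identity] + layers[index + 1:]

def widen_layer(layers, index, unit):
    """Duplicate one hidden unit and split its outgoing weights in half."""
    w_in = [row[:] for row in layers[index]]
    w_out = [row[:] for row in layers[index + 1]]
    w_in.append(w_in[unit][:])      # the new unit computes the same activation
    for row in w_out:
        half = row[unit] / 2.0
        row[unit] = half
        row.append(half)            # the two halves sum to the original weight
    return layers[:index] + [w_in, w_out] + layers[index + 2:]

# Assumed toy network: 3 inputs -> 4 hidden units -> 2 outputs
random.seed(0)
net = [[[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)],
       [[random.uniform(-1, 1) for _ in range(4)] for _ in range(2)]]
x = [0.5, -1.0, 2.0]

deeper = add_identity_layer(net, 0)
wider = widen_layer(net, 0, unit=1)

print(forward(net, x))
print(forward(deeper, x))
print(forward(wider, x))
```

All three printed outputs should agree up to rounding, even though the mutated networks have an extra layer or an extra hidden unit. Further training can then move the mutated network away from its parent's function.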

Experimental evaluation

I compared my new neuro-evolutionary approach with several other methods on the task of image classification on the CIFAR-10 and CIFAR-100 datasets. These datasets are collections of images commonly used to train machine learning and computer vision algorithms. Compared with state-of-the-art human-designed architectures, architecture search methods based on reinforcement learning, and other automated methods based on evolutionary algorithms, my algorithm had a slightly higher classification error but required significantly less time. It was up to 50,000 times faster than some other methods, with an error rate at most 0.6% higher than the best competitor on the benchmark dataset CIFAR-10.

The figures below visualize the algorithm’s optimization process. In Figure 2, each dot represents a different structure, and the connecting lines represent mutations. The color scale shows the accuracy of each structure, and the x-axis represents time. Accuracy increases quickly over the first 10 hours, after which progress is slow but steady.

Optimization of the evolutionary algorithm for designing deep learning networks

Figure 2. Optimization of the evolutionary algorithm over time.
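
The search process that Figure 2 visualizes, with structures as dots and mutations as connecting lines, can be illustrated with a generic evolutionary loop. The sketch below is only an assumption-laden toy and not the algorithm from the paper: an architecture is reduced to a list of layer widths, toy_fitness is a hypothetical stand-in for the validation accuracy of a trained network, and tournament selection is one common selection scheme chosen here for illustration.

```python
import random

def mutate(arch, rng):
    """Return a mutated copy: either add a layer or widen an existing one."""
    child = list(arch)
    if rng.random() < 0.5:
        child.insert(rng.randrange(len(child) + 1), 16)  # add a layer of width 16
    else:
        i = rng.randrange(len(child))
        child[i] *= 2                                    # double one layer's width
    return child

def toy_fitness(arch):
    """Hypothetical score that peaks at 4 layers of width 64 (higher is better)."""
    depth_penalty = abs(len(arch) - 4)
    width_penalty = sum(abs(w - 64) for w in arch) / 64.0
    return -(depth_penalty + width_penalty)

def evolve(generations, population_size, tournament_size, seed=0):
    rng = random.Random(seed)
    population = [[16]]          # start from a single small architecture
    history = []                 # best fitness after each generation
    for _ in range(generations):
        # Tournament selection: the best of a random subset becomes the parent.
        k = min(tournament_size, len(population))
        parent = max(rng.sample(population, k), key=toy_fitness)
        population.append(mutate(parent, rng))
        # Keep the population bounded by discarding the weakest individual.
        if len(population) > population_size:
            population.remove(min(population, key=toy_fitness))
        history.append(toy_fitness(max(population, key=toy_fitness)))
    return max(population, key=toy_fitness), history

best, history = evolve(generations=200, population_size=10, tournament_size=3)
print(best, history[-1])
```

Because the weakest individual is the only one ever discarded, the best score in history never decreases. In a real search, each fitness evaluation would involve training and validating a network, which is where the time savings described above matter.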

Figure 3 shows the mutations to the network structure at each hour of the process.

Evolution of a deep learning network structure over time

Figure 3. Evolution of a network structure over time. Some intermediate states are not shown.

In the future, I hope to integrate this optimization method into IBM’s cloud services and make it available to clients. Furthermore, I plan to extend it to larger datasets like ImageNet and additional kinds of data such as time-series and text.

I will present this approach at the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD) in Dublin, Ireland, in September. This event is the premier machine learning and data mining conference in Europe, and it will be co-hosted by IBM Research-Ireland, University College Dublin School of Computer Science and the Insight Centre for Data Analytics. As part of the extensive conference programme, our adversarial AI experts will host the 1st Workshop on Recent Advances in Adversarial Machine Learning (Nemesis’18), which aims to bring together researchers and practitioners to discuss recent advances in the rapidly evolving field of adversarial machine learning.


Deep Learning Architecture Search by Neuro-Cell-based Evolution with Function-Preserving Mutations
Martin Wistuba
European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2018, Dublin, Ireland, September 10-14, 2018

Martin Wistuba

IBM Research Staff Member