Taming your neural networks: how controlled experimentation can help you build better machine learning models

Businesses today are eager to harness machine learning and deep learning for competitive advantage—yet few businesspeople realize that building a machine learning model or neural network is a marathon, not a sprint.

Although the basic principles of model design, training and evaluation tend to be reasonably well understood by businesses, it’s easy to underestimate how complex, time-consuming and labor-intensive real-world model development can be, particularly for neural networks.

Let’s take model design as a starting point. The Asimov Institute’s “Neural Network Zoo” gives an idea of the wide range of types and families of neural networks that can be used to solve business problems. Making the right choice among various flavors of recurrent, convolutional, or other network types can be crucial to achieving the right outcome, and some degree of trial and error is necessary for even an experienced data scientist to pick the best option.

Selecting the right type of model or neural network is only the first step, though—in practice, every machine learning problem is unique, and requires a uniquely customized model. Deep learning models, for example, can be composed of any number of layers, and each layer will have multiple settings (known as hyperparameters) that need to be adjusted to produce the best output.
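To give a feel for how quickly the choices multiply, here is a minimal sketch that enumerates every combination in a small search space. The hyperparameter names and values are purely illustrative assumptions, not settings taken from any particular model or product.

```python
from itertools import product

# Hypothetical search space: names and values are illustrative only.
search_space = {
    "num_layers": [2, 4, 8],
    "units_per_layer": [64, 128],
    "learning_rate": [0.01, 0.001],
    "dropout": [0.0, 0.5],
}


def enumerate_configs(space):
    """Return every combination of hyperparameter values as a list of dicts."""
    names = list(space)
    value_lists = [space[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


configs = enumerate_configs(search_space)
print(len(configs), "candidate configurations")
print(configs[0])
```

Even with only four settings and two or three values each, there are already 24 candidate configurations to train and compare.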

Models also tend to evolve over time, and the process of evolution tends towards sophistication, not simplicity. As an example, this blog visualizes how the convolutional neural networks used in state-of-the-art image recognition have grown in complexity over the past 15 years. Today’s most sophisticated neural networks contain more than 100 layers, with more than 10,000 neurons per layer and over 100,000 connections between the neurons of each layer, adding up to over 100 million individual weights that must be trained.

That’s why most of the more complex neural networks in current use come from academic research, where teams have enough time to experiment thoroughly and identify optimal designs. By contrast, since they are generally under pressure from the business to get results quickly, commercial data scientists often need to balance quality and sophistication with expediency and speed to market.

The burden of training

One of the big problems is that you don’t just need to train one model – you need to train hundreds. You try different combinations of layers and parameters, you start training several different models, you kill off the least promising candidates, and then you iterate on the best ones, until you find what seems to be the best available combination of layers and hyperparameters to deliver the results you need.
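The loop described above can be sketched in a few lines. This is a toy illustration under stated assumptions: `toy_score` is a made-up stand-in for "train a model and measure its validation accuracy", and the mutation rule, batch size and number of survivors are arbitrary choices for the example.

```python
import random


def toy_score(config):
    """Stand-in for training and evaluating a model (hypothetical).

    The score peaks at 6 layers and a learning rate of 0.01.
    """
    layer_penalty = 0.05 * abs(config["num_layers"] - 6)
    rate_penalty = 10.0 * abs(config["learning_rate"] - 0.01)
    return 1.0 - layer_penalty - rate_penalty


def mutate(config, rng):
    """Create a nearby variant of a promising configuration."""
    return {
        "num_layers": max(1, config["num_layers"] + rng.choice([-1, 0, 1])),
        "learning_rate": config["learning_rate"] * rng.choice([0.5, 1.0, 2.0]),
    }


def search(score_fn, rounds=5, batch_size=12, keep=3, seed=0):
    """Train a batch, keep the best few, and iterate on them."""
    rng = random.Random(seed)
    candidates = [
        {"num_layers": rng.randint(1, 12),
         "learning_rate": rng.choice([0.1, 0.01, 0.001])}
        for _ in range(batch_size)
    ]
    best_score, best_config = float("-inf"), None
    for _ in range(rounds):
        # "Train" every candidate in the batch and rank the results.
        scored = sorted(((score_fn(c), c) for c in candidates),
                        key=lambda pair: pair[0], reverse=True)
        # Kill off the least promising candidates.
        survivors = [config for _, config in scored[:keep]]
        if scored[0][0] > best_score:
            best_score, best_config = scored[0]
        # Next batch: the survivors plus variations on them.
        # (A real workflow would reuse the survivors' results
        # instead of training them again.)
        variants = [mutate(rng.choice(survivors), rng)
                    for _ in range(batch_size - keep)]
        candidates = survivors + variants
    return best_score, best_config


score, config = search(toy_score)
print(f"best score {score:.3f} with {config}")
```

Because the survivors are carried into the next batch, the best score can only stay the same or improve from one round to the next.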

In a typical machine learning workflow, the experimentation process is time-consuming and normally conducted overnight in batches of 10-20 training runs. With 30 to 50 batches being executed to find an optimal model that meets the business requirements, the entire workflow, from initial data exploration through to model deployment, can produce hundreds of model variations over three to four months.
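A quick back-of-the-envelope calculation shows where these figures lead. The batch and run counts come from the paragraph above; the pace of one overnight batch per working day is an assumption added here for illustration.

```python
def total_training_runs(batches, runs_per_batch):
    """Total runs if every batch contains the same number of training runs."""
    return batches * runs_per_batch


# Figures from the text: 30 to 50 batches of 10 to 20 runs each.
fewest = total_training_runs(30, 10)
most = total_training_runs(50, 20)

# Assumption (not from the text): one overnight batch per working day,
# five working days per week.
BATCHES_PER_WEEK = 5
shortest_weeks = 30 / BATCHES_PER_WEEK
longest_weeks = 50 / BATCHES_PER_WEEK

print(f"{fewest} to {most} training runs")
print(f"{shortest_weeks:.0f} to {longest_weeks:.0f} weeks of overnight batches")
```

Under that assumption, the overnight batches alone account for 6 to 10 weeks, which fits comfortably inside the three to four months quoted for the whole workflow.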

Moreover, even when a model makes it to deployment, the job isn’t over. Models need to keep evolving to remain relevant, and must be redesigned, retrained and redeployed regularly to maintain competitive advantage.

Streamlining an iterative process

As we have seen, model training is an inherently time-consuming process—but there are several approaches that you can take to optimize it:

  1. Focus on training 10-20 models at a time, so you explore enough hyperparameter space that you’re confident of choosing the right network configuration to explore next.
  2. Access elastic compute resources in the cloud to perform batch training overnight and focus daytime hours on analysis and planning.
  3. With hundreds of training runs to manage, track the hyperparameters and network configurations explored to avoid duplicating efforts.
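As a rough illustration of the third point, the sketch below keeps a record of every configuration that has been trained and checks new ones against it. The `ExperimentLog` class and its method names are hypothetical, written for this example only; they do not represent the API of any IBM product.

```python
import hashlib
import json


class ExperimentLog:
    """Minimal in-memory record of training runs (hypothetical helper)."""

    def __init__(self):
        self._runs = {}

    @staticmethod
    def fingerprint(config):
        """Hash a configuration so that key order does not matter."""
        canonical = json.dumps(config, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def already_tried(self, config):
        """Return True if this exact configuration has been recorded."""
        return self.fingerprint(config) in self._runs

    def record(self, config, metrics):
        """Store the configuration together with its results."""
        self._runs[self.fingerprint(config)] = {"config": config, "metrics": metrics}

    def best(self, metric):
        """Return the recorded run with the highest value for a metric."""
        return max(self._runs.values(), key=lambda run: run["metrics"][metric])


log = ExperimentLog()
log.record({"num_layers": 4, "learning_rate": 0.01}, {"accuracy": 0.90})
log.record({"num_layers": 8, "learning_rate": 0.01}, {"accuracy": 0.93})

# Same settings in a different key order are recognized as a duplicate.
print(log.already_tried({"learning_rate": 0.01, "num_layers": 4}))
print(log.best("accuracy")["config"])
```

Storing the results next to the settings that produced them also makes it straightforward to compare runs later and pick the best candidate.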

IBM Watson Data Platform aims to help with all three of these goals by enriching and streamlining the tooling that data scientists use to design, train, evaluate and deploy their models. Instead of using a disjointed set of different technologies to build, train and document your models, you can take advantage of a single, coherent ecosystem of well-integrated tools.

For example, IBM Watson Machine Learning enables you to create an updated version of a model, set its hyperparameters, and initiate the training process in just a few mouse-clicks. All the settings and results will be saved and stored together with the model, so it will be easy to compare performance between iterations and select the best candidate for production deployment.

Encouraging experimentation

Critically, the solutions that we are building within IBM Watson Machine Learning are designed to encourage a more experiment-centric approach to model development. For example, when you’re limited by available compute resources, you tend to train only a few models at a time and explore a limited range of hyperparameter space. We’re working to eliminate these constraints by making it possible for you to train dozens of models simultaneously, monitor the progress of training, kill off the worst-performing models quickly, and focus your efforts on the best.
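One common way to monitor runs and stop the weakest ones early is a successive-halving rule: check every run at regular intervals and drop the bottom half each time. The sketch below is a simplified illustration of that idea, not a description of how IBM Watson Machine Learning works, and the learning curves are invented numbers standing in for live validation accuracy.

```python
def successive_halving(curves, check_every=2):
    """Stop the worse half of the active runs at each checkpoint.

    curves maps a run name to its per-epoch validation accuracy. Here the
    values are precomputed; in practice they would arrive as training runs.
    Returns the surviving run names and the epoch at which each other run
    was stopped.
    """
    active = list(curves)
    num_epochs = len(next(iter(curves.values())))
    stopped_at = {}
    for epoch in range(check_every, num_epochs + 1, check_every):
        if len(active) == 1:
            break
        # Rank the active runs by their accuracy at this checkpoint.
        ranked = sorted(active, key=lambda name: curves[name][epoch - 1],
                        reverse=True)
        keep = max(1, len(ranked) // 2)
        for name in ranked[keep:]:
            stopped_at[name] = epoch
        active = ranked[:keep]
    return active, stopped_at


# Invented learning curves for four hypothetical runs.
simulated_curves = {
    "run_a": [0.50, 0.60, 0.70, 0.75, 0.80, 0.82],
    "run_b": [0.40, 0.45, 0.50, 0.52, 0.53, 0.54],
    "run_c": [0.55, 0.65, 0.68, 0.70, 0.71, 0.72],
    "run_d": [0.30, 0.35, 0.36, 0.37, 0.38, 0.38],
}

survivors, stopped = successive_halving(simulated_curves)
print("survivors:", survivors)
print("stopped at epoch:", stopped)
```

Note that a rule like this can discard slow starters: in the invented data, `run_c` leads at the first checkpoint but is overtaken by `run_a` at the second, so the check interval is a trade-off between saving compute and giving each model a fair chance.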

These new capabilities will help data science teams to “fail fast”—minimizing the amount of time and resources they need to spend on each iteration, and reducing the cost of failure to a point where experimentation becomes economically viable.

With more time to experiment, data scientists can take a more rigorous, scientific approach to exploring how a model’s hyperparameters affect its trainability and accuracy. As a result, it should be possible to develop better models in less time, and to continuously monitor, evaluate and retrain them to evolve with the business problems that they are designed to solve.

At IBM, we’re currently working on adding these capabilities to IBM® Watson® Machine Learning, which is available in the IBM Cloud. To learn more about the latest features and our roadmap for the future, please visit our website.

