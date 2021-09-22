A data science team must go through many steps in a data science pipeline to build a predictive model. Even experienced teams of data scientists and ML engineers can benefit from the increased speed and transparency that come with AutoML. A data scientist must start with a hypothesis, gather the correct dataset, try some data visualization, engineer extra features to harness all the available signal, and train a model while tuning its hyperparameters. For state-of-the-art deep learning, they must also design the optimal architecture for a deep neural network, ideally with a GPU available to them.

Automated feature engineering



A data feature is a part of the input data to a machine learning model, and feature engineering is the transformative process by which a data scientist derives new information from existing data. Feature engineering is one of the key value-adding processes in an ML workflow, and good features are the difference between a model with acceptable performance and a brilliantly performant one. These mathematical transformations of raw data are what the model reads in, and they serve as the heart of the machine learning process. Automated feature engineering (AFE) is the process of exploring the space of viable feature combinations in a mechanistic, rather than manual, fashion.

Manual feature engineering is a modern-day alchemy that comes at a great cost in time: building a single feature can often take hours, and the features required for a bare-minimum accuracy score, let alone a production-level accuracy baseline, can number in the hundreds. By automating the exploration of a feature space, AutoML reduces the time a data science team spends in this phase from days to minutes.
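To make the idea of mechanically exploring a feature space concrete, here is a minimal sketch in plain Python. It is not how any particular AutoML product works: the column names, the three arithmetic operators, and the use of absolute Pearson correlation as the ranking score are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import sqrt

# Assumed operator set for illustration; real tools explore far richer transformations.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def generate_features(columns):
    """Build every pairwise combination of the raw columns under each operator."""
    candidates = {}
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        for symbol, op in OPS.items():
            candidates[f"({name_a} {symbol} {name_b})"] = [op(x, y) for x, y in zip(a, b)]
    return candidates


def rank_features(columns, target, top_k=3):
    """Score each candidate by |correlation with the target| and keep the best."""
    scored = [(abs(pearson(values, target)), name)
              for name, values in generate_features(columns).items()]
    scored.sort(reverse=True)
    return [(name, round(score, 3)) for score, name in scored[:top_k]]


# Toy data (hypothetical): the target is secretly width * height.
columns = {
    "width": [1.0, 2.0, 3.0, 4.0, 5.0],
    "height": [2.0, 1.0, 4.0, 3.0, 6.0],
    "age": [5.0, 3.0, 8.0, 1.0, 2.0],
}
target = [w * h for w, h in zip(columns["width"], columns["height"])]
best = rank_features(columns, target)
print(best[0])  # the product feature ranks first on this toy data
```

Because each candidate carries a readable name such as `(width * height)`, the winning features stay interpretable, which is the property that matters in regulated settings.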

Reducing the hours of manual intervention by a data scientist is not the only benefit of automated feature engineering. Generated features are often clearly interpretable. In strictly regulated industries such as healthcare or finance, that interpretability is important because it lowers barriers to adopting AI. Additionally, a data scientist or analyst benefits from the clarity of these features because it makes high-quality models more compelling and actionable. Automatically generated features also have the potential to reveal new KPIs for an organization to monitor and act upon. Once a data scientist has completed feature engineering, they then have to optimize their models with strategic feature selection.

Automated hyperparameter optimization



Hyperparameters are a part of machine learning algorithms best understood by analogy as levers for fine-tuning model performance, though incremental adjustments often have an outsize impact. In small-scale data science modeling, hyperparameters can easily be set by hand and optimized by trial and error.

For deep learning applications, the number of hyperparameter combinations grows exponentially with the number of hyperparameters, which puts their optimization beyond what a data science team can accomplish manually and in a timely fashion. Automated hyperparameter optimization (HPO) relieves teams of the intensive responsibility of exploring and optimizing across the full search space of hyperparameters, and instead allows them to iterate and experiment over features and models.
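One of the simplest forms of automated hyperparameter optimization is random search, sketched below in plain Python. The `validation_loss` function is a hypothetical stand-in for training a model and measuring its validation loss, and the two hyperparameters and their ranges are assumptions made for the example, not recommendations.

```python
import math
import random


def validation_loss(learning_rate, num_layers):
    """Hypothetical objective standing in for a real train-and-validate run.

    By construction it is smallest near learning_rate=0.01 and num_layers=3.
    """
    return (math.log10(learning_rate) + 2.0) ** 2 + 0.1 * (num_layers - 3) ** 2


def random_search(objective, n_trials=200, seed=0):
    """Sample hyperparameters at random and keep the best configuration seen."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = {
            # Learning rates are sampled log-uniformly between 1e-5 and 1.
            "learning_rate": 10 ** rng.uniform(-5, 0),
            "num_layers": rng.randint(1, 8),
        }
        loss = objective(**params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


best_params, best_loss = random_search(validation_loss)
print(best_params, best_loss)
```

In a real pipeline each call to the objective is an expensive training run, which is why more sample-efficient strategies such as Bayesian optimization are often preferred over plain random sampling.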

Another strength of automating the machine learning process is that now data scientists can focus on the why of model creation rather than the how. Considering the extremely large amounts of data available to many enterprises and the overwhelming number of questions that can be answered with this data, an analytics team can pay attention to which aspects of the model they should optimize for, such as the classic problem of minimizing false negatives in medical testing.
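As a small illustration of choosing what to optimize for, the sketch below picks a decision threshold that keeps false negatives rare instead of using a default cutoff of 0.5. The labels, scores, and the `min_recall` target are made-up values for the example, and the helper names are hypothetical.

```python
def confusion_counts(labels, scores, threshold):
    """Return (tp, fp, fn, tn) when predicting positive for score >= threshold."""
    tp = fp = fn = tn = 0
    for label, score in zip(labels, scores):
        predicted_positive = score >= threshold
        if label == 1 and predicted_positive:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def pick_threshold(labels, scores, min_recall=0.95):
    """Highest threshold whose recall still meets min_recall.

    Recall can only fall as the threshold rises, so the first qualifying
    threshold found while scanning downward is the highest one.
    """
    if 1 not in labels:
        raise ValueError("need at least one positive label to compute recall")
    for threshold in sorted(set(scores), reverse=True):
        tp, fp, fn, tn = confusion_counts(labels, scores, threshold)
        if tp / (tp + fn) >= min_recall:
            return threshold, (tp, fp, fn, tn)
    raise ValueError("no threshold reaches the requested recall")


# Toy screening data (hypothetical): 1 = condition present, 0 = absent.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.05]

default_counts = confusion_counts(labels, scores, 0.5)
threshold, tuned_counts = pick_threshold(labels, scores, min_recall=1.0)
print(default_counts, threshold, tuned_counts)
```

On this toy data the default cutoff misses one positive case, while the tuned threshold misses none at the cost of one additional false positive.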

Neural architecture search



The most complex and time-consuming process in deep learning is the creation of the neural architecture. Data science teams spend a great deal of time selecting the appropriate layers and learning rates, even though in the end often only the model's weights are kept, as in many language models. Neural architecture search (NAS) has been described as “using neural nets to design neural nets” and is one of the most obvious areas of ML to benefit from automation.

A neural architecture search begins with a choice of which architectures to try, and its outcome is determined by the metric against which each architecture is judged. Several algorithms are in common use. If the number of potential architectures is small, candidates can be chosen for testing at random. Gradient-based approaches, in which the discrete search space is turned into a continuous representation, have been shown to be very effective. Data science teams can also try evolutionary algorithms, in which architectures are initially evaluated at random and changes are applied slowly, propagating child architectures that are more successful while pruning those that are not.
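The evolutionary approach described above can be sketched in a few dozen lines of plain Python. Everything here is an assumption made for illustration: an architecture is just a tuple of layer widths, and `proxy_score` is a hypothetical stand-in for the validation metric that a real search would obtain by training each candidate.

```python
import random

WIDTHS = [16, 32, 64, 128]  # assumed choices for the width of a layer
MAX_DEPTH = 6               # assumed upper bound on the number of layers


def proxy_score(arch):
    """Hypothetical metric: rewards capacity up to a cap, penalizes parameter count."""
    capacity = sum(arch)
    params = sum(a * b for a, b in zip(arch, arch[1:]))
    return min(capacity, 256) / 256 - params / 100_000


def random_arch(rng):
    """Draw a random architecture from the search space."""
    return tuple(rng.choice(WIDTHS) for _ in range(rng.randint(1, MAX_DEPTH)))


def mutate(arch, rng):
    """Apply one small change: add, remove, or resize a single layer."""
    layers = list(arch)
    move = rng.choice(["resize", "add", "remove"])
    if move == "add" and len(layers) < MAX_DEPTH:
        layers.insert(rng.randrange(len(layers) + 1), rng.choice(WIDTHS))
    elif move == "remove" and len(layers) > 1:
        layers.pop(rng.randrange(len(layers)))
    else:  # resize, also the fallback when add or remove is not allowed
        layers[rng.randrange(len(layers))] = rng.choice(WIDTHS)
    return tuple(layers)


def evolutionary_search(score, generations=30, population_size=10, seed=0):
    """Keep the better half of each generation and refill it with mutated children."""
    rng = random.Random(seed)
    population = [random_arch(rng) for _ in range(population_size)]
    history = []
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        history.append(score(population[0]))
        survivors = population[: population_size // 2]  # prune the weaker half
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(population_size - len(survivors))]
        population = survivors + children
    return max(population, key=score), history


best_arch, history = evolutionary_search(proxy_score)
print(best_arch, history[-1])
```

Because the survivors are carried over unchanged, the best score in `history` never decreases from one generation to the next. In a real search, each call to the scoring function means training a network, which is where most of the compute cost comes from.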

Neural architecture search is one of the key elements of AutoML that promise to democratize AI. However, these searches often come with a very high carbon footprint. These tradeoffs have not yet been examined, and optimizing for ecological cost is an ongoing research area in NAS approaches.