Automated machine learning (AutoML) has become a trending topic in industry and academic artificial intelligence (AI) research in recent years. AutoML shows great promise for AI in regulated industries because it can provide explainable and reproducible results. It also gives greater access to AI development to those without the theoretical background currently needed for a role in data science.
Every step in the current prototypical data science pipeline, such as data preprocessing, feature engineering, and hyperparameter optimization, has to be done manually by machine learning experts. By comparison, adopting AutoML allows a simpler development process by which a few lines of code can generate the code necessary to begin developing a machine learning model.
One can think of AutoML, regardless of whether it is building classifiers or training regressions, as a generalized search concept, with specialized search algorithms for finding the optimal solutions for each component piece of the ML pipeline. By building a system that automates just three key pieces (feature engineering, hyperparameter optimization, and neural architecture search), AutoML promises a future where democratized machine learning is a reality.
In a data science pipeline, there are many steps a data science team must go through to build a predictive model. Even experienced teams of data scientists and ML engineers can benefit from the increased speed and transparency that come with AutoML. A data scientist has to start with a hypothesis, gather the correct dataset, try some data visualization, engineer extra features to harness all the signal available, and train a model with hyperparameters. For state-of-the-art deep learning, they also have to design the optimal architecture for a deep neural network, hopefully on a GPU if one is available to them.
A data feature is a part of the input data to a machine learning model, and feature engineering refers to the transformative process where a data scientist derives new information from existing data. Feature engineering is one of the key value-adding processes in an ML workflow, and good features are the difference between a model with acceptable performance and a brilliantly performant model. These mathematical transformations of raw data are read into the model and serve as the heart of the machine learning process. Automated feature engineering (AFE) is the process of exploring the space of viable combinations of features in a mechanistic, rather than manual, fashion.
Manual feature engineering is a modern-day alchemy that comes at a great cost in terms of time: building a single feature can often take hours, and the number of features required for a bare minimum accuracy score, let alone a production-level accuracy baseline, can number into the hundreds. By automating the exploration of a feature space, AutoML reduces the time a data science team spends in this phase from days to minutes.
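To make the idea of mechanically exploring a feature space concrete, here is a minimal sketch in plain Python. It is not how any particular AutoML product works: the operator set, the toy columns, and the scoring rule (absolute Pearson correlation with the target) are all assumptions chosen for illustration.

```python
import itertools
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


# Assumed (hypothetical) set of binary operators used to combine two raw columns.
OPERATORS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


def generate_features(columns, target, top_k=3):
    """Enumerate pairwise column combinations and rank them by |correlation| with the target."""
    candidates = []
    for (name_a, col_a), (name_b, col_b) in itertools.combinations(columns.items(), 2):
        for op_name, op in OPERATORS.items():
            values = [op(a, b) for a, b in zip(col_a, col_b)]
            score = abs(pearson(values, target))
            candidates.append((score, f"{op_name}({name_a}, {name_b})", values))
    candidates.sort(key=lambda c: c[0], reverse=True)
    return candidates[:top_k]


# Toy data: the target is a rectangle's area, which no single raw column captures.
columns = {
    "width": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "height": [6.0, 1.0, 4.0, 2.0, 5.0, 3.0],
    "age": [3.0, 9.0, 1.0, 7.0, 2.0, 8.0],
}
target = [w * h for w, h in zip(columns["width"], columns["height"])]

best = generate_features(columns, target)
for score, name, _ in best:
    print(f"{name}: |r| = {score:.3f}")
```

Each candidate carries a readable name such as `mul(width, height)`, which is one reason generated features can stay interpretable. A real system would search a far larger operator space and score candidates against a held-out set rather than by simple correlation.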
Reducing the hours of manual intervention by a data scientist is not the only benefit of automated feature engineering. Generated features are often clearly interpretable. In strictly regulated industries such as healthcare or finance, that explainability is important because it lowers barriers to adopting AI. Additionally, a data scientist or analyst benefits from the clarity of these features because they make high-quality models more compelling and actionable. Automatically generated features also have the potential to surface new KPIs for an organization to monitor and act upon. Once a data scientist has completed feature engineering, they then have to optimize their models with strategic feature selection.
Hyperparameters are a part of machine learning algorithms best understood by analogy as levers for fine-tuning model performance, though incremental adjustments often have outsize impact. In small-scale data science modeling, hyperparameters can easily be set by hand and optimized by trial and error.
For deep learning applications, the number of hyperparameters grows, and the number of possible configurations grows exponentially with it, which puts their optimization beyond what a data science team can accomplish manually in a timely fashion. Automated hyperparameter optimization (HPO) relieves teams of the intensive responsibility to explore and optimize across the full search space for hyperparameters and instead allows teams to iterate and experiment over features and models.
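One of the simplest automated HPO strategies is random search: sample configurations, train, and keep the one with the lowest validation loss. The sketch below assumes a deliberately tiny model (one-dimensional ridge regression with a closed-form fit) and a synthetic dataset, so that it runs with only the standard library. The function names and the search range are illustrative assumptions, not part of any AutoML API.

```python
import math
import random


def fit_ridge(xs, ys, lam):
    """Closed-form 1-D ridge regression without intercept: w = sum(xy) / (sum(x^2) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mse(w, xs, ys):
    """Mean squared error of the linear predictor y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def random_search(train, valid, n_trials=50, seed=0):
    """Sample the regularization strength log-uniformly and keep the best validation loss."""
    rng = random.Random(seed)
    best_lam, best_loss = None, math.inf
    for _ in range(n_trials):
        lam = 10 ** rng.uniform(-3, 3)  # assumed search range: 1e-3 to 1e3
        w = fit_ridge(*train, lam)
        loss = mse(w, *valid)
        if loss < best_loss:
            best_lam, best_loss = lam, loss
    return best_lam, best_loss


# Synthetic data (an assumption for the sketch): y = 2x plus Gaussian noise.
data_rng = random.Random(42)
xs = [data_rng.uniform(-1, 1) for _ in range(40)]
ys = [2 * x + data_rng.gauss(0, 0.3) for x in xs]
train = (xs[:30], ys[:30])
valid = (xs[30:], ys[30:])

best_lam, best_loss = random_search(train, valid)
print(f"best lambda = {best_lam:.4g}, validation MSE = {best_loss:.4f}")
```

With one hyperparameter a grid would do just as well. Random search and more sophisticated methods such as Bayesian optimization become worthwhile as the number of hyperparameters, and therefore the number of combinations, grows.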
Another strength of automating the machine learning process is that now data scientists can focus on the why of model creation rather than the how. Considering the extremely large amounts of data available to many enterprises and the overwhelming number of questions that can be answered with this data, an analytics team can pay attention to which aspects of the model they should optimize for, such as the classic problem of minimizing false negatives in medical testing.
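Choosing what to optimize for can change which model wins. As a hedged illustration, the sketch below scores two hypothetical sets of predictions with the F-beta measure, where beta greater than 1 weights recall (fewer false negatives) more heavily than precision. The labels and predictions are made up for the example.

```python
def f_beta(y_true, y_pred, beta):
    """F-beta score: (1 + b^2) * TP / ((1 + b^2) * TP + b^2 * FN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Made-up screening results: 1 = condition present, 0 = absent.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
cautious = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]   # misses one case, no false alarms
sensitive = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]  # catches every case, three false alarms

for beta in (1, 2):
    print(
        f"beta={beta}: cautious={f_beta(y_true, cautious, beta):.3f}, "
        f"sensitive={f_beta(y_true, sensitive, beta):.3f}"
    )
```

Under the balanced F1 score the cautious model ranks higher, while under F2 the sensitive model does. An automated search will faithfully optimize whichever metric it is given, which is why the choice of metric remains a human decision.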
The most complex and time-consuming process in deep learning is the creation of the neural architecture. Data science teams spend a long time selecting the appropriate layers and learning rates that in the end are often only for the weights in the model, as in many language models. Neural architecture search (NAS) has been described as “using neural nets to design neural nets” and is one of the most obvious areas of ML to benefit from automation.
Neural architecture searches begin with a choice of which architectures to try. The outcome of NAS is determined by the metric against which each architecture is judged. There are several common algorithms to use in a neural architecture search. If the potential number of architectures is small, choices for testing can be made at random. Gradient-based approaches, whereby the discrete search space is turned into a continuous representation, have been shown to be very effective. Data science teams can also try evolutionary algorithms, in which architectures are evaluated at random and changes are applied slowly, propagating child architectures that are more successful while pruning those that are not.
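The evolutionary approach can be sketched in a few lines. In the toy version below, an architecture is just a tuple of hidden-layer widths, and `proxy_score` is a hypothetical placeholder for the expensive step of training each candidate network and measuring its validation metric. The width choices, depth limit, and mutation moves are likewise assumptions made for illustration.

```python
import random

WIDTHS = [8, 16, 32, 64, 128]  # assumed menu of layer widths
MAX_DEPTH = 4                  # assumed limit on hidden layers


def proxy_score(arch):
    """Hypothetical stand-in for 'train the network and return its validation metric'.

    Rewards capacity with diminishing returns and penalizes parameter count.
    """
    capacity = sum(w ** 0.5 for w in arch)
    params = sum(a * b for a, b in zip(arch, arch[1:]))
    return capacity - 0.002 * params


def random_arch(rng):
    """Sample an architecture uniformly at random."""
    return tuple(rng.choice(WIDTHS) for _ in range(rng.randint(1, MAX_DEPTH)))


def mutate(arch, rng):
    """Apply one small change: add a layer, remove a layer, or resize a layer."""
    layers = list(arch)
    move = rng.choice(["resize", "add", "remove"])
    if move == "add" and len(layers) < MAX_DEPTH:
        layers.insert(rng.randrange(len(layers) + 1), rng.choice(WIDTHS))
    elif move == "remove" and len(layers) > 1:
        del layers[rng.randrange(len(layers))]
    else:  # resize, or fall back to resize when add/remove is not allowed
        layers[rng.randrange(len(layers))] = rng.choice(WIDTHS)
    return tuple(layers)


def evolve(generations=30, population_size=10, seed=0):
    """Keep the better half each generation and refill with mutated children."""
    rng = random.Random(seed)
    population = [random_arch(rng) for _ in range(population_size)]
    history = []
    for _ in range(generations):
        population.sort(key=proxy_score, reverse=True)
        history.append(proxy_score(population[0]))
        survivors = population[: population_size // 2]  # prune the weaker half
        children = [
            mutate(rng.choice(survivors), rng)
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + children
    return max(population, key=proxy_score), history


best_arch, history = evolve()
print("best architecture:", best_arch, "score:", round(proxy_score(best_arch), 3))
```

Because the best candidates always survive into the next generation, the best score never decreases. In a real search, the cost is dominated by the training runs hidden behind the scoring function, which is also where the energy cost of NAS comes from.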
Neural architecture searches are one of the key elements of AutoML that promise to democratize AI. These searches, however, often come with a very high carbon footprint. An examination of these tradeoffs has not yet been done, and optimization for ecological cost is an ongoing research area in NAS approaches.
Automated machine learning sounds like a panacea of technical solutionism that an organization can use to replace expensive data scientists, but in reality, using it well requires an intelligent organizational strategy. Data scientists fill essential roles in designing experiments, translating results into business outcomes, and maintaining the full lifecycle of their machine learning models. So how can cross-functional teams make use of AutoML to optimize their use of time and shorten the time to realizing value from their models?
The optimal workflow for including AutoML APIs is one that uses them to parallelize workloads and shorten the time spent on manually intensive tasks. Instead of spending days on hyperparameter tuning, a data scientist could automate this process on multiple types of models concurrently and then test which was most performant.
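As a rough sketch of that workflow, the example below tunes two model families concurrently with Python's standard `concurrent.futures` module and then compares the winners on a validation set. The two toy models (a one-dimensional ridge regressor and a k-nearest-neighbor regressor), their hyperparameter grids, and the synthetic data are all assumptions for illustration. An AutoML API would supply its own models and search strategy.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def make_ridge(train, lam):
    """1-D ridge regression without intercept, fitted in closed form."""
    xs, ys = train
    w = sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)
    return lambda x: w * x


def make_knn(train, k):
    """k-nearest-neighbor regression: average the targets of the k closest points."""
    xs, ys = train

    def predict(x):
        nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k

    return predict


def mse(predict, data):
    xs, ys = data
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Assumed search spaces: model family -> (builder, hyperparameter grid).
SEARCH_SPACES = {
    "ridge": (make_ridge, [0.01, 0.1, 1.0, 10.0]),
    "knn": (make_knn, [1, 3, 5, 9]),
}


def tune(name, train, valid):
    """Grid-search one model family and return (name, best value, validation loss)."""
    builder, grid = SEARCH_SPACES[name]
    loss, value = min((mse(builder(train, v), valid), v) for v in grid)
    return name, value, loss


# Synthetic nonlinear data: y = x^2 plus noise.
rng = random.Random(7)
xs = [rng.uniform(-2, 2) for _ in range(60)]
ys = [x ** 2 + rng.gauss(0, 0.2) for x in xs]
train, valid = (xs[:45], ys[:45]), (xs[45:], ys[45:])

# Tune every model family concurrently, then compare the tuned winners.
with ThreadPoolExecutor() as pool:
    futures = [pool.submit(tune, name, train, valid) for name in SEARCH_SPACES]
    results = [f.result() for f in futures]

winner = min(results, key=lambda r: r[2])
print("results:", results)
print("most performant:", winner[0], "with hyperparameter", winner[1])
```

Threads keep the sketch simple. For CPU-bound training in pure Python, a process pool or a distributed scheduler would typically be the better fit.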
Additionally, there are AutoML features that enable team members of different skill levels to contribute to the data science pipeline. A data analyst without Python expertise could leverage a toolkit, like AutoAI on Watson Studio, to train a predictive model using the data they’re able to extract on their own via query. Using AutoML, a data analyst can now preprocess data, build a machine learning pipeline, and produce a fully trained model they can use for validating their own hypotheses without requiring a full data science team’s attention.
IBM researchers and developers contribute to the growth and development of AutoML. Ongoing product development with AutoAI on IBM Watson and the work of IBM researchers on Lale, an open-source automated data science library, are just some of the ways that IBM helps to create the next generation of AI approaches. While Lale is an open-source project, it is actually core to many of AutoAI’s capabilities.
For data science teams who work with Python as the core of their ML stack, Lale offers a semi-automated library that integrates seamlessly within scikit-learn pipelines, which is different from auto-sklearn or a library like TPOT. Lale goes beyond scikit-learn with automation, correctness checks, and interoperability. While based in the scikit-learn paradigm, it has increasing numbers of transformers and operators from other Python libraries and from libraries in languages such as Java and R.