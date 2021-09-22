A data feature is one part of the input data to a machine learning model, and feature engineering is the transformative process by which a data scientist derives new information from existing data. Feature engineering is one of the key value-adding steps in an ML workflow, and good features are often the difference between a model with acceptable performance and one that performs exceptionally well. These mathematical transformations of raw data are fed into the model and sit at the heart of the machine learning process. Automated feature engineering (AFE) is the process of exploring the space of viable feature combinations in a mechanical, rather than manual, fashion.
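
To make the idea of mechanically exploring feature combinations concrete, here is a minimal sketch in plain Python. It is not how any particular AutoML product works. The function name `generate_pairwise_features`, the `OPERATIONS` table, and the sample columns are all hypothetical, chosen only for illustration. The sketch applies a small, fixed set of arithmetic operations to every pair of input columns.

```python
from itertools import combinations
from operator import add, mul, sub

# Hypothetical set of binary transformations to try on every column pair.
OPERATIONS = {"plus": add, "minus": sub, "times": mul}


def generate_pairwise_features(columns):
    """Build candidate features from every pair of input columns.

    `columns` maps a column name to a list of numbers; all lists are
    assumed to have the same length.
    """
    generated = {}
    for (name_a, values_a), (name_b, values_b) in combinations(columns.items(), 2):
        for op_name, op in OPERATIONS.items():
            new_name = f"{name_a}_{op_name}_{name_b}"
            generated[new_name] = [op(a, b) for a, b in zip(values_a, values_b)]
    return generated


# Made-up sample data, for illustration only.
raw = {
    "rooms": [2, 3, 4],
    "area_m2": [50, 80, 64],
    "age_years": [30, 40, 25],
}
features = generate_pairwise_features(raw)
print(len(features))  # 3 column pairs x 3 operations = 9 candidates
```

Even this toy version shows why the search is tedious by hand: the number of candidates grows with the square of the column count, multiplied by the number of operations tried.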

Manual feature engineering is a modern-day alchemy that comes at a great cost in time: building a single feature can take hours, and the number of features needed to reach even a bare-minimum accuracy score, let alone a production-level baseline, can run into the hundreds. By automating the exploration of the feature space, AutoML can reduce the time a data science team spends in this phase from days to minutes.

Reducing the hours of manual intervention by a data scientist is not the only benefit of automated feature engineering. Generated features are often clearly interpretable. In strictly regulated industries such as healthcare or finance, that interpretability matters because it lowers the barriers to adopting AI. A data scientist or analyst also benefits from the clarity of these features, because it makes high-quality models more compelling and actionable. Automatically generated features can also surface new KPIs for an organization to monitor and act upon. Once feature engineering is complete, the data scientist then has to optimize the model through strategic feature selection.
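
The text does not say which selection strategy is meant, so the following is only a sketch of one simple option: a filter that ranks candidate features by the absolute value of their Pearson correlation with the target and keeps the top `k`. The helper names `pearson` and `select_top_features`, and the sample data, are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def select_top_features(features, target, k):
    """Return the names of the k features most correlated with the target."""
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )
    return ranked[:k]


# Made-up candidate features and target, for illustration only.
candidates = {
    "signal": [1, 2, 3, 4, 5],
    "noisy": [1, 3, 2, 5, 4],
    "constant": [7, 7, 7, 7, 7],
}
target = [2, 4, 6, 8, 10]
selected = select_top_features(candidates, target, k=2)
print(selected)
```

A correlation filter only captures linear, one-feature-at-a-time relationships. In practice, selection is often done with model-based or wrapper methods instead.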