# Removing the hunch in data science with AI-based automated feature engineering

August 23, 2017 | Written by: IBM Research Editorial Staff

Categorized: AI | Publications | Thomas J Watson Research Center

For data scientists, predictive modeling is the practice of predicting future outcomes using statistical models. Its increasing adoption in AI spans diagnosing cancer, predicting hurricanes and optimizing supply chains, among other areas. However, the value of predictive modeling comes at a cost. The cornerstone of successful predictive modeling, known as feature engineering, is the most time-consuming step in the data science pipeline, involving a great deal of trial and error and relying on data scientists' hunches. Our team has developed a more efficient and effective way of doing it. We will be presenting our work this week at the International Joint Conference on Artificial Intelligence (IJCAI).

“At the end of the day, some machine learning projects succeed and some fail. What makes the difference? Easily the most important factor is the features used.” – Pedro Domingos, A Few Useful Things to Know About Machine Learning.

# What is feature engineering?

Features are the observations or characteristics on which a model is built. For instance, one model for assessing the risk of heart disease in patients is based on their height, weight, and age, amongst other features. However, using the available features directly, on their own, may not always prove very effective. A more effective feature for this problem is the ratio of weight to squared height, also known as the Body Mass Index (BMI). The process of deriving a new, abstract feature from the given data is broadly referred to as feature engineering. It is typically done by applying one of many mathematical or statistical functions called transformations.
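As a concrete sketch, the BMI example above can be expressed as a simple derived feature in plain Python (the records and field names are illustrative, not drawn from any particular dataset):

```python
# Hypothetical patient records; field names are illustrative.
patients = [
    {"height_m": 1.65, "weight_kg": 68.0, "age": 54},
    {"height_m": 1.80, "weight_kg": 95.0, "age": 61},
]

def add_bmi(record):
    """Feature engineering step: derive BMI = weight / height^2."""
    record = dict(record)  # copy, so the raw data is left untouched
    record["bmi"] = record["weight_kg"] / record["height_m"] ** 2
    return record

engineered = [add_bmi(r) for r in patients]
print([round(r["bmi"], 1) for r in engineered])  # [25.0, 29.3]
```

The transformation here (a ratio of two existing features) is one of many a data scientist might try; the hard part is knowing which one to try.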

What makes effective feature engineering hard is that the number of candidate transformations a data scientist could apply is practically unbounded. Moreover, working through those options by trial and error, applying transformations one by one and assessing their impact, is very time consuming and often infeasible to perform thoroughly. On the other hand, feature engineering is central to producing good models, which leaves the data scientist with a dilemma over how much time to devote to it. Learning Feature Engineering (LFE) presents a novel solution to this problem.
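To see why the search space balloons, consider even a small set of unary transformations applied across a handful of features; a trial-and-error search must train and evaluate a model for each candidate. This sketch only enumerates the candidates (the transformation set is illustrative, and the evaluation step is deliberately left out):

```python
import math

# A few common unary transformations a data scientist might try.
TRANSFORMS = {
    "log": lambda x: math.log(x) if x > 0 else float("nan"),
    "sqrt": lambda x: math.sqrt(x) if x >= 0 else float("nan"),
    "square": lambda x: x * x,
}

def candidate_features(feature_names):
    """Enumerate every (feature, transformation) pair.

    Each pair would require training and scoring a model to assess
    its impact, which is why exhaustive search quickly becomes
    infeasible as features and transformations grow."""
    return [(f, t) for f in feature_names for t in TRANSFORMS]

cands = candidate_features(["height", "weight", "age"])
print(len(cands))  # 3 features x 3 transforms = 9 candidates
```

With dozens of features, a larger transformation catalog, and binary or higher-order transformations, the count grows combinatorially, and each candidate costs a full model evaluation.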

# What is LFE?

LFE is a new, radically different way of performing feature engineering. Instead of relying on trial and error, LFE automatically learns which transformations are effective on a feature in the context of a given problem. It does this by examining thousands of predictive learning problems and associating transformations with the situations in which they prove effective. And when a new problem is presented, it instantly comes up with the most effective feature choices, based on that experience.

Typically, learning any phenomenon across multiple different datasets is a big no-go. That is simply because different datasets represent unrelated problems and contain varying numbers of data points, making it infeasible for machine learning algorithms to generalize insights from one dataset to another. However, LFE breaks this barrier thanks to a novel representation of feature vectors, called the Quantile Sketch Array (QSA), which captures the essence of a feature’s distribution in a canonical, fixed-size array. LFE trains a multilayer perceptron over QSAs to learn which transformations are the most relevant for a target problem. The results show that, besides being more accurate than any other approach, LFE is also the fastest, delivering results in seconds rather than the hours or days typically expected.
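The key idea behind the QSA can be illustrated with a much simpler stand-in: summarize a feature of any length by sampling its sorted values at evenly spaced quantiles, producing a fixed-size vector. (This is a deliberately simplified sketch for intuition only; the actual QSA construction is defined in the paper.)

```python
def quantile_sketch(values, bins=10):
    """Illustrative fixed-size summary of a feature's distribution:
    sample the sorted values at evenly spaced quantile positions.
    A simplified stand-in for LFE's Quantile Sketch Array."""
    xs = sorted(values)
    n = len(xs)
    return [xs[min(int(q * n / bins), n - 1)] for q in range(bins)]

# Features from datasets of very different sizes map to vectors of the
# same length, so one model can be trained across many datasets.
small = quantile_sketch(range(5))      # a 5-point feature
large = quantile_sketch(range(1000))   # a 1000-point feature
print(len(small), len(large))  # 10 10
```

Because every feature, regardless of its original length, becomes a vector of the same dimension, a single neural network (a multilayer perceptron, in LFE's case) can consume features drawn from thousands of unrelated datasets.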

# What is the significance of LFE?

Besides reducing feature engineering from a costly manual process to automated, instant prediction, LFE’s significance transcends its original application. The QSA of LFE demonstrated the successful generalization of a complex task across datasets of widely varying size, shape and content. Hence, it has opened an interesting topic of research in the general area of learning to learn, a.k.a. meta-learning. Finally, there is considerable recent interest in the automation of data science and predictive analytics, an industry expected to be worth $200 billion by 2020.

As such, we will be integrating LFE into IBM’s Data Science Experience in the coming months, making it available to thousands of users of IBM’s premier data science tool kit.

*LFE started as the internship project of Fatemeh Nargesian from U. of Toronto with the Automated Machine Learning and Data Science team at IBM Research. The publication can be accessed at:*

*Learning Feature Engineering for Classification by F. Nargesian, H. Samulowitz, U. Khurana, E. Khalil, D. Turaga, in the Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI), 2017.*
