Machine Learning and Deep Learning are very promising technologies, and every week brings a new hyped success. Yet, when it comes to applying machine learning and deep learning, many people keep making the same mistakes. Here is one that is particularly troublesome: people often miss that you need to provide examples to learn from. They expect systems to learn from raw data without any supervision or feedback.
I can't blame them, as many proponents of machine learning, deep learning, or artificial intelligence just speak about how systems learned from data, without entering the gory details of what you need to do to get there.
In order to wake them up, I use a very simple story:
If you give a bike to children they won't be able to ride it. But if you show them, then they'll learn in a couple of days or less. Machines are no different from humans: they learn from examples, or from feedback, or both.
Let me give concrete examples of the issue.
- I was asked to help sell machine learning to a bank. They asked us to have a look at predicting failures of their payment system. They had made previous attempts with various technologies and various consulting companies, all of which failed. The training data would be transactions processed in the past.
My dialog with the client team went like this:
- (me) Do we know which transactions failed among all the training data transactions?
- (them) Yes, we do.
- (me) Do we have enough data?
- (them) Yes, we have millions in our training data set.
- (me) Perfect, can I get access to the data?
It took a while to get access because of security concerns, as is often the case in regulated industries like banking, but I finally got access to a file of a million transactions. I looked at it; there were hundreds of features, but I could not find where the target values were (whether a given transaction is considered a failure or not). When I asked about it, I was pointed to a second file where the target values were. True, the values were there, and all I needed to do was to join it with the other data set.
There was an issue though: the target file contained only 1,700 transactions or so, and only 8 failures out of these 1,700 transactions. No wonder no machine learning approach would work: how can you learn from only 8 examples?
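The join that surfaced the problem can be sketched as follows. The field names and values here are hypothetical (the real files had hundreds of features and a million rows), but the mechanics are the same: only rows present in the target file are usable for supervised learning.

```python
from collections import Counter

# Feature file: millions of transactions in reality, a handful here.
features = {
    "tx-001": {"amount": 120.0, "channel": "web"},
    "tx-002": {"amount": 8.5, "channel": "atm"},
    "tx-003": {"amount": 990.0, "channel": "web"},
}

# Target file: far smaller, with very few "failure" labels.
labels = {"tx-001": "ok", "tx-003": "failure"}

# Join on transaction id; unlabeled rows drop out of the training set.
training_set = [
    {**row, "target": labels[tx_id]}
    for tx_id, row in features.items()
    if tx_id in labels
]

print(len(training_set))                          # only labeled rows survive
print(Counter(r["target"] for r in training_set))
```

Counting the targets after the join is the moment the bank's problem became visible: however large the feature file, the effective training set is the size of the label file.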
I suggested they spend time labeling more transactions. Assuming 10 transaction labels per minute, a one-person-week effort would yield more than 10 times as many examples to learn from.
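The back-of-the-envelope arithmetic behind that suggestion, assuming a 40-hour working week:

```python
# Labeling budget: 10 labels per minute, one person, one 40-hour week.
labels_per_minute = 10
minutes_per_week = 40 * 60        # 2,400 working minutes

new_labels = labels_per_minute * minutes_per_week
print(new_labels)                 # 24,000 new labels

# Compare with the 1,700 labeled transactions we started from.
print(new_labels / 1_700)         # comfortably more than 10x
```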
Yet the client team refused to do that, and guess what: they still have no success with machine learning.
- In another client context, I was asked to help sell Watson (our deep-learning-enabled set of APIs). The client team was more mature than the one above, and they fully understood that they needed labeled examples. Their question to us was: can Watson label the examples for us?
When you think about it, they were asking the learning system to label the very examples it would be trained on. This is machine learning upside down: normally you train your system first, and then, if you did a good job at training, the system can label data correctly.
Our answer was that unfortunately they had to provide the labels themselves, of course.
- In a more recent example where we were missing labels, the client team asked me if we could use unsupervised learning techniques like clustering. Why? Because, as the name indicates, unsupervised learning techniques do not require labels. But this comes at a price: these methods are not predictive at all. You cannot use them in isolation to learn how to predict a target.
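A toy illustration of why clustering alone does not answer a prediction question. The numbers and centroids below are made up; the point is that clustering assigns arbitrary group ids based on similarity, and nothing in those ids says which transactions are failures.

```python
# Pretend k-means has already converged to two centroids on a single
# feature (transaction amount). Clustering groups similar transactions,
# but the cluster ids carry no "failure"/"ok" meaning by themselves.
amounts = [5.0, 7.0, 6.0, 900.0, 950.0]   # made-up transaction amounts
centroids = [6.0, 925.0]

def assign(x, centroids):
    """Return the index of the nearest centroid."""
    return min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))

clusters = [assign(a, centroids) for a in amounts]
print(clusters)   # [0, 0, 0, 1, 1] -- group ids, not target labels
```

To turn those clusters into predictions of a target, you would still need labeled examples to tell you what, if anything, each cluster corresponds to.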
I'll stop here, but this is a recurring pattern with people starting their machine learning journey. The best advice we can give them is very well expressed in this tweet:
I could not have said it better.
In people's defense, labeling data can be costly and error-prone. Machine Learning adoption will get easier if systems can learn from less data. There are two active research areas trying to address this shortcoming of machine learning:
- One is active learning, where humans label only a small, carefully selected subset of the data, and the system auto-labels the rest.
- A second one is transfer learning: you take an already trained model and further train it on a new domain. The good news is that most of the training work was done beforehand, so you need only a small number of examples to adapt the model to the new use case.
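One common flavor of active learning is uncertainty sampling: ask humans to label only the examples the current model is least sure about. This is a minimal sketch under that assumption, with made-up transaction ids and prediction scores:

```python
# Model's predicted probability of "failure" for some unlabeled transactions.
# All ids and scores are hypothetical.
predictions = {
    "tx-101": 0.02,   # confidently "ok"
    "tx-102": 0.55,   # near a coin flip -> worth a human label
    "tx-103": 0.97,   # confidently "failure"
    "tx-104": 0.48,   # near a coin flip -> worth a human label
}

def uncertainty(p):
    """Higher means closer to 0.5, i.e. maximally uncertain."""
    return -abs(p - 0.5)

# Send only the 2 most uncertain transactions to a human labeler.
to_label = sorted(predictions,
                  key=lambda tx: uncertainty(predictions[tx]),
                  reverse=True)[:2]
print(to_label)   # ['tx-104', 'tx-102']
```

The labeling budget is spent where it teaches the model the most, instead of being spread evenly over examples it already classifies with confidence.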
An alternative is to look at the example generation process itself. In some cases it is possible to get labels as a side effect of using machine learning. For instance, if you use machine learning to make product recommendations for an e-commerce site, then each time a visitor buys something you get labeled data for free: you know what the visitor was really interested in, namely the products in their basket. You can then compare these to the predictions your ML algorithm made and have it learn from them. This learning from feedback is at the heart of many successful applications of machine learning; see Machine Learning Algorithm != Learning Machine for more details.
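The feedback loop for the e-commerce case can be sketched in a few lines. Product ids are hypothetical; the point is that every purchase scores the earlier recommendations after the fact:

```python
# What the recommendation system showed to a visitor (hypothetical ids).
recommended = ["p1", "p2", "p3"]

# What actually ended up in the visitor's basket.
purchased = {"p2", "p9"}

# Each recommendation gets a label for free: 1 if it was bought, else 0.
feedback = [(p, 1 if p in purchased else 0) for p in recommended]
print(feedback)   # [('p1', 0), ('p2', 1), ('p3', 0)]

# These (example, label) pairs can flow straight back into training,
# with no manual labeling effort at all.
```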