Recently I have spent some time after work learning “Deep Learning”, and below are some of my short notes.
Deep Learning is a family of methods (supervised and unsupervised) that solve machine learning (ML) problems using deep neural networks, i.e. networks with more than two layers. Each layer applies some modeling step that transforms the data before passing it on.
Use cases: cancer detection, speech recognition, video captioning, real-time translation, face recognition, image/video classification, self-driving cars (sign and pedestrian detection, lane keeping), etc.
The increased popularity of Deep Learning is mainly due to:
- massive amounts of data available for training (the big data era)
- advances in machine learning research and algorithms (more details below)
- increases in computer processing capabilities (e.g. GPUs)
Challenges in the past:
- The back-propagation problem (vanishing gradients) made training take forever
- Feature selection had to be done by hand
Advances in machine learning research and algorithms
In the following paragraphs, I briefly describe some common deep neural networks.
Convolutional Neural Networks (or CNNs)
Automatic feature selection
Inspired by the way our visual cortex operates: it extracts a few primitive features first, combines them to form parts of objects, and finally forms whole objects
Multiple convolutional layers, pooling layers, and fully connected layers
Use cases: machine vision: image recognition, skin cancer classification, object detection (real-time recognition of pedestrians in self-driving cars)
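As a sketch of what a convolutional layer computes, here is a minimal 2-D convolution in NumPy. The kernel and image below are made up for illustration (a hand-written vertical-edge detector); in a real CNN the kernels are learned during training:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image (valid padding, stride 1)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A hypothetical "primitive feature" detector: responds to vertical edges
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])

# Toy image: dark left half, bright right half -> a vertical edge
image = np.zeros((5, 5))
image[:, 2:] = 1.0

feature_map = conv2d(image, kernel)  # strong response near the edge columns
```

Stacking many such filters (with pooling layers in between) is what lets a CNN build up from primitive edges to object parts to whole objects.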
Recurrent Neural Networks (or RNNs)
Mainly for sequential data, e.g. stock market price, sentiment analysis, language modeling (predict next word in a sentence, quick translation of certain words, speech-to-text)
It remembers past analysis by maintaining a “state” that recurs back into the net
- Computationally expensive to keep track of “state”
- “Vanishing gradient” and “Exploding gradient” problems
Long Short-Term Memory (LSTM) was created to address these issues by maintaining a stronger gradient: a memory cell (holds data) plus three “gates”: write, read, and forget
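A minimal sketch of why the vanishing gradient arises (all numbers here are hypothetical): back-propagating through a plain RNN multiplies the gradient by roughly the recurrent weight at every time step, so the signal shrinks (or explodes) exponentially with sequence length:

```python
# Hypothetical scalar RNN: state h_t = tanh(w * h_{t-1} + x_t).
# Back-propagation through T steps multiplies the gradient by
# roughly (w * tanh') at each step.
w = 0.5          # recurrent weight < 1 -> vanishing gradient
grad = 1.0
for t in range(50):
    grad *= w    # ignoring tanh' (<= 1) only makes it vanish faster
print(grad)      # ~8.9e-16: the earliest time steps barely get a signal
```

With w > 1 the same loop explodes instead; LSTM's gated memory cell is designed to keep this product close to 1 so long-range dependencies survive training.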
Restricted Boltzmann Machines (or RBMs)
Unsupervised (extracts features without the need to label data)
Shallow networks that learn to reconstruct data by themselves. Over several forward and backward passes during training, the weights of the net are adjusted to find relationships among the input features and determine which features are relevant.
Building blocks of other networks, such as Deep Belief Network
Useful for feature extraction, dimensionality reduction, pattern recognition, handling missing values, etc.
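As an illustration of those forward and backward passes, here is a toy RBM trained with one-step contrastive divergence (CD-1) in NumPy. The sizes, data, and learning rate are all invented for illustration, and biases and stochastic sampling are omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny RBM: 6 visible units, 2 hidden units (sizes chosen arbitrarily here)
n_visible, n_hidden = 6, 2
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reconstruct(v, W):
    return sigmoid(sigmoid(v @ W) @ W.T)   # forward then backward pass

# Toy unlabeled data: two repeating binary patterns
data = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1]] * 10, dtype=float)

err_before = np.mean((data - reconstruct(data, W)) ** 2)

lr = 0.1
for epoch in range(200):
    h0 = sigmoid(data @ W)          # forward pass: visible -> hidden
    v1 = sigmoid(h0 @ W.T)          # backward pass: reconstruction
    h1 = sigmoid(v1 @ W)
    # Adjust weights toward making reconstructions match the data
    W += lr * (data.T @ h0 - v1.T @ h1) / len(data)

err_after = np.mean((data - reconstruct(data, W)) ** 2)
```

After training, the two hidden units act as detectors for the two patterns, and the reconstruction error drops without any labels ever being used.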
Autoencoders
Take a set of unlabeled inputs, encode them into short codes, and then use those codes to reconstruct the original input, extracting the most valuable information from the data.
Useful for dimensionality reduction, e.g. detecting a face in an image
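The encode/reconstruct idea can be sketched as a tiny linear autoencoder in NumPy (the data, sizes, and learning rate here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: 4-D points that really live on a 1-D line,
# so a single-number code is enough to reconstruct them
t = rng.normal(size=(100, 1))
X = t @ np.array([[1.0, 2.0, -1.0, 0.5]])

# Linear autoencoder: encode 4 -> 1, decode 1 -> 4
W_enc = rng.normal(scale=0.1, size=(4, 1))
W_dec = rng.normal(scale=0.1, size=(1, 4))

lr = 0.01
for step in range(2000):
    code = X @ W_enc            # short code: one number per sample
    X_hat = code @ W_dec        # reconstruction of the 4-D input
    err = X_hat - X
    # Gradient descent on mean-squared reconstruction error
    W_dec -= lr * (code.T @ err) / len(X)
    W_enc -= lr * (X.T @ (err @ W_dec.T)) / len(X)

mse = np.mean((X - (X @ W_enc) @ W_dec) ** 2)
```

The network compresses each 4-D input to a single number and still reconstructs it almost perfectly, which is the sense in which the short code keeps "the most valuable info" in the data.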
Increases in computer processing capabilities
Deep Learning requires a lot of computation (especially matrix computation), which takes a long time on CPUs. Graphics Processing Units (GPUs) can have many cores (up to 1,000 or more), so they can process data in parallel.
Some accelerating hardware/vendors:
- NVIDIA GPUs, with the CUDA software layer that helps you write programs for them.
- AMD cards
- Tensor Processing Units (or TPUs)
- FPGAs (or Field Programmable Gate Arrays)
Regarding software, TensorFlow is an open-source library developed by Google that supports CPUs, GPUs, and even distributed processing across a cluster. It provides both Python (more complete) and C++ APIs.
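A minimal sketch of the TensorFlow Python API (assuming TensorFlow 2.x with eager execution) doing the kind of matrix computation deep nets are built from:

```python
import tensorflow as tf

# Two small matrices; in a real network these would be
# a batch of activations and a layer's weight matrix
a = tf.constant([[1.0, 2.0],
                 [3.0, 4.0]])
b = tf.constant([[5.0, 6.0],
                 [7.0, 8.0]])

c = tf.matmul(a, b)   # placed on a GPU automatically when one is available
```

The same code runs unchanged on CPU or GPU, which is what makes the library convenient for the matrix-heavy workloads described above.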