Detection of objects from images or from video is no trivial task. We need to use some type of machine learning algorithm and train it to detect features and also identify misses and false positives. The haartraining algorithm does just this. It creates a series of haarclassifiers which ensure that non-features are quickly rejected as the object is identified.
This post will highlight the necessary steps required to build a haarclassifier for detection a hand or any object of interest. This post is sequel to my earlier post (OpenCV: Haartraining and all that jazz!) and has a lot more detail. In order to train the haarclassifier, it is suggested, that at least 1000 positive samples (images with the object of interest- hand in this case) and 2000 negative samples (any other image) is required.
As before for performing haartraining the following 3 steps have to be performed
1) Create samples (createsamples.cpp)
2) Haar Training (haartraining.cpp)
3) Performance testing of the classifier (performance.cpp)
In order to build the above 3 utilities please refer to my earlier post OpenCV: Haartraining and all that jazz!
The steps required for training a haarcascade to recognize on normal open palm is given below
Create Samples: This step can be further broken down into the following 3 steps
a) Creation of positive samples
b) Superimposing the positive sample on the negative sample
c) Merging of vector files of samples.
A) Creation of positive samples:
a)Get a series of images with objects of interest (positive samples):For this step take photos using the webcam (or camera) of the objects that you are interested. I had taken several snapshots of my hand ( I later simply downloaded images from Google images for because of the excessive clutter in the snaps taken).
b) Crop Images:In this step you need to crop the images such that it only contains the object of interest. You can use any photo editing tool of your choice and save all the positive images in a directory for e.g. ./hands
c) Mark the object:Now the image with the positive sample has to be marked. This will be used for creating the samples.
The tool that is to be used for marking is objectmarker.cpp. You can downloaded the source code for this from Achu Wilson’s blog Now build objectmarker.cpp with the include directories and libraries of OpenCV. Once you have successfully built objectmarker you are ready to mark the positive samples. Samples have to be marked because the description file for the positive samples file must be in the following format
[filename] [# of objects] [[x y width height] [... 2nd object] …]
This will be used for generating the positive training samples. This file is will be used with the createsamples utility to create positive training samples.
The command to use to run objectmarker is as follows
$ objectmarker <output file> <dir>
$objectmarker pos_desc.txt ./images
where dir is the directory containing the positive images
and output file will contain the positions of the objects marked as follows
/images/hand1.bmp 1 0 0 246 50
/images/hand2.bmp 1 187 26 333 400
The use of the utility objectmarker is an art and you need to be trained to get the proper data. Make sure you get sensible widths and heights. There are times when you get -ve widths and -ve heights which are clearly wrong.
An alternative easy way is to open the jpg/bmp in a photo editor and check the width and height in pixels (188 x 200 say) and create the description file as
/images/hand1.bmp 1 0 0 188 200
Ideally the objectmarker should give you values close to this if the sample image file has only the object of interest (hand)
Create positive training samples
The createsamples utility can now be used for creating positive training sample files. The command is
createsamples -info pos_desc.txt -vec pos.vec -w 20 -h 20
This will create positive training samples with the object of interest, in our case the “hand”. Now, you should verify that the tool has done something sensible by checking the training samples generated. The command to use to display the positive training samples captured in the pos.vec is to run
createsamples -vec pos.vec -w 20 -h 20
If the objects have been marked accurately you should see a series of hands (positive samples). If the samples have not been marked correctly the positive training file will give incorrect results.
Create negative background training samples
This step is used to create training samples with one positive image superimposed against a set of negative background samples. The command to use is
createsamples -img ./image/hand_1.BMP -num 9 -bg bg.txt -vec neg1.vec -maxxangle 0.6 -maxyangle 0 -maxzangle 0.3 -maxidev 100 -bgcolor 0 -bgthresh 0 -w 20 -h 20
where bg.txt will contain the list of negative samples in the following format
The positive hand images will be superimposed against the negative background in various angles of distortion.
All the negative training samples will be collected in the training file neg1.vec as above. As before you can verify that the createsamples utility has done something reasonable by executing
createsamples -vec neg1.vec -w 20 -h 20.
This should show a series of images of hands in various angles of distortion against the negative image background.
Creating several negative training samples with all positive samples
The createsamples utility takes one positive sample and superimposes it against the negative samples in bg.txt file. However we need to repeat this process for each of the positive sample (hand) that we have. Since this can be laborious process you can use the following shell script, I wrote, to repeat the negative training sample with every positive sample with create_neg_training.sh
for i in `cat hands.txt`
createsamples -img ./image/$i -num 9 -bg bg.txt -vec “neg_training$j.vec” -maxxangle 0.6 -maxyangle 0 -maxzangle 0.3 -maxidev 100 -bgcolor 0 -bgthresh 0 -w 20 -h 20
where the positive samples are under ./image directory. For each positive hand sample a negative training sample file is create namely neg_training1.vec, neg_training2.vec etc.
As seen above the createsamples utility superimposes only positive sample (hand) against a series of negative samples. Since we would like to repeat the utility for multiple positive samples there needs to be a way for merging all the training samples. A useful utility (mergevec.cpp) has been created and is available for download in Natoshi Seo’s blog
Download mergevec.cpp and build it along with (cvboost.cpp, cvhaarclassifier.cpp, cvhaartarining.cpp, cvsamples.cpp, cvcommon.cpp) with usual include files and the link libraries.
Once built it can be executed by executing
mergevec pos_neg.txt pos_neg.vec – w 20 -h 20
where pos_neg.txt will contain both the positive and negative training sample files as follows
As before you can verify that the entire training file pos_neg.vec is sensible by executing
createsamples -vec pos_neg.vec -w 20 -h 20
Now the pos_neg.vec will contain all the training samples that are required for the haartraining process.
The haartraining can be run with the training samples generated from the mergevec utility described above.
The command is
./haartrainer -data haarcascade -vec pos_neg.vec -bg bg.txt -nstages 20 -nsplits 2 -minhitrate 0.999 -maxfalsealarm 0.5 -npos 7 -nneg 9 -w 20 -h 20 -nonsym -mem 512 -mode ALL
Several posts have suggested that nstages should be ideally 20 and splits should be 2.
npos indicates the number of positive samples and -nneg the number of negative samples. In my case I had just used 7 positive and 9 negative samples. This step is extremely CPU intensive and can take several hours/days to complete. I had reduced the number of stages to 14. The haartraining utility will create a haarcascade directory and a haarcascade.xml.
Performance testing : The first way to test the integrity of the haarcascades is to run the performance utility described in my earlier post OpenCV:Haartraining and all that jazz! If you want to use the performance utility you should also create test samples which can be used for testing with the command
createsamples -img ./image/hand-1.bmp -num 10 -bg bg.txt -info test.dat -maxxangle 0.6 -maxyangle 0 -maxzangle 0.3 -maxidev 100 -bgcolor 0 -bgthresh 0
The test.dat can be used with the performance utility as follows
./performance -data haarcascade.xml -info test.dat -w 20 -h 20
I wanted something that would be more visually satisfying that seeing the output of the performance utility. This utility just spews out textual information of hits, misses.
So I decided to use the facedetect.c with the haarcascade trained by my positive and negative samples to check whether it was working. The test below describes using the facedetect.c for detecting the hand.
The code for facedetect.c can be downloaded from Willow Garage Wiki.
Compile and link the facedetect.c as handetect with the usual suspects. Once this builds successfully you can use
handdetect –-cascade=haarcascade.xml hands.jpg
where hands.jpg was my test image. If the handdetect does indeed detect the hand in the image it will l enclose the object detected with a rectangle. See the output below
While my haarcascade does detect 5 hands it does appear shifted. This could be partly because the number of positive and negative training samples used were 7 & 9 respectively which is very low. Also I used only 14 stages in the haartraining. As mentioned in the beginning there is a need for at least 1000 positive and 2000 negative samples which has to be used. Also haartraining should have 20 stages and 2 nsplits. Hopefully if you follow this you would have developed a fairly true haarcascade.
If you are adventurous enough you could run the above with the webcam as
handdetect –cascade=haarcascade.xml 0
where 0 indicates the webcam. Assuming that your haarcascade is perfect you should be able to track your hand in real time.
From my blog : Giga thoughts
Disclaimer: This does not represent IBM's views or strategies