Multi-class object detection and recognition

Object detection is a hotly debated topic in machine vision. In single-class object detection you are free to choose the best features for a particular object and tune your classifier for the best accuracy, but in multi-class object detection the challenge leaps to a new stage. In many cases, more than one class may be eligible for a particular region of an image. In these cases, one solution is to design a feedback network and use a combined bottom-up and top-down approach to make the final decision about the class of such regions.

My research is focused on three main problems:
1- Feature extraction, selection and sampling
2- Extracting semantic information
3- Maintaining the object detector's accuracy as the number of classes increases
My framework for Multi-class object detection and recognition:





In the following figure, some images from MSRC21, the ground truth and the current results of my work are shown.
Mohammadreza Mostajabi
Smart traffic control under adverse conditions
This project is composed of 3 main parts:
1- Detecting every car
2- Estimating the speed of every car
3- Calculating the average speed across a series of cameras on a particular highway or street
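The third step is simple arithmetic once each car has been timestamped at two cameras a known distance apart. A minimal sketch (the struct and function names are illustrative, not from the project code):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical record: the times at which one car passes two consecutive
// cameras on a segment of known length.
struct CameraPass {
    double t_first;   // seconds when the car passes the first camera
    double t_second;  // seconds when the car passes the second camera
};

// Speed of one car over a segment of distance_m meters, in km/h.
double segmentSpeedKmh(const CameraPass& p, double distance_m) {
    double dt = p.t_second - p.t_first;   // travel time in seconds
    return (distance_m / dt) * 3.6;       // m/s -> km/h
}

// Average speed of all tracked cars on the segment, in km/h.
double averageSpeedKmh(const std::vector<CameraPass>& passes, double distance_m) {
    double sum = 0.0;
    for (std::size_t i = 0; i < passes.size(); ++i)
        sum += segmentSpeedKmh(passes[i], distance_m);
    return sum / passes.size();
}
```

Chaining several camera pairs along a highway repeats the same computation per segment.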

The most important requirement is the reliability of the system under adverse conditions on low-resolution, noisy video.

Algorithm
We use a contour-based detector for the initial detection, then use a CamShift tracker to follow the detected cars. Because the images are noisy and of low quality, it is impossible to run the object detector throughout the procedure, so after the initial detection, tracking must be performed. Even with the help of tracking, the error rate is still high due to the noisy images. To this end, a feedback network must be formed to reduce the error rate.



(Block diagram: low-quality image → Canny edge detector & morphological dilation → contour-based object detector → tracking → feedback network → final decision.)
Copyright 2011 Mohammadreza Mostajabi

Scene classification using Bags of Visual Words + SVM classifier

Abstract




Bags of visual words is derived from the well-known Bags of Words algorithm in document classification. It plays the role of a dictionary: keypoints are extracted during the training phase and compose the bag of words. Building the dictionary is as simple as performing vector quantization on the feature space; the number of centroids is the number of words in the dictionary. When a new keypoint is extracted from an input image, it is assigned to the nearest word in the dictionary, so the output of this stage is a histogram of the input image's keypoints over the dictionary words.
Any type of classifier, such as an SVM or a Naive Bayes classifier, can be trained on the histograms obtained from the previous stage. I recommend the lecture presented by Dr. Fei-Fei Li (Generative Models for Visual Objects and Object Recognition via Bayesian Inference).
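The quantization stage can be illustrated without any library: each descriptor votes for its nearest centroid, and the votes form the image's histogram. This is a toy sketch with low-dimensional "descriptors"; in practice SURF or SIFT descriptors have 64 or 128 dimensions:

```cpp
#include <cassert>
#include <vector>

// Toy sketch of the quantization stage: each descriptor votes for its
// nearest dictionary word, producing the image's codeword histogram.
std::vector<int> bowHistogram(const std::vector<std::vector<double> >& descriptors,
                              const std::vector<std::vector<double> >& dictionary) {
    std::vector<int> hist(dictionary.size(), 0);
    for (std::size_t i = 0; i < descriptors.size(); ++i) {
        std::size_t best = 0;
        double bestDist = 1e300;
        for (std::size_t w = 0; w < dictionary.size(); ++w) {
            double d = 0.0;
            for (std::size_t k = 0; k < dictionary[w].size(); ++k) {
                double diff = descriptors[i][k] - dictionary[w][k];
                d += diff * diff;          // squared Euclidean distance
            }
            if (d < bestDist) { bestDist = d; best = w; }
        }
        ++hist[best];                      // vote for the nearest word
    }
    return hist;
}
```

OpenCV wraps both steps: cv::BOWKMeansTrainer clusters the training descriptors into the dictionary, and cv::BOWImgDescriptorExtractor computes this histogram for a new image.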

The whole procedure is as follows:

(Block diagram — training phase: feature extraction → vector quantization → dictionary of visual codewords; recognition: new features are quantized against the dictionary.)
I provide a simple code example to explain how to use OpenCV's BoW functions. Four classes of the Caltech-101 dataset are used. SURF is used as the keypoint extractor and descriptor in the feature extraction phase. In this implementation, a support vector machine with a radial basis function kernel is used. After optimization, the C and G parameters for this kernel are as follows:
C = 312.5
G = 0.50625
I put comments in the code to make it self-explanatory. The main parts of the code are:

void collectclasscentroids(); // extracts features from the training images
svm.train(trainingData, labels, cv::Mat(), cv::Mat(), params); // trains the SVM
svm.predict(bowDescriptor); // predicts the class of a new input image

The above method gives 85% accuracy when evaluating 120 images from the 4 classes: tiger, airplane, bike and side-view car.
References
G. Csurka, C. Dance, L. Fan, J. Willamowski, and C. Bray, "Visual categorization with bags of keypoints," in Workshop on Statistical Learning in Computer Vision, ECCV, 2004, pp. 1-22.
Code and details will be available as soon as I publish my work.
References
1- M. Mostajabi, I. Gholampour, "A Framework Based on the Affine Invariant Regions for Improving Unsupervised Image Segmentation", in 11th International Conference on Information Sciences, Signal Processing and their Applications (ISSPA), Montreal, Canada, July 2012, pp. 17-22.

2- M. Mostajabi, I. Gholampour, "Directional Differences: a texture feature set for multi-class image segmentation", submitted to International Journal of Pattern Recognition and Artificial Intelligence.