Tutorial

Bag of visual words is derived from a well-known algorithm in document classification called bag of words. It plays the role of a dictionary: a set of representative keypoints is defined during the training phase, and these make up the bag of words. Building it is as easy as performing vector quantization on the feature space, and the number of centroids is the number of words in the dictionary. When a new keypoint is extracted from an input image, it is assigned to the nearest keypoint in the dictionary, so the output of this stage is a histogram that counts how many keypoints of the input image were assigned to each word in the dictionary.
Any type of classifier, such as an SVM or a naive Bayes classifier, can then be trained on the histograms obtained from the previous stage. I recommend looking at the lecture presented by Dr. Fei-Fei Li (Generative Models for Visual Objects and Object Recognition via Bayesian Inference).
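To make the quantization step concrete, here is a minimal sketch in plain standard C++, independent of OpenCV. It assumes the dictionary (the centroids) has already been computed, and it assigns each descriptor to its nearest word and builds a normalized histogram. The names Descriptor, nearestWord and bowHistogram are hypothetical and chosen only for this illustration; they are not OpenCV functions.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical type for this sketch: one feature descriptor (e.g. a SURF vector).
using Descriptor = std::vector<float>;

// Index of the dictionary word (centroid) closest to the descriptor,
// using squared Euclidean distance. Assumes equal descriptor lengths.
std::size_t nearestWord(const Descriptor& d,
                        const std::vector<Descriptor>& dictionary) {
    std::size_t best = 0;
    float bestDist = std::numeric_limits<float>::max();
    for (std::size_t w = 0; w < dictionary.size(); ++w) {
        float dist = 0.0f;
        for (std::size_t i = 0; i < d.size(); ++i) {
            const float diff = d[i] - dictionary[w][i];
            dist += diff * diff;
        }
        if (dist < bestDist) {
            bestDist = dist;
            best = w;
        }
    }
    return best;
}

// Histogram with one bin per word. Each bin holds the fraction of the
// image's descriptors that were assigned to that word.
std::vector<float> bowHistogram(const std::vector<Descriptor>& descriptors,
                                const std::vector<Descriptor>& dictionary) {
    std::vector<float> hist(dictionary.size(), 0.0f);
    if (dictionary.empty() || descriptors.empty()) {
        return hist;  // nothing to assign
    }
    for (const Descriptor& d : descriptors) {
        hist[nearestWord(d, dictionary)] += 1.0f;
    }
    const float total = static_cast<float>(descriptors.size());
    for (float& bin : hist) {
        bin /= total;
    }
    return hist;
}
```

The resulting histogram has a fixed length (the dictionary size) no matter how many keypoints an image has, which is what lets a standard classifier be trained on it.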

The whole procedure is as follows:

Training phase

I provide simple code to explain how to use OpenCV's BOW functions. Four classes of the Caltech-101 dataset are used. A SURF detector and descriptor extractor are used for the feature extraction phase. In this implementation, a support vector machine with a radial basis function (RBF) kernel is used. After optimization, the C and G parameters for this kernel are as follows:
C = 312.5
G = 0.50625
I put comments in the code to make it self-explanatory. The code is composed of these main parts:

void collectclasscentroids(); // extracts features from the training images
svm.train(trainingData,labels,cv::Mat(),cv::Mat(),params); // trains the SVM
svm.predict(bowDescriptor); // predicts the class of a new input image
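For intuition about the G value quoted above: assuming G denotes the gamma of the RBF kernel, the SVM compares two histograms a and b through K(a, b) = exp(-G * ||a - b||^2). The following is a small sketch of that formula in plain standard C++. The function name rbfKernel is hypothetical and is not part of OpenCV.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// RBF kernel between two vectors of equal length:
// K(a, b) = exp(-gamma * ||a - b||^2).
// Returns 1 for identical vectors and decays toward 0 as they move apart.
double rbfKernel(const std::vector<double>& a,
                 const std::vector<double>& b,
                 double gamma) {
    double squaredDistance = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double diff = a[i] - b[i];
        squaredDistance += diff * diff;
    }
    return std::exp(-gamma * squaredDistance);
}
```

A larger gamma makes the kernel fall off faster with distance, so each training histogram influences a smaller neighborhood, while C controls how strongly misclassified training samples are penalized.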