Competing one-class support vector machines

In machine learning, competing one-class support vector machines (C-1SVMs) are supervised learning models for binary and multiclass classification. The algorithm trains multiple one-class support vector machines, each modeling the distribution of samples from a different class, so that the models compete with one another. A C-1SVM is neither a binary SVM nor a one-class SVM; it is a set of competing one-class SVMs. For binary classification, two one-class support vector machines are trained and maintained.

Given a set of training examples and their corresponding labels, a C-1SVM training algorithm builds several one-class SVM models that label new examples cooperatively. Based on the idea of the maximum-margin hyperplane used in the binary SVM, each model tries to find a hyperplane that encloses as many examples of its target class as possible. New examples are assigned to a category according to which hyperplane they fall within.

Combining the one-class support vector machine with on-line learning, the C-1SVM was first introduced by Minglun Gong and Li Cheng as a support vector method for foreground segmentation of live videos[1] in computer vision and image processing.

Algorithm


The C-1SVM algorithm involves two parts: training and classification. Training is an iterative process: given the one-class SVM models partially trained on the examples received so far, each newly received labeled example is used to update the one-class SVM model of the corresponding class. The process repeats until none of the one-class SVM models changes. Classification is performed by defining a score function for each class; a new example is assigned to the class with the maximum score.

One-class support vector machine


The one-class SVM was proposed by Bernhard Schölkopf, John C. Platt, John Shawe-Taylor and Alex J. Smola.[2] It is an unsupervised learning method that learns a decision function for novelty detection: classifying new data as similar to or different from the training set. The C-1SVM algorithm trains one one-class SVM for each class, and each model trained this way can determine whether a new example belongs to its corresponding class. The trained one-class SVM models then make classifications cooperatively.
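For a standalone illustration of novelty detection with a single one-class SVM, the following sketch uses the OneClassSVM estimator from scikit-learn; the library, the synthetic data and all parameter values are choices made for this example, not part of the original method:

import numpy as np
from sklearn.svm import OneClassSVM

# Synthetic training data drawn from a single "target" class near the origin.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=0.0, scale=0.5, size=(100, 2))

# nu bounds the fraction of training points treated as outliers; the RBF
# kernel is the Gaussian kernel discussed later in this article.
clf = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.1)
clf.fit(X_train)

# predict returns +1 for points inside the learned support, -1 for novelties.
print(clf.predict(np.array([[0.1, -0.2], [4.0, 4.0]])))  # [ 1 -1]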

On-line learning


Training a binary SVM on a large set of examples is classical offline learning, because the quadratic objective function that the SVM minimizes does not change once training begins. Previous research[3] has shown that a similar solution can be obtained with an on-line learning scheme, in which training examples are repeatedly fed to an on-line learner in a fixed sequence. C-1SVM uses on-line learning because it produces a partially trained model at each point during training, and that partial model can immediately be applied to each newly received example. These partially trained models are gradually refined towards the final solution. Because partially trained C-1SVMs provide quicker feedback within the loop, on-line learning helps C-1SVM training converge faster than offline learning.

The on-line learner in the C-1SVM training algorithm follows the one proposed by Li Cheng, S. V. N. Vishwanathan, Dale Schuurmans, Shaojun Wang and Terry Caelli.[4] Denote the training examples in a time sequence as x_1, …, x_N and their corresponding labels as y_1, …, y_N, where N is the number of training examples. Let f(x_t) be the score function of example x_t, let S_c be the support vector set of class c, and let α_i be the weight of support vector x_i. The kernel function used in on-line learning is denoted k(·,·). When a new labeled example (x_t, y_t) is received, the score function is defined as

    f(x_t) = \sum_{x_i \in S_{y_t}} \alpha_i \, k(x_i, x_t)

The weight α_t of x_t is updated as

    \alpha_t = \min\left( \delta(f(x_t) < \rho) \cdot \frac{\rho - f(x_t)}{k(x_t, x_t)},\; C \right)

where ρ is the margin, often set to 1; C is the cut-off value used to constrain the weight of x_t; and δ is an indicator function with δ(true) = 1 and δ(false) = 0. If α_t > 0, x_t is added to the corresponding support vector set; otherwise, if x_t is already in the support vector set, it is erased from the set.
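A minimal sketch of this on-line update in Python, under the reconstruction above (the dict-based support vector sets, the self-exclusion in the score, and all names are illustrative assumptions, not the authors' implementation):

def score(x, support_vectors, kernel):
    # f(x): weighted kernel sum over one class's support vectors.
    # support_vectors maps a hashable example (e.g. a tuple) to its weight.
    # An example's own term is skipped when its weight is re-estimated,
    # an assumption made here to keep repeated passes stable.
    return sum(a * kernel(xi, x) for xi, a in support_vectors.items() if xi != x)

def update(x, support_vectors, kernel, rho=1.0, cutoff=1.0):
    # alpha_t = min(delta(f < rho) * (rho - f) / k(x, x), C);
    # assumes k(x, x) > 0, which holds for the Gaussian kernel.
    f = score(x, support_vectors, kernel)
    alpha = min((rho - f) / kernel(x, x), cutoff) if f < rho else 0.0
    if alpha > 0:
        support_vectors[x] = alpha      # add x, or re-weight it if present
    else:
        support_vectors.pop(x, None)    # erase x if it was a support vector
    return alpha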

Three common kernel functions include:

  • Linear kernel: k(x, x′) = x · x′.
  • Polynomial kernel: k(x, x′) = (x · x′ + c)^d, where d is the degree of the polynomial.
  • Gaussian kernel: k(x, x′) = exp(−‖x − x′‖² / (2σ²)), where σ² is the variance.
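These three kernels can be written compactly in Python (an illustrative sketch; the parameter defaults are arbitrary choices):

import numpy as np

def linear_kernel(x, xp):
    return float(np.dot(x, xp))

def polynomial_kernel(x, xp, c=1.0, d=3):
    return float((np.dot(x, xp) + c) ** d)

def gaussian_kernel(x, xp, sigma=1.0):
    # exp(-||x - x'||^2 / (2 sigma^2)); equals 1 when x == x'.
    diff = np.asarray(x) - np.asarray(xp)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))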

When an unlabeled example x is received, C-1SVM labels it as the class with the maximum score, computed using the score function defined above.

Example


Take binary classification as an example, with two samples in each class. Denote them as (x_1, 1), (x_2, 1) and (x_3, 2), (x_4, 2). Input the samples in the sequence x_1, x_2, x_3, x_4. Both support vector sets are initially empty. The cut-off C = 1 and margin ρ = 1 are used, and the Gaussian kernel with standard deviation σ = 1 is selected, so k(x, x) = 1 for every x. The algorithm works as follows:

Step 1: receive (x_1, 1); since the support vector set of the first class is empty, the score is f(x_1) = 0, so α_1 = min(ρ − 0, C) = 1 and the support vector set S_1 becomes {x_1}.
Step 2: receive (x_2, 1); compute f(x_2) = α_1 k(x_1, x_2), then α_2 = min(ρ − f(x_2), C) > 0, and the support vector set S_1 becomes {x_1, x_2}.
Step 3: receive (x_3, 2); f(x_3) = 0 and α_3 = 1, and the support vector set S_2 becomes {x_3}.
Step 4: receive (x_4, 2); compute f(x_4) = α_3 k(x_3, x_4), then α_4 = min(ρ − f(x_4), C) > 0, and the support vector set S_2 becomes {x_3, x_4}.

Following the above four steps, input the samples in the same sequence and repeat the process until the two support vector sets no longer change.

For classification, when an unlabeled sample x arrives, follow the defined score function: compute f_1(x) using the trained support vectors in S_1 and f_2(x) using the trained support vectors in S_2. If f_1(x) > f_2(x), label x as class 1; otherwise label it as class 2.
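The example can be traced concretely with the sketch functions defined earlier (score, update and gaussian_kernel); the four sample points below are illustrative values chosen for this trace, not values from the original article:

# Two assumed samples per class.
x1, x2 = (0.0, 0.0), (0.5, 0.0)     # class 1
x3, x4 = (3.0, 3.0), (3.5, 3.0)     # class 2

S = {1: {}, 2: {}}                   # support vector sets S_1 and S_2
for _ in range(20):                  # repeated passes; real code would stop when the sets settle
    for x, y in [(x1, 1), (x2, 1), (x3, 2), (x4, 2)]:
        update(x, S[y], gaussian_kernel)

x_new = (0.3, 0.1)
scores = {c: score(x_new, S[c], gaussian_kernel) for c in (1, 2)}
print(max(scores, key=scores.get))   # prints 1: x_new lies near the first class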

Pseudocode


The algorithm includes two parts: training and classification.

Training


Input: training examples x_1, …, x_N and labels y_1, …, y_N, total class number M

Output: support vectors of each class and corresponding weights

1  procedure C-1SVMtraining(x_1..x_N, y_1..y_N, M)
2  for c = 1 to M
3      allocate a set S_c to store support vectors for the c-th class
4  while there are changes in the support vector sets do
5     for t = 1 to N
6         receive an example x_t and its label y_t
7         compute score f(x_t) = Σ_{x_i ∈ S_{y_t}} α_i k(x_i, x_t)
8         update weight α_t = min(δ(f(x_t) < ρ) · (ρ − f(x_t)) / k(x_t, x_t), C)
9         if there is a vector in S_{y_t} equal to x_t
10           if α_t = 0
11              delete x_t from S_{y_t}
12        else
13           if α_t > 0
14              push x_t into S_{y_t}
15 return S_1, …, S_M
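A runnable Python version of this procedure, reusing the update sketch from above (the convergence test and the pass limit are simplifications assumed here, not part of the original pseudocode):

def c1svm_training(examples, labels, num_classes, kernel, max_passes=100):
    # One support vector set per class, stored as dict: example -> weight.
    S = {c: {} for c in range(1, num_classes + 1)}
    for _ in range(max_passes):
        changed = False
        for x, y in zip(examples, labels):
            old = dict(S[y])
            update(x, S[y], kernel)     # on-line score and weight update
            if S[y] != old:
                changed = True
        if not changed:                 # no support vector set changed in a full pass
            break
    return S

Because the weights are floating-point values, a practical implementation would compare them with a tolerance rather than exactly; max_passes bounds the loop in case exact convergence is never reached.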

Classification


Input: trained support vector sets S_1, …, S_M, new unlabeled example x, total class number M

Output: label y

1 procedure C-1SVMclassification(S_1..S_M, x, M)
2 for c = 1 to M
3     compute f_c(x) = Σ_{x_i ∈ S_c} α_i k(x_i, x)
4 return y = argmax_c f_c(x)

The support vector set of each class can be implemented as a first-in-first-out queue.
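In Python, the classification step reduces to an argmax over the per-class scores (again reusing the earlier sketch functions; the names are illustrative):

def c1svm_classification(S, x, kernel):
    # Score x against each class's support vector set and return the best class.
    scores = {c: score(x, sv, kernel) for c, sv in S.items()}
    return max(scores, key=scores.get)

# Putting the sketches together:
# S = c1svm_training([x1, x2, x3, x4], [1, 1, 2, 2], 2, gaussian_kernel)
# y = c1svm_classification(S, x_new, gaussian_kernel)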

Binary SVMs vs C-1SVMs

 
Fig. 1. Comparison between a binary SVM and C-1SVMs. White dots represent the training examples of the first class and black dots represent the training examples of the second class, while the red dot denotes an unlabeled example.

For binary classification, the binary SVM algorithm trains a single SVM model, which gives one hyperplane. The C-1SVM algorithm instead trains two one-class SVMs. Doing so returns two hyperplanes that enclose the training examples more tightly, which helps in detecting and handling ambiguous cases.

A detailed comparison is given in Fig. 1. Denote the white dots as the first class and the black dots as the second class. The line between the white dots and black dots represents the hyperplane trained by the binary SVM, and the circles enclosing the white dots or black dots are the hyperplanes trained by C-1SVMs. In case (a), the binary SVM classifies the red dot as the first class, whereas C-1SVMs classify it as unknown, because neither one-class SVM hyperplane includes it as an inlier. In case (b), the distance between the red dot and the binary SVM's hyperplane is too small for the binary SVM to label it with high confidence, whereas C-1SVMs can correctly label it as the second class.

Complexity


Training: denote the total number of support vectors in the final solution as V. In the worst case, only one support vector is added or deleted per pass of the loop at line 5 of the procedure C-1SVMtraining, so at most O(V) passes are required for convergence. Each pass processes all N examples, so the overall running time is O(V·N) weight updates. If all of the examples are support vectors in the worst case, O(N) space is required to store them.

Classification: to classify an example, the scores of all classes are computed. Since every training example can be a support vector in the worst case, the running time and the required space are both O(N).

Applications


C-1SVMs can be applied to many classification problems in machine learning. Like the SVM, they can be used in computer vision, data mining and speech recognition. For example, Gong and Cheng used the C-1SVM algorithm for foreground segmentation of live videos,[1] a problem in computer vision. They trained two one-class SVMs, one for foreground pixels and one for background pixels; when a new unlabeled pixel arrives, the foreground and background one-class SVMs cooperatively label it as a foreground or background pixel.

References

  1. ^ a b Gong, Minglun; Cheng, Li (2011). "Foreground segmentation of live videos using locally competing 1SVMs" (PDF). IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  2. ^ Schölkopf, Bernhard; Platt, John C.; Shawe-Taylor, John; Smola, Alex J. (2001). "Estimating the support of a high-dimensional distribution". Neural Computation. 13: 1443–1472.
  3. ^ Bottou, Léon; Bousquet, Olivier (2008). "The Tradeoffs of Large Scale Learning". Advances in Neural Information Processing Systems.
  4. ^ Cheng, Li; Vishwanathan, S. V. N.; Schuurmans, Dale; Wang, Shaojun; Caelli, Terry (2007). "Implicit Online Learning with Kernels". Advances in Neural Information Processing Systems.

Category:Machine learning Category:Support vector machines Category:Classification algorithms