In computer vision and image processing, selective search is a method of object detection that takes an image and captures the objects within it in defined spaces called bounding boxes (each defined by an origin point, a height and a width). These boxes then act as regions of interest (ROIs) for an image classifier, which classifies the object present in each bounding box.[1]
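As an illustration of this box representation, the following Python sketch stores a box as a top-left origin plus a width and height. It derives the smallest box enclosing a region given as a set of (row, col) pixel coordinates and crops the corresponding region of interest from an image held as a list of rows. The `BoundingBox` class and its methods are hypothetical names chosen for this example. Real libraries use several box conventions, for example a pair of corners instead of origin plus size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned box: origin (x, y) at the top-left corner, plus width and height."""
    x: int
    y: int
    width: int
    height: int

    @classmethod
    def from_pixels(cls, pixels):
        """Smallest box enclosing a region given as an iterable of (row, col) pixels."""
        rows = [r for r, _ in pixels]
        cols = [c for _, c in pixels]
        x, y = min(cols), min(rows)
        return cls(x, y, max(cols) - x + 1, max(rows) - y + 1)

    def crop(self, image):
        """Return the region of interest from an image stored as a list of rows."""
        return [row[self.x:self.x + self.width]
                for row in image[self.y:self.y + self.height]]


# Hypothetical 5x6 image whose pixel value encodes its position (10 * row + col).
image = [[r * 10 + c for c in range(6)] for r in range(5)]
region = {(1, 2), (1, 3), (2, 3), (3, 2)}
box = BoundingBox.from_pixels(region)
print(box)              # BoundingBox(x=2, y=1, width=2, height=3)
print(box.crop(image))  # [[12, 13], [22, 23], [32, 33]]
```

The crop is what a downstream classifier would receive as its input region.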

The selective search method aims to make object detection less computationally taxing than exhaustive search. It also aims to capture the benefit of segmentation, which is that the boundaries of the boxes are set by the shape of the object being classified.

It avoids having to guess many parameters correctly by using an ensemble of segmentation strategies rather than a single model. It also avoids a fixed bounding-box size. The computationally feasible exhaustive approaches search a fixed-size grid laid over the image. Selective search instead combines segments into regions of varying sizes, which lets it detect objects at different scales.
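A rough count shows why searching every possible box is impractical, and why feasible exhaustive methods restrict themselves to a grid. The sketch below assumes a hypothetical 500 × 375 pixel image and a single 100 × 100 window moved with a stride of 20 pixels. Practical sliding-window detectors use several window sizes and aspect ratios, so their real counts fall between these two extremes.

```python
def count_all_boxes(width: int, height: int) -> int:
    """Number of distinct axis-aligned boxes in a width x height pixel grid.

    A box is fixed by choosing 2 of the width + 1 vertical pixel boundaries
    and 2 of the height + 1 horizontal pixel boundaries.
    """
    return (width * (width + 1) // 2) * (height * (height + 1) // 2)


def count_grid_windows(width: int, height: int, win: int, stride: int) -> int:
    """Number of square win x win windows placed on a grid with the given stride."""
    if win > width or win > height:
        return 0
    return ((width - win) // stride + 1) * ((height - win) // stride + 1)


print(count_all_boxes(500, 375))              # 8830125000
print(count_grid_windows(500, 375, 100, 20))  # 294
```

Billions of possible boxes are reduced to a few hundred only by fixing one size and one grid. That restriction is what limits such approaches to particular scales.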

Uses

It is the region of interest extractor used by the original Region-Based Convolutional Neural Network (R-CNN) and by Fast R-CNN.

How it works

The image is first split into an initial set of small regions using the fast graph-based segmentation method of Felzenszwalb and Huttenlocher.[2] A greedy algorithm then groups regions iteratively. It calculates the similarity between every pair of neighbouring regions and merges the two most similar regions. It then computes the similarities between the merged region and its neighbours. This repeats until a single region spanning the whole image remains. The hierarchical grouping algorithm is as follows:

Input: (color) image
Output: set of object location hypotheses L

Obtain initial regions R = {r1, ..., rn} using the fast method of Felzenszwalb and Huttenlocher
Initialize similarity set S = Ø
foreach neighboring region pair (ri, rj) do
    Calculate similarity s(ri, rj)
    S = S ∪ s(ri, rj)

while S ≠ Ø do
    Get highest similarity s(ri, rj) = max(S)
    Merge corresponding regions rt = ri ∪ rj
    Remove similarities regarding ri: S = S \ s(ri, r*)
    Remove similarities regarding rj: S = S \ s(r*, rj)
    Calculate similarity set St between rt and its neighbors
    S = S ∪ St
    R = R ∪ rt

Extract object location boxes L from all regions in R
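The grouping loop above can be sketched in Python as follows. This is a minimal, unoptimised illustration under several assumptions:

- The initial regions and their adjacency are supplied directly rather than produced by Felzenszwalb and Huttenlocher's method.
- Regions are plain sets of (row, col) pixels.
- `size_similarity` is a hypothetical stand-in for the full similarity measure. It only favours merging small regions first.

```python
def bounding_box(pixels):
    """Smallest (x, y, width, height) box enclosing a set of (row, col) pixels."""
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return (min(cols), min(rows),
            max(cols) - min(cols) + 1, max(rows) - min(rows) + 1)


def hierarchical_grouping(regions, neighbours, similarity):
    """Greedy hierarchical grouping of adjacent regions.

    regions:    dict of region id -> set of (row, col) pixels (the initial R)
    neighbours: iterable of (i, j) pairs of adjacent region ids
    similarity: function(pixels_i, pixels_j) -> float
    Returns one box per region in R, including every merged region.
    """
    R = dict(regions)
    adj = {i: set() for i in R}
    for i, j in neighbours:
        adj[i].add(j)
        adj[j].add(i)
    # S maps an unordered pair of region ids to its similarity.
    S = {frozenset((i, j)): similarity(R[i], R[j])
         for i in adj for j in adj[i] if i < j}
    next_id = max(R) + 1
    while S:
        i, j = tuple(max(S, key=S.get))   # most similar pair
        t = next_id
        next_id += 1
        R[t] = R[i] | R[j]                # rt = ri U rj
        # Remove every similarity that involves ri or rj.
        S = {p: s for p, s in S.items() if i not in p and j not in p}
        adj[t] = (adj.pop(i) | adj.pop(j)) - {i, j}
        for k in adj[t]:                  # rewire neighbours to rt, add St
            adj[k] -= {i, j}
            adj[k].add(t)
            S[frozenset((t, k))] = similarity(R[t], R[k])
    return [bounding_box(p) for p in R.values()]


IMAGE_SIZE = 16  # hypothetical 4x4 image


def size_similarity(pixels_i, pixels_j):
    # Hypothetical stand-in measure: higher when both regions are small.
    return 1.0 - (len(pixels_i) + len(pixels_j)) / IMAGE_SIZE


def quadrant(r0, c0):
    return {(r, c) for r in range(r0, r0 + 2) for c in range(c0, c0 + 2)}


regions = {0: quadrant(0, 0), 1: quadrant(0, 2), 2: quadrant(2, 0), 3: quadrant(2, 2)}
boxes = hierarchical_grouping(regions, [(0, 1), (0, 2), (1, 3), (2, 3)], size_similarity)
print(len(boxes))  # 7: four initial regions plus three merges
print(boxes[-1])   # (0, 0, 4, 4): the last merge spans the whole image
```

Every intermediate region is kept in R, not only the final one. This is why the output contains boxes at many scales.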

Selective search relies on diversity when assembling larger regions. It achieves this diversity in three ways: by using a variety of color spaces with different invariance properties, by using different similarity measures s(ri,rj), and by varying the starting regions of the calculation.
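One way to vary the starting regions is to run the initial segmentation with different parameter settings. The sketch below is a simplified version of Felzenszwalb and Huttenlocher's graph-based segmentation on a small grayscale image:

- Pixels are nodes of a 4-connected grid graph, and each edge is weighted by the intensity difference between its two pixels.
- Edges are processed from lightest to heaviest.
- Two components are merged when the connecting edge is no heavier than either component's largest internal edge plus `k` divided by the component's size.

The sketch omits the original method's Gaussian pre-smoothing and its minimum-region-size step. The image values are made up for illustration, and `felzenszwalb_segment` is a hypothetical name.

```python
def felzenszwalb_segment(image, k):
    """Simplified graph-based segmentation in the spirit of Felzenszwalb and Huttenlocher.

    image: list of rows of grayscale intensities; k: scale parameter.
    Returns a dict mapping each (row, col) pixel to a region label.
    Assumptions: 4-connected grid, no pre-smoothing, no minimum region size.
    """
    h, w = len(image), len(image[0])

    def idx(r, c):
        return r * w + c

    edges = []
    for r in range(h):
        for c in range(w):
            if c + 1 < w:
                edges.append((abs(image[r][c] - image[r][c + 1]), idx(r, c), idx(r, c + 1)))
            if r + 1 < h:
                edges.append((abs(image[r][c] - image[r + 1][c]), idx(r, c), idx(r + 1, c)))
    edges.sort()  # lightest edges first

    parent = list(range(h * w))
    size = [1] * (h * w)
    internal = [0.0] * (h * w)  # Int(C): heaviest edge used to build component C

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for weight, a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            continue
        # Merge only if the edge is no heavier than both components' thresholds.
        if weight <= min(internal[ra] + k / size[ra], internal[rb] + k / size[rb]):
            if size[ra] < size[rb]:
                ra, rb = rb, ra
            parent[rb] = ra
            size[ra] += size[rb]
            internal[ra] = weight  # edges are sorted, so this is the new maximum
    return {(r, c): find(idx(r, c)) for r in range(h) for c in range(w)}


image = [
    [10, 12, 11, 200, 202, 201],
    [11, 10, 12, 201, 200, 203],
    [12, 11, 10, 202, 201, 200],
]
for k in (1, 1000, 5000):
    labels = felzenszwalb_segment(image, k)
    print(k, len(set(labels.values())))
# 1 5
# 1000 2
# 5000 1
```

Larger values of `k` favour larger regions. Running the segmentation with several values of `k` therefore yields different sets of starting regions for the grouping step.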

The similarity measure s(ri,rj) in the above algorithm is defined as:

$$s(r_i, r_j) = a_1 s_{\text{color}}(r_i, r_j) + a_2 s_{\text{texture}}(r_i, r_j) + a_3 s_{\text{size}}(r_i, r_j) + a_4 s_{\text{fill}}(r_i, r_j), \qquad a_i \in \{0, 1\}$$
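The individual measures are not spelled out in this section. The sketch below follows the definitions used in the original selective search work:

- **Colour similarity** is the intersection of L1-normalised histograms.
- **Size similarity** is 1 minus the combined size of the two regions divided by the image size.
- **Fill similarity** is 1 minus the gap between the two regions and their joint bounding box, divided by the image size.

To stay short, the sketch makes several assumptions. It uses a single grayscale channel with 25 bins, whereas the original builds histograms per colour channel. It leaves the texture term out by setting a2 = 0. It describes each region by its pixel values, pixel count and inclusive bounding box (x_min, y_min, x_max, y_max). All function names are hypothetical.

```python
def normalized_histogram(values, bins=25, max_value=256):
    """L1-normalised histogram of integer intensities in [0, max_value)."""
    hist = [0.0] * bins
    for v in values:
        hist[v * bins // max_value] += 1
    total = sum(hist)
    return [h / total for h in hist]


def s_color(hist_i, hist_j):
    """Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(hist_i, hist_j))


def s_size(size_i, size_j, image_size):
    """Close to 1 when both regions are small, so small regions tend to merge first."""
    return 1.0 - (size_i + size_j) / image_size


def s_fill(box_i, box_j, size_i, size_j, image_size):
    """Close to 1 when ri and rj fill their joint bounding box with little gap.

    Boxes are (x_min, y_min, x_max, y_max) with inclusive pixel coordinates.
    """
    x0, y0 = min(box_i[0], box_j[0]), min(box_i[1], box_j[1])
    x1, y1 = max(box_i[2], box_j[2]), max(box_i[3], box_j[3])
    joint_size = (x1 - x0 + 1) * (y1 - y0 + 1)
    return 1.0 - (joint_size - size_i - size_j) / image_size


def combined_similarity(weights, scores):
    """s(ri, rj) = a1*s_color + a2*s_texture + a3*s_size + a4*s_fill, each a_i in {0, 1}."""
    if any(a not in (0, 1) for a in weights):
        raise ValueError("each weight a_i must be 0 or 1")
    return sum(a * s for a, s in zip(weights, scores))


# Two 2x2 regions side by side in a hypothetical 10x10 grayscale image.
image_size = 100
hist_i = normalized_histogram([10, 12, 11, 10])
hist_j = normalized_histogram([12, 11, 200, 201])
scores = (
    s_color(hist_i, hist_j),                               # 0.5
    0.0,                                                   # texture: not computed here
    s_size(4, 4, image_size),                              # 0.92
    s_fill((0, 0, 1, 1), (2, 0, 3, 1), 4, 4, image_size),  # 1.0
)
total = combined_similarity((1, 0, 1, 1), scores)
print(round(total, 2))  # 2.42
```

Setting a weight to 0 switches that measure off entirely. Different weight vectors therefore act as different grouping strategies.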

References
