INTEREST POINTS BASED OBJECT TRACKING VIA SPARSE REPRESENTATION
R. Venkatesh Babu, Priti Parate
Video Analytics Lab, SERC, Indian Institute of Science, Bangalore, India

ABSTRACT

In this paper, we propose an interest point based object tracker in a sparse representation (SR) framework. In the past couple of years, there have been many proposals for object tracking in the sparse framework exhibiting robust performance in various challenging scenarios. One of the major issues with these SR trackers is their slow execution speed, mainly attributed to the particle filter framework. In this paper, we propose a robust interest point based tracker in an l1 minimization framework that runs in real time with better performance compared to state of the art trackers. In the proposed tracker, the target dictionary is obtained from the patches around target interest points. Next, the interest points from the candidate window of the current frame are obtained. The correspondence between target and candidate points is obtained by solving the proposed l1 minimization problem. A robust matching criterion is proposed to prune the noisy matches. The object is localized by measuring the displacement of these interest points. The reliable candidate patches are used for updating the target dictionary. The performance of the proposed tracker is benchmarked with several complex video sequences and found to be fast and robust compared to reported state of the art trackers.

Index Terms — Visual Tracking, l1 minimization, Interest points, Harris corner, Sparse Representation.

1. INTRODUCTION

Visual tracking has been one of the key research areas in the computer vision community for the past few decades. Tracking is a crucial module for video analysis, surveillance and monitoring, human behavior analysis, human computer interaction and video indexing/retrieval. The major challenges in real life scenarios arise from intrinsic and extrinsic factors. The intrinsic factors include pose, appearance and scale changes. The extrinsic factors are illumination, occlusion and clutter.

There have been several proposals for object tracking in the past. Based on object modeling, the majority of the trackers can be brought under two main themes: i) global object model and ii) local object model. In the global approach, the object is typically modeled using all the pixels corresponding to the object region or some global property of the object. The simple template based SSD (sum of squared differences) tracker, the color histogram based meanshift trackers [1, 2] and the machine learning based trackers [3, 4] are examples of global trackers. The traditional Lucas-Kanade tracker [5] and many bag-of-words model trackers are examples of interest point trackers. A detailed survey of various object tracking methods can be found in [6, 7, 8].

The proposed interest point based tracker falls under local object model based tracking. Shi et al. [9] showed that corner-like points are more suitable for reliable tracking due to their stability and robustness to various distortions such as rotation, scaling and illumination. Though interest-point based trackers show more robustness to factors like rotation, scale and partial occlusion, the major issues surface from the description and matching of interest points between successive frames. For example, Kloihofer et al. [10] use SURF [11] descriptors of interest points as feature descriptors. The object is tracked by matching the object points of the previous frame with candidate points in the current frame. The displacement vectors of these points are utilized for localizing the object in the current frame [12]. Matching the features of interest points between two frames is a crucial step in estimating the correct motion of the object, and typically Euclidean distance is used for matching. Recently proposed vision algorithms in the sparse representation framework clearly illustrate its superior discriminative ability at very low dimension, even among similar feature patterns [13, 14]. This motivated us to examine the matching ability of sparse representation for interest points.

In this paper, we propose a robust interest point based tracker in the sparse representation framework. The interest points of the object are obtained from the initial frame by the Harris corner detector [15], and a dictionary is constructed from the small image patches surrounding these corner points. The candidate corner points obtained from the search window of the current frame are matched with the object points (dictionary) by sparsely representing the candidate corner patches in terms of dictionary patches. The correspondence between the target and candidate interest points is established via the maximum value of the sparse coefficients. A 'robust matching' criterion is proposed for pruning the noisy matches by checking the mutual match between candidate and target patches. The object is localized based on the displacement of these matched interest points. The matched candidate patches are used for updating the target dictionary. Since the dictionary elements are obtained from very small patches surrounding these corner points, the proposed approach is robust and computationally very efficient compared to the particle filter based l1 trackers [14, 17, 18, 19, 20].

The rest of the paper is organized as follows: Section 2 briefly reviews related works. Section 3 explains the proposed tracker in the sparse representation framework. Section 4 discusses the results, and concluding remarks are given in Section 5.

2. SPARSE REPRESENTATION BASED TRACKING

The concept of sparse representation has recently attracted the computer vision community due to its discriminative ability [13]. Sparse representation has been applied to various computer vision tasks including face recognition [13], image and video restoration [21], image denoising [22], action recognition [23], super resolution [24], and tracking [14].
The l1 minimization tracker proposed by Mei et al. [14] uses the low resolution target image along with trivial templates as dictionary elements. The candidate patches can be represented as a sparse linear combination of the dictionary elements. To localize the object in future frames, the authors use a particle filter framework. Here, each particle is an image patch obtained from the spatial neighborhood of the previous object center. The particle that minimizes the projection error indicates the location of the object in the current frame. Typically, hundreds of particles are used for localizing the object, and the performance of the tracker relies on the number of particles used. This tracker is computationally expensive and not suitable for real-time tracking. A faster version of the above work [16] was proposed by Bao et al. [17], where an l2 norm regularization over the trivial templates is added to the l1 minimization problem. However, our experiments show that the speed of this algorithm is achieved at the cost of its tracking performance. Another faster version was proposed by Li et al. [25]. Jia et al. [26] used a structural local sparse appearance model with an alignment-pooling method for tracking the object. All these sparse trackers are implemented in the particle filter framework and run at low frame rates, hence they are not suitable for real-time tracking. A recent real-time tracker proposed by Zhang et al. [27] uses a sparse measurement matrix to reduce the dimension of foreground and background samples and a naive Bayes classifier for separating object and background. This tracker also shows poor performance for complex videos, as shown in our experiments.

3. PROPOSED TRACKER

Feature detection and matching have been essential components of computer vision for various applications such as image stitching and 3D model construction. The features used for such applications are typically obtained from specific locations of the image known as interest points. The interest points (often called corner points) indicate the salient locations of the image, which are least affected by various deformations and noise. The features extracted from these interest points are often used in various computer vision applications including visual tracking.

In this paper, we propose a tracker that utilizes the flexibility of interest points and the robustness of sparse representation for matching these interest points across frames. The proposed tracker represents the object as a set of interest points. Each interest point is represented by a small image patch surrounding it. The vectorized image patches are l2 normalized to form the atoms of the target dictionary. In order to localize the object in the current frame, the interest points are obtained from a predefined search window in the current frame. Each target interest point (object dictionary element) is represented as a sparse linear combination of the candidate dictionary atoms. The discriminative property of sparse representation enables robust matching between target and candidate patches. For each target patch, the candidate patch whose corresponding coefficient is maximum is chosen as the match. The noisy matches are pruned via the proposed 'robust matching' criterion by checking the mutual match between candidate and target patches. The mutual correspondence is confirmed by sparsely representing the candidate interest points in terms of target interest points. Only the points that mutually agree with each other are considered for object localization. The object location in the current frame is estimated as the median displacement of the candidate points with respect to their matching target points. The overview of the proposed tracker is shown in Fig. 1. Since the proposed tracker is a point tracker, it shows robustness to global appearance, scale and illumination changes. Further, the proposed tracker uses only a small patch around each interest point as a dictionary element, making it compact and computationally very efficient.

[Fig. 1. Proposed tracker overview. Flowchart: from the initial frame, create the target dictionary (T); for each frame, create the candidate dictionary (Y), match target to candidate (T->Y), select matched candidates, match candidate to target (Y->T), estimate the object motion, localize the object, and update dictionary T with the robustly matched candidate patches.]

3.1. Sparse Representation

Let T = {t_i} be the set of target patches corresponding to the target interest points, represented as l2 normalized vectors. Now, the candidate interest points, represented as the vectorized image patches y_i obtained within a search window of the current frame, can be represented by a sparse linear combination of the target dictionary elements t_i:

    y_i ≈ T a_i = a_{i1} t_1 + a_{i2} t_2 + ... + a_{ik} t_k                              (1)

where {t_1, t_2, ..., t_k} are the target bases and {a_{i1}, a_{i2}, ..., a_{ik}} are the corresponding target coefficients. The coefficient a_{ij} indicates the affinity between the i-th candidate and the j-th target interest point, based on the discriminative property of sparse representation. The coefficient vector a_i = {a_{i1}, a_{i2}, ..., a_{ik}} is obtained as a solution of

    min_{a_i ∈ R^k} ||a_i||_1   s.t.   ||y_i − T a_i||_2^2 ≤ λ                            (2)

where λ is the reconstruction error tolerance in the mean square sense. Considering the effect of occlusion and noise, (1) can be written as

    y_i = T a_i + ε                                                                        (3)

The nonzero entries of the error vector ε indicate noisy or occluded pixels in the candidate patch y_i. Following the strategy in [13], we use trivial templates I = [i_1 ... i_n] ∈ R^{n×n} to capture the noise and occlusion:

    y_i = T a_i + I c_i = a_{i1} t_1 + ... + a_{ik} t_k + c_{i1} i_1 + ... + c_{in} i_n    (4)

where each of the trivial templates {i_1 ... i_n} is a vector having a nonzero entry at only one location, and {c_{i1} ... c_{in}} is the corresponding coefficient vector. A nonzero trivial coefficient indicates a reconstruction error at the corresponding pixel location, possibly due to noise or occlusion.
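For concreteness, the sketch below shows how a single candidate patch could be coded over the target dictionary augmented with trivial templates, as in Eq. (4). This is only an illustration, not the authors' implementation: they solve the constrained problem of Eq. (2) with the SPAMS package in MATLAB, whereas this sketch uses the unconstrained (lasso) form via scikit-learn, so the value of alpha is not directly comparable to λ; the function name and array shapes are our own assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_code_patch(y, T, alpha=0.1):
    """Code one candidate patch y over [T I] (target atoms + trivial templates).

    y : (d,) l2-normalized, vectorized candidate patch.
    T : (d, k) target dictionary; each column is an l2-normalized target patch.
    Returns (a, c): target coefficients (k,) and trivial coefficients (d,),
    so that y ~ T a + I c as in Eq. (4).
    """
    d, k = T.shape
    D = np.hstack([T, np.eye(d)])        # augment dictionary with trivial templates
    # Lasso solves min_w 0.5/d * ||y - D w||_2^2 + alpha * ||w||_1,
    # an unconstrained counterpart of the l1 problem in Eq. (2).
    model = Lasso(alpha=alpha, fit_intercept=False, max_iter=2000)
    model.fit(D, y)
    w = model.coef_
    return w[:k], w[k:]
```

In the tracker, such a routine would be applied to every candidate patch against the target dictionary (Eqs. (1)-(4)) and, with the roles of the target and candidate sets exchanged, to every matched target patch (Eq. (5) below).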
3.2. Object Localization

[Fig. 2. Robust matching of interest points. (a) Target window with interest points. (b) Candidate window with interest points. The red patches are the mutually matched points, the green patches are matched only one way (either target to candidate or candidate to target), and the blue patches are unmatched ones.]

The object is tracked by finding the correspondence between the target and candidate interest points in successive frames. Only the reliable candidate interest points that mutually match with each other are considered for estimating the object location. Let l be the number of interest points in the current frame within the search window (all points shown in Fig. 2(b)). Only the k (k < l) interest points out of these l points that match with the dictionary atoms are considered for object localization (all green and red points shown in Fig. 2(b)).

In order to find the candidate points that match with the target points, each target interest point t_j is represented as a sparse linear combination of candidate points (y_i) and trivial bases:

    t_j = Y b_j + ε = b_{j1} y_1 + ... + b_{jl} y_l + c_{j1} i_1 + ... + c_{jn} i_n        (5)

Utilizing the discriminative property of sparse representation, the matching candidate point for a given target point t_j is determined by the maximum value of the candidate coefficient vector b_j. Let the coefficient b_{ji}, corresponding to the candidate dictionary element y_i, be the maximum value. Then the matching candidate point for the target point t_j is declared as y_i. Mathematically, b_{ji} = max_r {b_{jr}}, r ∈ {1, 2, ..., l}, indicates a match between the j-th target point and the i-th candidate point. All the matched candidate points corresponding to each target dictionary element are shown by the green and red patches in Fig. 2(b). If two or more target points match with the same candidate point, only the candidate patch with the higher coefficient value is considered. Hence, the number of candidate matching points is less than or equal to the number of target dictionary elements.

3.3. Robust Matching

Let k1 (k1 ≤ k) be the number of matched candidate interest points (all red and green points shown in Fig. 2(b)). In order to prune the noisy matches, each of these candidate interest points (y_i) is now represented as a sparse linear combination of target points (t_j) and trivial bases as in (5). Let a_{ij} = max_r {a_{ir}}, r ∈ {1 ... k}, indicate the match between the i-th candidate point and the j-th target point. The i-th candidate point is selected if:

    max_r {b_{jr}} = b_{ji},   r ∈ {1 ... l}
    max_r {a_{ir}} = a_{ij},   r ∈ {1 ... k}                                               (6)

All the candidate points that agree with the above robust matching criterion are considered for object localization (all red points shown in Fig. 2(b)). Let there be k2 (k2 ≤ k1) such candidate points located at [x_i^c, y_i^c], i ∈ {1 ... k2}, with respect to the object window center, and let [x_i^t, y_i^t], i ∈ {1 ... k2}, be the corresponding locations of the target points with respect to the same object window center (x_o, y_o). The displacement of the object is measured as the median displacement of the candidate points with respect to their matching target points. Let dx_i and dy_i be the displacement of the i-th candidate point, measured as:

    dx_i = x_i^c − x_i^t,   i ∈ {1, 2, ..., k2}
    dy_i = y_i^c − y_i^t,   i ∈ {1, 2, ..., k2}                                            (7)

The object location in the current frame is obtained using the median displacement of each coordinate:

    x_o = x_o + median(dx_i),   i ∈ {1, 2, ..., k2}
    y_o = y_o + median(dy_i),   i ∈ {1, 2, ..., k2}                                        (8)

3.4. Dictionary Update

Since the object appearance and illumination change over time, it is necessary to update the target dictionary for reliable tracking. We deploy a simple update strategy based on the confidence of matching. The top 10% of matched candidate points are added to the dictionary if the matched sparse coefficient is above a certain threshold. The same number of unmatched points from the target dictionary are removed to accommodate these new candidate points.

Algorithm 1 Proposed Tracking
 1: Input: Initialize the object window centered at [x_o, y_o].
 2: Obtain the interest points located within the object window using the Harris corner detector.
 3: Create the target dictionary t_i from each object interest point located at [x_i^t, y_i^t] with respect to [x_o, y_o].
 4: repeat
 5:   Obtain the candidate interest points from the current frame, within a search window. Select an image patch y_i centered around each interest point located at [x_i^c, y_i^c] with respect to [x_o, y_o].
 6:   Obtain the matching candidate points corresponding to the target points according to (5).
 7:   Robust Matching: Select only those candidate points that mutually match with the target points as in (6).
 8:   Estimate the displacement of the object as the median displacement of each coordinate of the mutually matched candidate points as in (7).
 9:   Localize the object by adding this displacement to the current object location as in (8).
10:   Update the target dictionary.
11: until the final frame of the video.
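The mutual-match test of Eq. (6) and the median-displacement update of Eqs. (7)-(8) reduce to a few lines once the two sets of sparse coefficients are available. The sketch below is a simplified illustration under that assumption: it codes all points in both directions and resolves matches by taking the maximum coefficient; the function and variable names are ours and are not part of the authors' MATLAB code.

```python
import numpy as np

def robust_match_and_localize(A, B, cand_xy, targ_xy, center):
    """Mutual matching (Eq. 6) and median-displacement localization (Eqs. 7-8).

    A : (num_cand, k)  sparse coefficients of candidates over target atoms (Eq. 4).
    B : (k, num_cand)  sparse coefficients of target atoms over candidates (Eq. 5).
    cand_xy : (num_cand, 2) candidate point locations w.r.t. the object center.
    targ_xy : (k, 2)        target point locations w.r.t. the same center.
    center  : (x_o, y_o)    object center from the previous frame.
    """
    t2c = np.argmax(B, axis=1)   # best candidate for each target point
    c2t = np.argmax(A, axis=1)   # best target for each candidate point
    # keep candidate i only if its best target j also picks i as its best candidate
    mutual = np.array([i for i, j in enumerate(c2t) if t2c[j] == i], dtype=int)
    if mutual.size == 0:
        return center, mutual    # no reliable match: keep the previous location
    d = cand_xy[mutual] - targ_xy[c2t[mutual]]        # per-point displacement, Eq. (7)
    dx, dy = np.median(d[:, 0]), np.median(d[:, 1])   # median over matched points
    return (center[0] + dx, center[1] + dy), mutual   # Eq. (8)
```

The indices returned in `mutual` correspond to the 'red' points of Fig. 2(b) and are the natural candidates for the dictionary update of Section 3.4.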
4. RESULTS AND DISCUSSION

The proposed tracker is implemented in MATLAB and evaluated on publicly available complex video sequences with various challenging scenarios such as illumination change, partial occlusion and appearance change. These videos were shot with different sensors such as color, gray-scale and infra-red, and were recorded in indoor and outdoor environments. To solve the l1 minimization problem we have used the Sparse Modeling Software (SPAMS) package [28], with the regularization constant λ set at 0.1 for all the experiments. For interest point detection, we have used the Harris corner detector obtained from [29] with Noble's corner measure [30]. Since the number of interest points is typically proportional to the size of the object, we have set the parameters of the Harris corner detector depending on the size of the target. For all video sequences we have used σ = 1 and threshold = 1. For target sizes smaller than 50 × 50 we have used radius = 0.5, and radius = 2 for sizes greater than 50 × 50. For satisfactory tracking, we need at least K target interest points for an object. In our experiments we have observed that only 25-30% of the interest points satisfy our robust matching criterion; based on this observation we have set the value of K at 100. If the number of target interest points is too low or too high compared to K, the threshold and radius parameters can be adjusted appropriately to get the desired number of interest points. For each interest point, a 5 × 5 image patch centered around that point is used for creating the dictionary atoms (see the illustrative sketch after Table 2). For evaluating the performance of the proposed tracker, we have compared the results with the l1 tracker proposed by Mei et al. [14] using 300 particles, the meanshift tracker [1], fragment-based tracking (FragTrack) [31], SGM [32], APG [17] and compressive tracking (CT) [27]. We show results for the following four video sequences¹: pktest02 (515 frames), panda (900 frames), Dudek (1145 frames) and car4 (431 frames). The tracking windows of all trackers at various time instances of these sequences are shown in Fig. 3. Tables 1 and 2 summarize the performance of the trackers with respect to accuracy (pixel deviation from ground truth) and execution time (in seconds).

[Fig. 4. Trajectory position error with respect to ground truth for: (a) pktest02, (b) panda, (c) DudekSeq, (d) car4. Each plot shows the position error (pixels) versus frame number for the L1, CT, FragTrack, MS, SGM, APG and proposed trackers.]

Table 1. Average RMSE (pixels)

Video (# frames)   MS [1]   FrgTrk [31]   l1 [14]   SGM [32]   APG [17]   CT [27]   Prop.
pktest02 (515)     46.63    114.3          19.74     20.5      100.2      103.9      2.96
panda (900)         9.62     11.97         28.88    440.7       86.84     172.4      8.81
Dudek (1145)       52.47    151.42        208.0      95.13      22.90      95.44    20.11
car4 (431)        101.3     178.1           8.64     13.1        9.84     134.8      6.66

Table 2. Execution time per frame (secs)

Video (# frames)   MS [1]   FrgTrk [31]   l1 [14]   SGM [32]   APG [17]   CT [27]   Prop.
pktest02 (515)     0.039    0.23          0.33      3.32       0.09       0.013     0.021
panda (900)        0.041    0.32          0.35      3.6        0.1        0.023     0.023
Dudek (1145)       0.042    0.32          0.35      3.6        0.1        0.021     0.023
car4 (431)         0.04     0.23          0.33      3.3        0.09       0.017     0.022
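As a rough illustration of the dictionary construction described above, the sketch below detects corners inside the object window and stacks l2-normalized 5 × 5 patches as dictionary atoms. It is not equivalent to the authors' setup: they use Kovesi's MATLAB Harris code with Noble's corner measure and the σ/threshold/radius settings quoted above, whereas this example substitutes OpenCV's Harris-based corner detector; the function name and parameter values are our own assumptions.

```python
import cv2
import numpy as np

def build_target_dictionary(gray, box, max_points=100, patch=5):
    """Collect l2-normalized patch atoms around corners inside the object box.

    gray : grayscale frame (uint8);  box : (x, y, w, h) object window.
    Returns T of shape (patch*patch, n_atoms) and corner offsets from the window center.
    """
    x, y, w, h = box
    roi = gray[y:y + h, x:x + w]
    pts = cv2.goodFeaturesToTrack(roi, maxCorners=max_points, qualityLevel=0.01,
                                  minDistance=3, useHarrisDetector=True, k=0.04)
    atoms, offsets = [], []
    r = patch // 2
    if pts is not None:
        for px, py in pts.reshape(-1, 2).astype(int):
            if r <= px < w - r and r <= py < h - r:            # keep full patches only
                p = roi[py - r:py + r + 1, px - r:px + r + 1]
                p = p.astype(np.float64).ravel()
                n = np.linalg.norm(p)
                if n > 0:
                    atoms.append(p / n)                        # l2-normalized atom
                    offsets.append((px - w / 2.0, py - h / 2.0))
    if not atoms:
        return np.zeros((patch * patch, 0)), np.zeros((0, 2))
    return np.column_stack(atoms), np.array(offsets)
```

The same routine, applied to an enlarged search window in each new frame, would yield the candidate dictionary Y of Fig. 1.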
It can be observed that the proposed tracker achieves real-time performance with better tracking accuracy than many recent state of the art trackers such as CT, SGM and APG. For all trackers, the authors' original implementations have been used.

[Fig. 3. Results for the pktest02, panda, dudek and car4 video sequences (from top to bottom).]

¹ The original videos were obtained from the following links:
pktest02: http://vision.cse.psu.edu/data/vividEval/datasets/PETS2005/PkTest02/index.html
panda: http://info.ee.surrey.ac.uk/Personal/Z.Kalal/TLD/TLD_dataset.ZIP
dudek: http://www.cs.toronto.edu/vis/projects/dudekfaceSequence.html
car4: http://www.cs.toronto.edu/~dross/ivt/

5. CONCLUSION

The interest point based object modeling, along with the proposed robust matching criterion, provides robustness to various challenges including partial occlusion and illumination/appearance changes. The performance of the proposed real-time tracker has been benchmarked on various publicly available complex video sequences. The quantitative evaluation compares the proposed tracker with many recent state of the art trackers including the CT, SGM, APG and l1 trackers. The overall performance of the proposed tracker is found to be better in terms of both accuracy and speed.
6. REFERENCES

[1] D. Comaniciu, V. Ramesh, and P. Meer, "Kernel-based object tracking," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 5, pp. 564–577, May 2003.
[2] R. Venkatesh Babu, Patrick Perez, and Patrick Bouthemy, "Robust tracking with motion estimation and local kernel-based color modeling," Image and Vision Computing, vol. 25, no. 8, pp. 1205–1216, 2007.
[3] S. Avidan, "Ensemble tracking," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 2, pp. 261–271, Feb. 2007.
[4] R. Venkatesh Babu, S. Suresh, and A. Makur, "Online adaptive radial basis function networks for robust object tracking," Computer Vision and Image Understanding, vol. 114, no. 3, pp. 297–310, 2010.
[5] B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," in Proceedings of the DARPA Image Understanding Workshop, Apr. 1981, pp. 121–130.
[6] Alper Yilmaz, Omar Javed, and Mubarak Shah, "Object tracking: A survey," ACM Computing Surveys, vol. 38, no. 4, 2006.
[7] A. C. Sankaranarayanan, A. Veeraraghavan, and R. Chellappa, "Object detection, tracking and recognition for multiple smart cameras," Proceedings of the IEEE, vol. 96, no. 10, pp. 1606–1624, 2008.
[8] Irene Y. H. Gu and Zulfiqar H. Khan, "Online learning and robust visual tracking using local features and global appearances of video objects," in Object Tracking, Hanna Goszczynska, Ed., pp. 89–118, InTech, 2011.
[9] Jianbo Shi and Carlo Tomasi, "Good features to track," in IEEE Conference on Computer Vision and Pattern Recognition, 1994, pp. 593–600.
[10] W. Kloihofer and M. Kampel, "Interest point based tracking," in Proceedings of the International Conference on Pattern Recognition, 2010, pp. 3549–3552.
[11] H. Bay, T. Tuytelaars, and L. Van Gool, "SURF: Speeded up robust features," in Proceedings of the European Conference on Computer Vision, 2006, pp. 404–417.
[12] Jean-Yves Bouguet, "Pyramidal implementation of the Lucas-Kanade feature tracker: Description of the algorithm," Tech. Rep., Intel Corporation Microprocessor Research Labs, 2000.
[13] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, "Robust face recognition via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, Feb. 2009.
[14] X. Mei and H. Ling, "Robust visual tracking and vehicle classification via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 11, pp. 2259–2272, Nov. 2011.
[15] C. Harris and M. J. Stephens, "A combined corner and edge detector," in Alvey Vision Conference, 1988, pp. 147–152.
[16] Xue Mei, Haibin Ling, Yi Wu, Erik Blasch, and Li Bai, "Minimum error bounded efficient l1 tracker with occlusion detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2011.
[17] C. Bao, Y. Wu, H. Ling, and H. Ji, "Real time robust l1 tracker using accelerated proximal gradient approach," in IEEE Conference on Computer Vision and Pattern Recognition, 2012.
[18] Huaping Liu and Fuchun Sun, "Visual tracking using sparsity induced similarity," in Proceedings of the International Conference on Pattern Recognition, 2010.
[19] M. S. Naresh Kumar, P. Priti, and R. Venkatesh Babu, "Fragment-based real-time object tracking: A sparse representation approach," in Proceedings of the IEEE International Conference on Image Processing, 2012, pp. 433–436.
[20] Avinash Ruchandani and R. Venkatesh Babu, "Local appearance based robust tracking via sparse representation," in Proceedings of the Eighth Indian Conference on Computer Vision, Graphics and Image Processing, 2012, pp. 12:1–12:8.
[21] J. Mairal, G. Sapiro, and M. Elad, "Learning multiscale sparse representations for image and video restoration," Multiscale Modeling and Simulation, vol. 7, no. 1, pp. 214, 2008.
[22] Michael Elad and Michal Aharon, "Image denoising via sparse and redundant representations over learned dictionaries," IEEE Transactions on Image Processing, vol. 15, no. 12, pp. 3736–3745, 2006.
[23] K. Guo, P. Ishwar, and J. Konrad, "Action recognition using sparse representation on covariance manifolds of optical flow," in Proceedings of the IEEE International Conference on Advanced Video and Signal-Based Surveillance, Aug. 2010.
[24] J. Wright and T. Huang, "Image super-resolution as sparse representation of raw image patches," in IEEE Conference on Computer Vision and Pattern Recognition, 2008, pp. 1–8.
[25] Hanxi Li and Chunhua Shen, "Robust real-time visual tracking with compressed sensing," in Proceedings of the IEEE International Conference on Image Processing, 2010.
[26] Xu Jia, Huchuan Lu, and Ming-Hsuan Yang, "Visual tracking via adaptive structural local sparse appearance model," in IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 1822–1829.
[27] Kaihua Zhang, Lei Zhang, and Ming-Hsuan Yang, "Real-time compressive tracking," in Proceedings of the European Conference on Computer Vision, 2012.
[28] "SPAMS: Sparse Modeling Software," http://www.di.ens.fr/willow/SPAMS/downloads.html.
[29] P. D. Kovesi, "MATLAB and Octave functions for computer vision and image processing," Centre for Exploration Targeting, School of Earth and Environment, The University of Western Australia.
[30] Alison Noble, Descriptions of Image Surfaces, Ph.D. thesis, Oxford University, 1989.
[31] Amit Adam, Ehud Rivlin, and Ilan Shimshoni, "Robust fragments-based tracking using the integral histogram," in IEEE Conference on Computer Vision and Pattern Recognition, 2006, pp. 798–805.
[32] Wei Zhong, Huchuan Lu, and Ming-Hsuan Yang, "Robust object tracking via sparsity-based collaborative model," in IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 1838–1845.