Short-term visual object tracking in real-time

T. Vojir. Short-term visual object tracking in real-time. 5, 2017.

  • Tomas Vojir

In the thesis, we propose two novel short-term object tracking methods, the Flock of Trackers (FoT) and the Scale-Adaptive Mean-Shift (ASMS), a framework for fusion of multiple trackers and detector and contributions to the problem of tracker evaluation within the Visual Object Tracking (VOT) initiative.

The Flock of Trackers partitions the object of interest to an equally sized parts. For each part, the FoT computes an optical flow correspondence and estimates its reliability. Reliable correspondences are used to robustly estimates a target pose using RANSAC technique, which allows for range of complex rigid transformation (e.g. affine transformation) of a target. The scale-adaptive mean-shift tracker is a gradient optimization method that iteratively moves a search window to the position which minimizes a distance of a appearance model extracted from the search window to the target model. The ASMS propose a theoretically justified modification of the mean-shift framework that addresses one of the drawbacks of the mean-shift trackers which is the fixed size search window, i.e. target scale. Moreover, the ASMS introduce a technique that incorporates a background information into the gradient optimization to reduce tracker failures in presence of background clutter.

To take advantage of strengths of the previous methods, we introduce a novel tracking framework HMMTxD that fuses multiple tracking methods together with a proposed feature-based online detector. The framework utilizes a hidden Markov model (HMM) to learn online how well each tracking method performs using sparsely ”annotated” data provided by a detector, which are assumed to be correct, and confidence provided by the trackers. The HMM estimates the probability that a tracker is correct in the current frame given the previously learned HMM model and the current tracker confidence. This tracker fusion alleviates the drawbacks of the individual tracking methods since the HMMTxD learns which trackers are performing well and switch off the rest.

All of the proposed trackers were extensively evaluated on several benchmarks and publicly available tracking sequences and achieve excellent results in various evaluation criteria. The FoT achieved state-of-the-art performance in the VOT2013 benchmark, finishing second. Today, the FoT is used as a building block in complex applications such as multi-object tracking frameworks. The ASMS achieved state-of-the-art results in the VOT2015 benchmark and was chosen as the best performing method in terms of a trade-off between performance and running time. The HMMTxD demonstrated state-of-the-art performance in multiple benchmarks (VOT2014, VOT2015 and OTB).

The thesis also contributes, and provides an overview, to the Visual Object Tracking (VOT) evaluation methodology. This methodology provides a means for unbiased comparison of different tracking methods across publication, which is crucial for advancement of the state-of-the-art over a longer timespan and also provides a tools for deeper performance analysis of tracking methods. Furthermore, a annual workshops are organized on major computer vision conferences, where the authors are encouraged to submit their novel methods to compete against each other and where the advances in the visual object tracking are discussed.