Automatic Scene Interpretation with Totally Occluded Objects

Research output: Thesis, Doctoral Thesis

In this thesis we propose and evaluate a customizable computer vision system with cognitive abilities that tracks multiple objects over a long period of time, even if several of the relevant objects are totally occluded by other objects. At any time, the system gives plausible locations for all relevant objects, regardless of their visibility, by maintaining possible interpretations of the observed visual input data. We use an approach that combines bottom-up visual processing with top-down reasoning and inference. Furthermore, our computer vision system has learning capabilities, which are used to obtain more robust tracking results when the visual appearance of relevant objects changes gradually over time. Our modular vision system allows the use of several tracking algorithms from the literature, as long as they meet our minimum interface requirements. Template trackers, mean-shift trackers, and interest-point-based trackers are employed to show the adaptability of our vision system. Consequently, in the second part of this thesis, we study the effect of combining different types of low-level visual features. The key intuition is that a system using a rich set of low-level visual features should be more robust than a system relying on only a single visual feature. The open question is which visual features are well suited to a specific relevant object. We propose a method that combines the outputs of detectors based on different visual features using a support vector machine (SVM). Our feature combination is tested on the standard Visual Object Classes (VOC) challenge datasets. Results on the VOC datasets show that our method significantly outperforms detectors that use only a single visual feature.
Translated title of the contribution: Automatische Szeneninterpretation mit vollständig verdeckten Objekten
Original language: English
  • Auer, Peter, Assessor A (internal)
  • Vincze, Markus, Assessor B (external), External person
Publication status: Published - 2017


  • scene interpretation
  • computer vision
  • video processing
  • tracking
  • videos
  • image classification
  • machine learning
