Image-based localization

Localizing a vehicle or device by estimating the camera pose from an image is a fundamental requirement for many computer vision applications, such as autonomous vehicle navigation, mobile robotics, Augmented Reality, and Structure-from-Motion (SfM). Most state-of-the-art approaches rely on local features such as SIFT to solve the problem of image-based localization. Given an SfM model of a scene, where each 3D point is associated with the image features from which it was triangulated, one proceeds in two stages: (i) establishing 2D-3D matches between features extracted from the query image and 3D points in the SfM model via descriptor matching; (ii) using these correspondences to determine the camera pose, usually by employing an n-point solver inside a RANSAC loop. Pose estimation can only succeed if enough correct matches have been found in the first stage. Consequently, limitations of either the feature detector, e.g., under motion blur or strong illumination changes, or the descriptor, e.g., under strong viewpoint changes, will cause localization approaches to fail.

Recently, two approaches have tackled the problem of localization with end-to-end learning. PlaNet formulates localization as a classification problem, where the current position is matched to the best position in the training set. While this approach is suitable for localization in extremely large environments, it recovers only position, not orientation, and its accuracy is bounded by the spatial extent of the training samples. More similar in spirit to our approach, PoseNet formulates 6DoF pose estimation as a regression problem.
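The second stage of the feature-based pipeline described above, a minimal solver inside a RANSAC loop, can be sketched as follows. This is an illustrative sketch only, not the code used in this project: the function names are hypothetical, and a toy 2D translation solver stands in for the n-point pose solver so that the example stays self-contained. A real pipeline would plug in a pose solver and a reprojection-error residual instead.

```python
import random


def ransac(matches, solve_minimal, residual, sample_size, threshold,
           iterations=200, seed=0):
    """Generic RANSAC loop. Returns (best_model, inlier_indices)."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        # Hypothesize a model from a random minimal sample of matches.
        sample = rng.sample(matches, sample_size)
        model = solve_minimal(sample)
        if model is None:  # degenerate sample, try again
            continue
        # Count the matches that agree with the hypothesis.
        inliers = [i for i, m in enumerate(matches)
                   if residual(model, m) < threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers


# Toy stand-in for the pose solver (assumption for illustration):
# estimate a 2D translation t with q = p + t from a single match (p, q).
def solve_translation(sample):
    p, q = sample[0]
    return (q[0] - p[0], q[1] - p[1])


def translation_residual(t, match):
    p, q = match
    dx = p[0] + t[0] - q[0]
    dy = p[1] + t[1] - q[1]
    return (dx * dx + dy * dy) ** 0.5


# Eight correct matches shifted by (2, -1) plus three gross outliers.
good = [((float(i), float(2 * i)), (i + 2.0, 2 * i - 1.0)) for i in range(8)]
bad = [((0.0, 0.0), (9.0, 9.0)),
       ((1.0, 1.0), (-5.0, 4.0)),
       ((2.0, 0.0), (7.0, -8.0))]
matches = good + bad

model, inliers = ransac(matches, solve_translation, translation_residual,
                        sample_size=1, threshold=0.5)
```

The loop only succeeds when enough matches are correct: if the outliers dominate, no sampled hypothesis gathers a meaningful inlier set, which mirrors the failure mode described above.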

An improved CNN+LSTM architecture for better localization

We propose a new CNN+LSTM architecture for camera pose regression in indoor and outdoor scenes. CNNs allow us to learn feature representations for localization that are robust against motion blur and illumination changes. We apply LSTM units to the CNN output, where they perform a structured dimensionality reduction of the feature vector, leading to drastic improvements in localization performance.

First comparison of SIFT-based vs. CNN-based methods

We provide an extensive quantitative comparison of CNN-based and SIFT-based localization methods, showing the strengths and weaknesses of each. We perform experiments on the Cambridge Landmarks outdoor dataset as well as the 7-Scenes indoor dataset. We show, for the first time, that CNN-based methods still have a long way to go!
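A common way to compare localization methods quantitatively is to measure, per query image, the position error and the orientation error of the estimated 6DoF pose, and then summarize them over a scene, for example with the median. The sketch below illustrates that idea under stated assumptions: orientations are quaternions in (w, x, y, z) order, the function names are hypothetical, and the exact evaluation protocol used for the experiments above is not specified on this page.

```python
import math
import statistics


def position_error(t_pred, t_true):
    """Euclidean distance between predicted and true camera positions."""
    return math.dist(t_pred, t_true)


def orientation_error_deg(q_pred, q_true):
    """Angle in degrees of the rotation between two quaternions (w, x, y, z)."""
    def unit(q):
        n = math.sqrt(sum(c * c for c in q))
        return [c / n for c in q]

    a, b = unit(q_pred), unit(q_true)
    # q and -q encode the same rotation, hence the absolute value.
    d = abs(sum(x * y for x, y in zip(a, b)))
    d = min(1.0, d)  # guard against rounding slightly above 1
    return math.degrees(2.0 * math.acos(d))


# Made-up example poses, for illustration only.
identity = (1.0, 0.0, 0.0, 0.0)
quarter_turn_z = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))

pos_err = position_error((0.0, 0.0, 0.0), (3.0, 4.0, 0.0))
rot_err = orientation_error_deg(quarter_turn_z, identity)

# Summarize per-image errors of one scene with the median.
median_pos_err = statistics.median([0.4, 0.9, pos_err])
```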

New indoor TUM LSI dataset

We present a new large-scale indoor dataset with accurate ground truth from a laser scanner. Experimental results show that due to the presence of textureless surfaces and repetitive structures, classic SIFT-based methods fail on the TUM LSI dataset.

Download the dataset here and start localizing!

Publications

2019

AlignNet-3D for Fast Point Cloud Registration of Partially Observed Objects.
Johannes Gross, Aljosa Osep, and Bastian Leibe.
International Conference on 3D Vision (3DV), 2019.
[pdf] [code]

To Learn or Not to Learn: Visual Localization from Essential Matrices.
Qunjie Zhou, Torsten Sattler, Marc Pollefeys, and Laura Leal-Taixe.
arXiv preprint arXiv:1908.01293, 2019.
[pdf]

Tracking without bells and whistles.
Philipp Bergmann, Tim Meinhardt, and Laura Leal-Taixe.
IEEE International Conference on Computer Vision (ICCV), 2019.
[pdf] [code]

Deep Appearance Maps.
Maxim Maximov, Tobias Ritschel, Laura Leal-Taixe, and Mario Fritz.
International Conference on Computer Vision (ICCV), 2019.
[pdf]

Towards Generalizing Sensorimotor Control Across Weather Conditions.
Qadeer Khan, Patrick Wenzel, Daniel Cremers, and Laura Leal-Taixe.
International Conference on Intelligent Robots and Systems (IROS), 2019.
[pdf]

CVPR19 Tracking and Detection Challenge: How crowded can it get?
Patrick Dendorfer, Hamid Rezatofighi, Anton Milan, Javen Shi, Daniel Cremers, Ian Reid, Stefan Roth, Konrad Schindler, and Laura Leal-Taixe.
arXiv:1906.04567, 2019.
[pdf]

Understanding the Limitations of CNN-based Absolute Camera Pose Regression.
Torsten Sattler, Qunjie Zhou, Marc Pollefeys, and Laura Leal-Taixe.
Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[pdf]

MOTS: Multi-Object Tracking and Segmentation.
Paul Voigtlaender, Michael Krause, Aljosa Osep, Jonathon Luiten, Berin Balachandar Gnana Sekar, Andreas Geiger, and Bastian Leibe.
Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[pdf] [code]

4D Generic Video Object Proposals.
Aljosa Osep, Paul Voigtlaender, Mark Weber, Jonathon Luiten, and Bastian Leibe.
arXiv:1901.09260, 2019.
[pdf]

Large-Scale Object Mining for Object Discovery from Unlabeled Video.
Aljosa Osep, Paul Voigtlaender, Jonathon Luiten, Stefan Breuers, and Bastian Leibe.
International Conference on Robotics and Automation (ICRA), 2019.
[pdf]

Temporally Coherent GANs for Video Super-Resolution (TecoGAN).
Mengyu Chu, You Xie, L. Leal-Taixe, and N. Thuerey.
arXiv:1811.09393, 2019.
[pdf]

Deep Perm-Set Net: Learn to Predict Sets with Unknown Permutation and Cardinality Using Deep Neural Networks.
S. Hamid Rezatofighi, Roman Kaskman, Farbod T. Motlagh, Qinfeng Shi, Daniel Cremers, Laura Leal-Taixe, and Ian Reid.
arXiv:1805.00613, 2019.
[pdf]

2018

Lifting Layers: Analysis and Applications.
Peter Ochs, Tim Meinhardt, Laura Leal-Taixe, and Michael Moeller.
European Conference on Computer Vision (ECCV), 2018.
[pdf] [code]

LIME: Live Intrinsic Material Estimation.
Abhimitra Meka, Maxim Maximov, Michael Zollhoefer, Avishek Chatterjee, Hans-Peter Seidel, Christian Richardt, and Christian Theobalt.
Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[pdf]

Track, then Decide: Category-Agnostic Vision-based Multi-Object Tracking.
Aljosa Osep, Wolfgang Mehner, Paul Voigtlaender, and Bastian Leibe.
International Conference on Robotics and Automation (ICRA), 2018.
[pdf]

Discrete-Continuous ADMM for Transductive Inference in Higher-Order MRFs.
Emanuel Laude, Jan-Hendrik Lange, Jonas Schuepfer, Csaba Domokos, L. Leal-Taixe, Frank R. Schmidt, Bjoern Andres, and Daniel Cremers.
Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[pdf]

Modular Vehicle Control for Transferring Semantic Information Between Weather Conditions Using GANs.
Patrick Wenzel, Qadeer Khan, Daniel Cremers, and Laura Leal-Taixe.
Conference on Robot Learning (CoRL), 2018.
[pdf] [video]

2017

Learning Proximal Operators: Using Denoising Networks for Regularizing Inverse Imaging Problems.
Tim Meinhardt, Michael Moeller, Caner Hazirbas, and Daniel Cremers.
IEEE International Conference on Computer Vision (ICCV), 2017.
[pdf] [code]

Fusion of Head and Full-Body Detectors for Multi-Object Tracking.
R. Henschel, L. Leal-Taixe, D. Cremers, and B. Rosenhahn.
Computer Vision and Pattern Recognition Workshops (CVPRW), 2017.
[pdf]

Tracking the Trackers: An Analysis of the State of the Art in Multiple Object Tracking.
L. Leal-Taixe, A. Milan, K. Schindler, D. Cremers, I. Reid, and S. Roth.
arXiv:1704.02781, 2017.
[pdf]

Deep Depth from Focus.
C. Hazirbas, L. Leal-Taixe, and D. Cremers.
arXiv:1704.01085, 2017.
[pdf] [challenge]

One-Shot Video Object Segmentation.
S. Caelles, K.-K. Maninis, J. Pont-Tuset, L. Leal-Taixe, D. Cremers, and L. Van Gool.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[pdf] [code]

Image-based localization using LSTMs for structured feature correlation.
F. Walch, C. Hazirbas, L. Leal-Taixe, T. Sattler, S. Hilsenbeck, and D. Cremers.
IEEE International Conference on Computer Vision (ICCV), 2017.
[pdf] [challenge]

Combined Image- and World-Space Tracking in Traffic Scenes.
Aljosa Osep, Wolfgang Mehner, Markus Mathias, and Bastian Leibe.
International Conference on Robotics and Automation (ICRA), 2017.
[pdf] [code]

Video Object Segmentation Without Temporal Information.
K.-K. Maninis, S. Caelles, Y. Chen, J. Pont-Tuset, L. Leal-Taixe, D. Cremers, and L. Van Gool.
Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2017.
[pdf]

2016

Tracking with multi-level features.
R. Henschel, L. Leal-Taixe, B. Rosenhahn, and K. Schindler.
arXiv:1607.07304, 2016.
[pdf]

Learning by tracking: siamese CNN for robust target association.
L. Leal-Taixe, C. Canton-Ferrer, and K. Schindler.
IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), DeepVision: Deep Learning for Computer Vision, 2016.
[pdf]

MOT16: A benchmark for multi-object tracking.
A. Milan, L. Leal-Taixe, I. Reid, S. Roth, and K. Schindler.
arXiv:1603.00831, 2016.
[pdf] [challenge]

Unsupervised Learning of Shape-Motion Patterns for Objects in Urban Street Scenes.
Dirk Klostermann, Aljosa Osep, Joerg Stueckler, and Bastian Leibe.
British Machine Vision Conference (BMVC), 2016.
[pdf]

Scene Flow Propagation for Semantic Mapping and Object Discovery in Dynamic Street Scenes.
Deyvid Kochanov, Aljosa Osep, Joerg Stueckler, and Bastian Leibe.
International Conference on Intelligent Robots and Systems (IROS), 2016.
[pdf]

Multi-Scale Object Candidates for Generic Object Tracking in Street Scenes.
Aljosa Osep, Alexander Hermans, Francis Engelmann, Dirk Klostermann, Markus Mathias, and Bastian Leibe.
International Conference on Robotics and Automation (ICRA), 2016.
[pdf]

2015

Continuous Pose Estimation with a Spatial Ensemble of Fisher Regressors.
M. Fenzi, L. Leal-Taixe, J. Ostermann, and T. Tuytelaars.
IEEE International Conference on Computer Vision (ICCV), 2015.
[pdf]

Joint Tracking and Segmentation of Multiple Targets.
A. Milan, L. Leal-Taixe, K. Schindler, and I. Reid.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[pdf] [code]

MOTChallenge 2015: Towards a Benchmark for Multi-Target Tracking.
L. Leal-Taixe, A. Milan, I. Reid, S. Roth, and K. Schindler.
arXiv:1504.01942, 2015.
[pdf] [challenge]

Automatic tracking of vessel-like structures from a single starting point.
D.A.B. Oliveria, L. Leal-Taixe, R.Q. Feitosa, and B. Rosenhahn.
Computerized Medical Imaging and Graphics, 2015.
[pdf]

Pose Estimation of Object Categories in Videos Using Linear Programming.
M. Fenzi, L. Leal-Taixe, K. Schindler, and B. Rosenhahn.
IEEE Winter Conference on Applications of Computer Vision (WACV), 2015.
[pdf]

A Fixed-Dimensional 3D Shape Representation for Matching Partially Observed Objects in Street Scenes.
Dennis Mitzel, Jasper Diesel, Aljosa Osep, Umer Rafi, and Bastian Leibe.
International Conference on Robotics and Automation (ICRA), 2015.
[pdf]

2014

Efficient multiple people tracking using minimum cost arborescences.
R. Henschel, L. Leal-Taixe, and B. Rosenhahn.
German Conference on Pattern Recognition (GCPR), 2014.
[pdf]

Learning an image-based motion context for multiple people tracking.
L. Leal-Taixe, M. Fenzi, A. Kuznetsova, B. Rosenhahn, and S. Savarese.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
[pdf] [code]

Multiple object tracking with context awareness.
L. Leal-Taixe.
PhD Thesis, 2014.
[pdf]

2013

Class generative models based on feature regression for pose estimation of object categories.
M. Fenzi, L. Leal-Taixe, B. Rosenhahn, and J. Ostermann.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
[pdf]

Multi-View Normal Field Integration for 3D Reconstruction of Mirroring Objects.
Michael Weinmann, Aljosa Osep, Roland Ruiters, and Reinhard Klein.
International Conference on Computer Vision (ICCV), 2013.
[pdf]

Real-time sign language recognition using a consumer depth camera.
A. Kuznetsova, L. Leal-Taixe, and B. Rosenhahn.
IEEE International Conference on Computer Vision (ICCV) Workshops. 3rd Workshop on Consumer Depth Cameras for Computer Vision (CDC4CV), 2013.
[pdf]

Pedestrian interaction in tracking: the social force model and global optimization methods.
L. Leal-Taixe and B. Rosenhahn.
Modeling, Simulation and Visual Analysis of Crowds: A multidisciplinary perspective, Springer Berlin Heidelberg, 2013.
[pdf]

2012

Outdoor and Large-Scale Real-World Scene Analysis.
F. Dellaert, J.-M. Frahm, M. Pollefeys, L. Leal-Taixe, and B. Rosenhahn.
Springer Berlin Heidelberg, 2012.
[pdf]

3D Object Recognition and Pose Estimation for Multiple Objects using Multi-Prioritized RANSAC and Model Updating.
M. Fenzi, R. Dragon, L. Leal-Taixe, B. Rosenhahn, and J. Ostermann.
German Conference on Pattern Recognition (GCPR), 2012.
[pdf]

Branch-and-price global optimization for multi-view multi-target tracking.
L. Leal-Taixe, G. Pons-Moll, and B. Rosenhahn.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
[pdf] [code] [poster] [video]

Exploiting pedestrian interaction via global optimization and social behaviors.
L. Leal-Taixe, G. Pons-Moll, and B. Rosenhahn.
Outdoor and Large-Scale Real-World Scene Analysis, Springer Berlin Heidelberg, 2012.
[pdf]

Three dimensional tracking of exploratory behavior of barnacle cyprids using stereoscopy.
S. Maleschlijski, G. H. Sendra, A. Di Fino, L. Leal-Taixe, I. Thome, A. Terfort, N. Aldred, M. Grunze, A. S. Clare, B. Rosenhahn, and A. Rosenhahn.
Biointerphases, 2012.
[pdf]

Data-driven Manifolds for Outdoor Motion Capture.
G. Pons-Moll, L. Leal-Taixe, J. Gall, and B. Rosenhahn.
Outdoor and Large-Scale Real-World Scene Analysis, Springer Berlin Heidelberg, 2012.
[pdf]

Fusing Structured Light Consistency and Helmholtz Normals for 3D Reconstruction.
Michael Weinmann, Roland Ruiters, Aljosa Osep, Christopher Schwartz, and Reinhard Klein.
British Machine Vision Conference (BMVC), 2012.
[pdf]

2011

Everybody needs somebody: Modeling social and grouping behavior on a linear programming multiple people tracker.
L. Leal-Taixe, G. Pons-Moll, and B. Rosenhahn.
IEEE International Conference on Computer Vision (ICCV) Workshops. 1st Workshop on Modeling, Simulation and Visual Analysis of Large Crowds, 2011.
[pdf] [code] [video]

A stereoscopic approach for three dimensional tracking of marine biofouling microorganisms.
S. Maleschlijski, L. Leal-Taixe, S. Weisse, A. Di Fino, N. Aldred, A. S. Clare, G. H. Sendra, B. Rosenhahn, and A. Rosenhahn.
Microscopic Image Analysis with Applications in Biology (MIAAB), 2011.
[pdf]

Efficient and robust shape matching for model based human motion capture.
G. Pons-Moll, L. Leal-Taixe, T. Truong, and B. Rosenhahn.
German Conference on Pattern Recognition (GCPR), 2011.
[pdf]

Outdoor human motion capture using inverse kinematics and von Mises-Fisher sampling.
G. Pons-Moll, A. Baak, J. Gall, L. Leal-Taixe, M. Mueller, H.-P. Seidel, and B. Rosenhahn.
IEEE International Conference on Computer Vision (ICCV), 2011.
[pdf] [supplementary]

Understanding what we cannot see: automatic analysis of 4D digital in-line holographic microscopy data.
L. Leal-Taixe, M. Heydt, A. Rosenhahn, and B. Rosenhahn.
Video Processing and Computational Video, Springer Berlin Heidelberg, 2011.
[pdf]

2010

Classification of swimming microorganisms motion patterns in 4D digital in-line holography data.
L. Leal-Taixe, M. Heydt, S. Weisse, A. Rosenhahn, and B. Rosenhahn.
German Conference on Pattern Recognition (GCPR), 2010.
[pdf] [video]

2009

Automatic tracking of swimming microorganisms in 4D digital in-line holography data.
L. Leal-Taixe, M. Heydt, A. Rosenhahn, and B. Rosenhahn.
IEEE Workshops on Motion and Video Computing (WMVC), 2009.
[pdf]

Automatic segmentation of multi-stain histology images of arteries.
L. Leal-Taixe.
Master Thesis, 2009.
[pdf]

Automatic segmentation of arteries in multi-stain histology images.
L. Leal-Taixe, A. U. Coskun, B. Rosenhahn, and D. Brooks.
World Congress on Medical Physics and Biomedical Engineering, 2009.
[pdf]