Image-based localization

Localizing a vehicle or device by estimating the camera pose from an image is a fundamental requirement for many computer vision applications, such as autonomous vehicle navigation, mobile robotics, Augmented Reality, and Structure-from-Motion (SfM). Most state-of-the-art approaches rely on local features such as SIFT to solve the problem of image-based localization. Given an SfM model of a scene, where each 3D point is associated with the image features from which it was triangulated, one proceeds in two stages: (i) establishing 2D-3D matches between features extracted from the query image and 3D points in the SfM model via descriptor matching; (ii) using these correspondences to determine the camera pose, usually by employing an n-point pose solver inside a RANSAC loop. Pose estimation can only succeed if enough correct matches have been found in the first stage. Consequently, limitations of either the feature detector (e.g., under motion blur or strong illumination changes) or the descriptor (e.g., under strong viewpoint changes) will cause localization approaches to fail.

Recently, two approaches have tackled the problem of localization with end-to-end learning. PlaNet formulates localization as a classification problem, where the current position is matched to the best position in the training set. While this approach is suitable for localization in extremely large environments, it recovers only position, not orientation, and its accuracy is bounded by the spatial extent of the training samples. More similar in spirit to our approach, PoseNet formulates 6DoF pose estimation as a regression problem.
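Stage (i) of the feature-based pipeline described above can be illustrated with a small sketch. It assumes that every 3D point carries a single descriptor and that ambiguous matches are filtered with a nearest-neighbor ratio test; the function names, the data layout, and the threshold of 0.8 are illustrative assumptions, not details taken from this page.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_2d_3d(query_descs, point_descs, ratio=0.8):
    """Return (query_index, point_index) pairs that pass the ratio test.

    query_descs: descriptors of features detected in the query image.
    point_descs: one descriptor per 3D point of the SfM model (assumed layout).
    ratio: ratio-test threshold (illustrative value).
    """
    matches = []
    for qi, q in enumerate(query_descs):
        # Rank all 3D-point descriptors by distance to this query descriptor.
        ranked = sorted((l2(q, p), pi) for pi, p in enumerate(point_descs))
        if len(ranked) < 2:
            continue
        (d1, best), (d2, _) = ranked[0], ranked[1]
        # Accept only if the best candidate is clearly closer than the runner-up.
        if d1 < ratio * d2:
            matches.append((qi, best))
    return matches


# Toy data: three 3D-point descriptors and three query descriptors.
point_descs = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
query_descs = [
    [0.1, 0.0],  # close to point 0
    [9.8, 0.1],  # close to point 1
    [5.0, 0.0],  # equally far from points 0 and 1, so it is rejected
]
matches = match_2d_3d(query_descs, point_descs)
```

The surviving 2D-3D matches would then be passed to stage (ii), where a pose solver inside a RANSAC loop discards the remaining outliers. If too few matches survive this step, pose estimation has nothing to work with, which is the failure mode described above.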

An improved CNN+LSTM architecture for better localization

We propose a new CNN+LSTM architecture for camera pose regression in indoor and outdoor scenes. CNNs allow us to learn feature representations for localization that are robust to motion blur and illumination changes. We apply LSTM units to the CNN output; they act as a structured dimensionality reduction on the feature vector, leading to drastic improvements in localization performance.
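Pose regression networks need a training objective that balances position against orientation. The sketch below shows one common formulation, in the style of PoseNet: the Euclidean position error plus a weighted Euclidean distance between the predicted quaternion and the normalized ground-truth quaternion. Whether the architecture above uses exactly this loss is an assumption, and the function names and the value of `beta` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(q):
    """Scale a quaternion (w, x, y, z) to unit length."""
    n = math.sqrt(sum(c * c for c in q))
    return [c / n for c in q]


def pose_loss(pred_pos, pred_quat, true_pos, true_quat, beta=500.0):
    """Position error plus beta-weighted orientation error for one sample.

    beta trades off the two terms; 500.0 is an illustrative value only.
    """
    pos_err = euclidean(pred_pos, true_pos)
    ori_err = euclidean(pred_quat, normalize(true_quat))
    return pos_err + beta * ori_err


# Correct orientation, position off by a 3-4-5 triangle: loss is the 5 m offset.
loss = pose_loss([0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0],
                 [3.0, 4.0, 0.0], [2.0, 0.0, 0.0, 0.0])
```

In practice the weighting would be tuned per scene, since position and orientation errors live on different scales.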

First comparison of SIFT-based vs. CNN-based methods

We provide an extensive quantitative comparison of CNN-based and SIFT-based localization methods, showing the strengths and weaknesses of each. We perform experiments on the outdoor Cambridge Landmarks dataset as well as the indoor 7-Scenes dataset. We show, for the first time, that CNN-based methods still have a long way to go!
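Comparisons of this kind are commonly reported as a position error in meters and an orientation error in degrees, summarized per scene by the median. The sketch below computes both, assuming orientations are given as quaternions in (w, x, y, z) order; the function names and the toy poses are hypothetical and do not reproduce any result from this page.

```python
import math
import statistics


def position_error(pred_pos, true_pos):
    """Euclidean distance between predicted and ground-truth camera positions."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred_pos, true_pos)))


def orientation_error_deg(pred_quat, true_quat):
    """Smallest rotation angle, in degrees, between two quaternion orientations."""
    def unit(q):
        n = math.sqrt(sum(c * c for c in q))
        return [c / n for c in q]

    dot = sum(a * b for a, b in zip(unit(pred_quat), unit(true_quat)))
    # q and -q encode the same rotation, so use |dot|; clamp for acos safety.
    dot = min(1.0, abs(dot))
    return math.degrees(2.0 * math.acos(dot))


# Toy evaluation over three (predicted position, true position) pairs.
pairs = [
    ([0.0, 0.0, 0.0], [1.0, 0.0, 0.0]),
    ([0.0, 0.0, 0.0], [0.0, 3.0, 0.0]),
    ([0.0, 0.0, 0.0], [0.0, 0.0, 2.0]),
]
median_pos_err = statistics.median(position_error(p, t) for p, t in pairs)

# A 90 degree rotation about the z axis, compared against the identity.
quarter_turn = [math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4)]
angle = orientation_error_deg([1.0, 0.0, 0.0, 0.0], quarter_turn)
```

Using the median rather than the mean keeps a few gross localization failures from dominating the summary, which matters when comparing methods that fail in different ways.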

New indoor TUM LSI dataset

We present a new large-scale indoor dataset with accurate ground truth from a laser scanner. Experimental results show that due to the presence of textureless surfaces and repetitive structures, classic SIFT-based methods fail on the TUM LSI dataset.

Download the dataset here and start localizing!

Publications

2018

Deep Perm-Set Net: Learn to Predict Sets with Unknown Permutation and Cardinality Using Deep Neural Networks.
S. Hamid Rezatofighi, Roman Kaskman, Farbod T. Motlagh, Qinfeng Shi, Daniel Cremers, Laura Leal-Taixe, and Ian Reid.
arXiv:1805.00613, 2018.
[pdf]

Lifting Layers: Analysis and Applications.
Peter Ochs, Tim Meinhardt, Laura Leal-Taixe, and Michael Moeller.
European Conference on Computer Vision (ECCV), 2018.
[pdf]

Deep Appearance Maps.
Maxim Maximov, Tobias Ritschel, and Mario Fritz.
arXiv:1804.00863, 2018.
[pdf]

LIME: Live Intrinsic Material Estimation.
Abhimitra Meka, Maxim Maximov, Michael Zollhoefer, Avishek Chatterjee, Hans-Peter Seidel, Christian Richardt, and Christian Theobalt.
Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[pdf]

Discrete-Continuous ADMM for Transductive Inference in Higher-Order MRFs.
Emanuel Laude, Jan-Hendrik Lange, Jonas Schuepfer, Csaba Domokos, L. Leal-Taixe, Frank R. Schmidt, Bjoern Andres, and Daniel Cremers.
Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[pdf]

2017

Learning Proximal Operators: Using Denoising Networks for Regularizing Inverse Imaging Problems.
Tim Meinhardt, Michael Moeller, Caner Hazirbas, and Daniel Cremers.
IEEE International Conference on Computer Vision (ICCV), 2017.
[pdf] [code]

Fusion of Head and Full-Body Detectors for Multi-Object Tracking.
R. Henschel, L. Leal-Taixe, D. Cremers, and B. Rosenhahn.
Computer Vision and Pattern Recognition Workshops (CVPRW), 2017.
[pdf]

Tracking the Trackers: An Analysis of the State of the Art in Multiple Object Tracking.
L. Leal-Taixe, A. Milan, K. Schindler, D. Cremers, I. Reid, and S. Roth.
arXiv:1704.02781, 2017.
[pdf]

Deep Depth from Focus.
C. Hazirbas, L. Leal-Taixe, and D. Cremers.
arXiv:1704.01085, 2017.
[pdf] [challenge]

One-Shot Video Object Segmentation.
S. Caelles, K.-K. Maninis, J. Pont-Tuset, L. Leal-Taixe, D. Cremers, and L. Van Gool.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[pdf] [code]

Image-based localization using LSTMs for structured feature correlation.
F. Walch, C. Hazirbas, L. Leal-Taixe, T. Sattler, S. Hilsenbeck, and D. Cremers.
IEEE International Conference on Computer Vision (ICCV), 2017.
[pdf] [challenge]

Video Object Segmentation Without Temporal Information.
K.-K. Maninis, S. Caelles, Y. Chen, J. Pont-Tuset, L. Leal-Taixe, D. Cremers, and L. Van Gool.
Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2017.
[pdf]

2016

Tracking with multi-level features.
R. Henschel, L. Leal-Taixe, B. Rosenhahn, and K. Schindler.
arXiv:1607.07304, 2016.
[pdf]

Learning by tracking: siamese CNN for robust target association.
L. Leal-Taixe, C. Canton-Ferrer, and K. Schindler.
IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPR). DeepVision: Deep Learning for Computer Vision, 2016.
[pdf]

MOT16: A benchmark for multi-object tracking.
A. Milan, L. Leal-Taixe, I. Reid, S. Roth, and K. Schindler.
arXiv:1603.00831, 2016.
[pdf] [challenge]

2015

Continuous Pose Estimation with a Spatial Ensemble of Fisher Regressors.
M. Fenzi, L. Leal-Taixe, J. Ostermann, and T. Tuytelaars.
IEEE International Conference on Computer Vision (ICCV), 2015.
[pdf]

Joint Tracking and Segmentation of Multiple Targets.
A. Milan, L. Leal-Taixe, K. Schindler, and I. Reid.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[pdf] [code]

MOTChallenge 2015: Towards a Benchmark for Multi-Target Tracking.
L. Leal-Taixe, A. Milan, I. Reid, S. Roth, and K. Schindler.
arXiv:1504.01942, 2015.
[pdf] [challenge]

Automatic tracking of vessel-like structures from a single starting point.
D.A.B. Oliveira, L. Leal-Taixe, R.Q. Feitosa, and B. Rosenhahn.
Computerized Medical Imaging and Graphics, 2015.
[pdf]

Pose Estimation of Object Categories in Videos Using Linear Programming.
M. Fenzi, L. Leal-Taixe, K. Schindler, and B. Rosenhahn.
IEEE Winter Conference on Applications of Computer Vision (WACV), 2015.
[pdf]

2014

Efficient multiple people tracking using minimum cost arborescences.
R. Henschel, L. Leal-Taixe, and B. Rosenhahn.
German Conference on Pattern Recognition (GCPR), 2014.
[pdf]

Learning an image-based motion context for multiple people tracking.
L. Leal-Taixe, M. Fenzi, A. Kuznetsova, B. Rosenhahn, and S. Savarese.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
[pdf] [code]

Multiple object tracking with context awareness.
L. Leal-Taixe.
PhD Thesis, 2014.
[pdf]

2013

Class generative models based on feature regression for pose estimation of object categories.
M. Fenzi, L. Leal-Taixe, B. Rosenhahn, and J. Ostermann.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
[pdf]

Real-time sign language recognition using a consumer depth camera.
A. Kuznetsova, L. Leal-Taixe, and B. Rosenhahn.
IEEE International Conference on Computer Vision (ICCV) Workshops. 3rd Workshop on Consumer Depth Cameras for Computer Vision (CDC4CV), 2013.
[pdf]

Pedestrian interaction in tracking: the social force model and global optimization methods.
L. Leal-Taixe and B. Rosenhahn.
Modeling, Simulation and Visual Analysis of Crowds: A multidisciplinary perspective, Springer Berlin Heidelberg, 2013.
[pdf]

2012

Outdoor and Large-Scale Real-World Scene Analysis.
F. Dellaert, J.-M. Frahm, M. Pollefeys, L. Leal-Taixe, and B. Rosenhahn.
Springer Berlin Heidelberg, 2012.
[pdf]

3D Object Recognition and Pose Estimation for Multiple Objects using Multi-Prioritized RANSAC and Model Updating.
M. Fenzi, R. Dragon, L. Leal-Taixe, B. Rosenhahn, and J. Ostermann.
German Conference on Pattern Recognition (GCPR), 2012.
[pdf]

Branch-and-price global optimization for multi-view multi-target tracking.
L. Leal-Taixe, G. Pons-Moll, and B. Rosenhahn.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
[pdf] [code] [poster] [video]

Exploiting pedestrian interaction via global optimization and social behaviors.
L. Leal-Taixe, G. Pons-Moll, and B. Rosenhahn.
Outdoor and Large-Scale Real-World Scene Analysis, Springer Berlin Heidelberg, 2012.
[pdf]

Three dimensional tracking of exploratory behavior of barnacle cyprids using stereoscopy.
S. Maleschlijski, G. H. Sendra, A. Di Fino, L. Leal-Taixe, I. Thome, A. Terfort, N. Aldred, M. Grunze, A. S. Clare, B. Rosenhahn, and A. Rosenhahn.
Biointerphases, 2012.
[pdf]

Data-driven Manifolds for Outdoor Motion Capture.
G. Pons-Moll, L. Leal-Taixe, J. Gall, and B. Rosenhahn.
Outdoor and Large-Scale Real-World Scene Analysis, Springer Berlin Heidelberg, 2012.
[pdf]

2011

Everybody needs somebody: Modeling social and grouping behavior on a linear programming multiple people tracker.
L. Leal-Taixe, G. Pons-Moll, and B. Rosenhahn.
IEEE International Conference on Computer Vision (ICCV) Workshops. 1st Workshop on Modeling, Simulation and Visual Analysis of Large Crowds, 2011.
[pdf] [code] [video]

A stereoscopic approach for three dimensional tracking of marine biofouling microorganisms.
S. Maleschlijski, L. Leal-Taixe, S. Weisse, A. Di Fino, N. Aldred, A. S. Clare, G. H. Sendra, B. Rosenhahn, and A. Rosenhahn.
Microscopic Image Analysis with Applications in Biology (MIAAB), 2011.
[pdf]

Efficient and robust shape matching for model based human motion capture.
G. Pons-Moll, L. Leal-Taixe, T. Truong, and B. Rosenhahn.
German Conference on Pattern Recognition (GCPR), 2011.
[pdf]

Outdoor human motion capture using inverse kinematics and von Mises-Fisher sampling.
G. Pons-Moll, A. Baak, J. Gall, L. Leal-Taixe, M. Mueller, H.-P. Seidel, and B. Rosenhahn.
IEEE International Conference on Computer Vision (ICCV), 2011.
[pdf] [supplementary]

Understanding what we cannot see: automatic analysis of 4D digital in-line holographic microscopy data.
L. Leal-Taixe, M. Heydt, A. Rosenhahn, and B. Rosenhahn.
Video Processing and Computational Video, Springer Berlin Heidelberg, 2011.
[pdf]

2010

Classification of swimming microorganisms motion patterns in 4D digital in-line holography data.
L. Leal-Taixe, M. Heydt, S. Weisse, A. Rosenhahn, and B. Rosenhahn.
German Conference on Pattern Recognition (GCPR), 2010.
[pdf] [video]

2009

Automatic tracking of swimming microorganisms in 4D digital in-line holography data.
L. Leal-Taixe, M. Heydt, A. Rosenhahn, and B. Rosenhahn.
IEEE Workshops on Motion and Video Computing (WMVC), 2009.
[pdf]

Automatic segmentation of multi-stain histology images of arteries.
L. Leal-Taixe.
Master Thesis, 2009.
[pdf]

Automatic segmentation of arteries in multi-stain histology images.
L. Leal-Taixe, A. U. Coskun, B. Rosenhahn, and D. Brooks.
World Congress on Medical Physics and Biomedical Engineering, 2009.
[pdf]