Our work focuses on improving the data association step in multiple object tracking in three ways: by exploiting contextual information (social or spatial), by improving the detection matching score with a siamese network architecture, or by merging multiple inputs (head detections or superpixels) into a single optimization problem.
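To make the data association step concrete, here is a minimal sketch that matches existing tracks to new detections by maximizing total bounding-box overlap. It is a generic baseline, not the method described above: the function names `iou` and `associate`, the `(x1, y1, x2, y2)` box format, and the `min_iou` threshold are all assumptions made for illustration. A learned matching score (for example, from a siamese network) would replace the `iou` call.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, min_iou=0.3):
    """Return (track_index, detection_index) pairs maximizing the total score.

    Brute-force search over one-to-one assignments, so only suitable for a
    handful of boxes; a real tracker would use a dedicated assignment solver.
    """
    scores = [[iou(t, d) for d in detections] for t in tracks]
    n_t, n_d = len(tracks), len(detections)
    if n_t <= n_d:
        # Every track gets a distinct detection.
        candidates = ([(i, p[i]) for i in range(n_t)]
                      for p in permutations(range(n_d), n_t))
    else:
        # Every detection gets a distinct track.
        candidates = ([(p[j], j) for j in range(n_d)]
                      for p in permutations(range(n_t), n_d))
    best_pairs, best_total = [], -1.0
    for pairs in candidates:
        total = sum(scores[i][j] for i, j in pairs)
        if total > best_total:
            best_pairs, best_total = pairs, total
    # Drop matches whose overlap is too weak to be trusted.
    return [(i, j) for i, j in best_pairs if scores[i][j] >= min_iou]
```

Unmatched tracks and detections are simply left out of the result, which is where a tracker would decide whether to end a track or start a new one.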
One-Shot Video Object Segmentation (OSVOS) is a fully convolutional neural network architecture that successively transfers generic semantic information, learned on ImageNet, to the task of foreground segmentation. With just a glimpse of the first frame, we segment an object in a full video! State of the art on DAVIS!
A new standardized way to compare multiple people tracking algorithms. We provide detections, ground truth for the training sequences, and a set of challenging test sequences. Three challenges have been launched since 2014! A new pedestrian detection challenge is also available.
We propose to replace the proximal operator of the regularization used in many convex energy minimization algorithms by a denoising neural network. The latter therefore serves as an implicit natural image prior, while the data term can still be chosen independently. We obtain state-of-the-art reconstruction results, indicating the high generalizability of our approach and a reduction of the need for problem-specific training.
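The idea of swapping a proximal operator for a denoiser can be sketched on a toy 1D inpainting problem. The code below is an illustration under stated assumptions, not the implementation behind the results above: it uses a plain forward-backward (proximal gradient) iteration, and a 3-tap moving average stands in for the denoising neural network. The names `moving_average_denoiser` and `plug_and_play_proximal_gradient` are hypothetical.

```python
def moving_average_denoiser(x):
    """Stand-in for a denoising network: 3-tap average with edge replication."""
    n = len(x)
    return [(x[max(i - 1, 0)] + x[i] + x[min(i + 1, n - 1)]) / 3.0
            for i in range(n)]


def plug_and_play_proximal_gradient(f, mask, denoiser, step=1.0, iterations=50):
    """Approximately restore x from observations f known only where mask == 1.

    The data term is 0.5 * sum(mask * (x - f)**2). Each iteration takes a
    gradient step on the data term, then applies the denoiser in the place
    where the proximal operator of the regularizer would normally go.
    """
    x = list(f)
    for _ in range(iterations):
        # Gradient of the data term; zero wherever the sample is unobserved.
        grad = [m * (xi - fi) for m, xi, fi in zip(mask, x, f)]
        z = [xi - step * gi for xi, gi in zip(x, grad)]
        # The denoiser acts as the implicit prior.
        x = denoiser(z)
    return x


# Toy usage: a constant signal with one missing sample in the middle.
observed = [1.0, 1.0, 0.0, 1.0, 1.0]
known = [1, 1, 0, 1, 1]
restored = plug_and_play_proximal_gradient(observed, known, moving_average_denoiser)
```

Because the data term enters only through its gradient, changing the task (deblurring or demosaicking instead of inpainting) means changing `grad` while the denoiser stays as it is, which is the independence the paragraph above describes.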