TL; DR. We propose a novel perspective to regard the multiple object tracking task as an in-context ID prediction problem. Given a set of trajectories carried with ID information, MOTIP directly ...
We introduce a Hybrid Ensemble Decoder (HED) to improve query diversity and model robustness. We propose a simple fine-tuning recipe for cross-domain few-shot object detection, with plateau-aware ...