Track to Detect and Segment: An Online Multi-Object Tracker
Jialian Wu, Jiale Cao, Liangchen Song, Yu Wang, Ming Yang, Junsong Yuan
In CVPR, 2021. [Paper] [Project Page] [Demo (YouTube)]
Many thanks to CenterTrack authors for their great framework!
Please refer to INSTALL.md for installation instructions.
We reuse the demo script from CenterTrack. Before running the demo, first download our trained models:
CrowdHuman model (2D tracking),
MOT model (2D tracking) or nuScenes model (3D tracking).
Then, put the models in TraDeS_ROOT/models/ and cd TraDeS_ROOT/src/. The demo result will be saved as a video in TraDeS_ROOT/results/.
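Before running any of the commands below, it can help to confirm that the checkpoints are where the demo commands expect them. The following is a minimal sketch, not part of TraDeS: the file names are taken from the `--load_model` arguments used in this README, while the script name `check_models.py` and the function `available_models` are hypothetical. You only need the checkpoint(s) for the demos you plan to run.

```python
from pathlib import Path

# Checkpoint file names as they appear in the --load_model arguments of this README.
DEMO_MODELS = {
    "mot": "mot_half.pth",
    "crowdhuman": "crowdhuman.pth",
    "nuscenes": "nuscenes.pth",
}


def available_models(trades_root):
    """Map each demo model to whether <trades_root>/models/<file> exists (hypothetical helper)."""
    models_dir = Path(trades_root) / "models"
    return {key: (models_dir / fname).is_file() for key, fname in DEMO_MODELS.items()}


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python check_models.py /path/to/TraDeS_ROOT
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for key, found in available_models(root).items():
        print(f"{key:>10}: {'found' if found else 'missing'}")
```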
Demo for a video clip from the MOT dataset: Run the demo (using the MOT model):
python demo.py tracking --dataset mot --load_model ../models/mot_half.pth --demo ../videos/mot_mini.mp4 --pre_hm --ltrb_amodal --pre_thresh 0.5 --track_thresh 0.4 --inference --clip_len 3 --trades --save_video --resize_video --input_h 544 --input_w 960
Demo for a video clip which we randomly selected from YouTube: Run the demo (using the CrowdHuman model):
python demo.py tracking --load_model ../models/crowdhuman.pth --num_class 1 --demo ../videos/street_2d.mp4 --pre_hm --ltrb_amodal --pre_thresh 0.5 --track_thresh 0.5 --inference --clip_len 2 --trades --save_video --resize_video --input_h 480 --input_w 864
Demo for your own video or image folder: Please specify the file path after --demo and run (using the CrowdHuman model):
python demo.py tracking --load_model ../models/crowdhuman.pth --num_class 1 --demo $path to your video or image folder$ --pre_hm --ltrb_amodal --pre_thresh 0.5 --track_thresh 0.5 --inference --clip_len 2 --trades --save_video --resize_video --input_h $your_input_h$ --input_w $your_input_w$
(Some notes: (i) For 2D tracking, the models can only track people, since they are trained only on CrowdHuman or MOT. You may train a model on COCO or your own dataset for multi-category 2D object tracking.
(ii) --clip_len is set to 3 for MOT; otherwise, it should be 2. Please refer to our paper for details. (iii) The CrowdHuman model generalizes better to real-world scenes than the MOT model. Note that both datasets are released under non-commercial licenses.
(iv) input_h and input_w must be divisible by 32.)
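For your own video, you have to choose `--input_h` and `--input_w` yourself. The sketch below is one possible way to do it, under the assumption that you want to roughly preserve the frame's aspect ratio: it fixes a target width, derives the height, and rounds both to multiples of 32. It also encodes the `--clip_len` rule from the notes above and assembles the CrowdHuman demo command using only the flags shown in this README. All function names are hypothetical helpers, not part of the TraDeS code base.

```python
def round_to_multiple(x, base=32):
    """Round x to the nearest positive multiple of `base`."""
    return max(base, int(round(x / base)) * base)


def demo_input_size(frame_w, frame_h, target_w=960):
    """Return (input_h, input_w), both divisible by 32, roughly keeping the aspect ratio."""
    input_w = round_to_multiple(target_w)
    input_h = round_to_multiple(target_w * frame_h / frame_w)
    return input_h, input_w


def clip_len_for(dataset):
    """--clip_len is 3 for MOT and 2 otherwise."""
    return 3 if dataset == "mot" else 2


def crowdhuman_demo_args(path, input_h, input_w):
    """Build the CrowdHuman demo command from the README as an argument list."""
    return [
        "python", "demo.py", "tracking",
        "--load_model", "../models/crowdhuman.pth", "--num_class", "1",
        "--demo", path,
        "--pre_hm", "--ltrb_amodal",
        "--pre_thresh", "0.5", "--track_thresh", "0.5",
        "--inference", "--clip_len", str(clip_len_for("crowdhuman")),
        "--trades", "--save_video", "--resize_video",
        "--input_h", str(input_h), "--input_w", str(input_w),
    ]
```

For a 1920x1080 video, `demo_input_size(1920, 1080)` gives `(544, 960)` and `demo_input_size(1920, 1080, 864)` gives `(480, 864)`, the sizes used in the two demo commands above. The resulting list can be passed to `subprocess.run(..., cwd="TraDeS_ROOT/src", check=True)`.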
Demo for a video clip from the nuScenes dataset: Run the demo (using the nuScenes model):
python demo.py tracking,ddd --dataset nuscenes --load_model ../models/nuscenes.pth --demo ../videos/nuscenes_mini.mp4 --pre_hm --track_thresh 0.1 --inference --clip_len 2 --trades --save_video --resize_video --input_h 448 --input_w 800 --test_focal_length 633
(You need to specify test_focal_length for the monocular 3D tracking demo so that image coordinates can be converted back to 3D. The value 633 is half of a typical nuScenes focal length (~1266) at the original input resolution of 1600x900. The mini demo video has an input resolution of 800x448, so we use half the focal length. You don't need to set test_focal_length when testing on the original nuScenes data.)
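The focal length in pixels scales linearly with the image size, which is where 633 comes from. The sketch below shows that arithmetic; `scale_focal_length` is a hypothetical helper, and it assumes you scale by the width ratio (the demo video's height of 448 is slightly less than half of 900, so a height-based scale would give about 630 instead).

```python
def scale_focal_length(focal, orig_w, new_w):
    """Scale a pinhole focal length (in pixels) when an image is resized from orig_w to new_w pixels wide."""
    return focal * new_w / orig_w


# nuScenes frames are 1600 px wide; the mini demo video is 800 px wide.
print(scale_focal_length(1266, 1600, 800))  # 633.0
```

For your own 3D demo video, you would pass the scaled value to --test_focal_length, assuming you know the camera's focal length at its native resolution.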
You can also refer to CenterTrack for the usage of the webcam demo (the code is available in this repo, but we have not tested it yet).
Please refer to Data.md for dataset preparation.
MOT17 Val | MOTA↑ | IDF1↑ | IDS↓ |
---|---|---|---|
Our Baseline | 64.8 | 59.5 | 1055 |
CenterTrack | 66.1 | 64.2 | 528 |
TraDeS (ours) | 68.2 | 71.7 | 285 |
Test on the MOT17 validation set: Place the MOT model in $TraDeS_ROOT/models/ and run:
sh experiments/mot17_test.sh
Train on the MOT17 halftrain set: Place the pretrained model in $TraDeS_ROOT/models/ and run:
sh experiments/mot17_train.sh
nuScenes Val | AMOTA↑ | AMOTP↓ | IDSA↓ |
---|---|---|---|
Our Baseline | 4.3 | 1.65 | 1792 |
CenterTrack | 6.8 | 1.54 | 813 |
TraDeS (ours) | 11.8 | 1.48 | 699 |
Test on the nuScenes validation set: Place the nuScenes model in $TraDeS_ROOT/models/. The MOT and nuScenes dataset APIs require conflicting versions, so you need to switch between them; the versions installed by default are for the MOT dataset. For experiments on the nuScenes dataset, please run:
sh nuscenes_switch_version.sh
sh experiments/nuScenes_test.sh
To switch back to the API versions for MOT experiments, you can run:
sh mot_switch_version.sh
Train on the nuScenes train set: Place the pretrained model in $TraDeS_ROOT/models/ and run:
sh experiments/nuScenes_train.sh
We follow CenterTrack, which uses CrowdHuman to pretrain the 2D object tracking model. Only the training set is used.
sh experiments/crowdhuman.sh
The trained model is available at CrowdHuman model.
Code will be released later on after we clean it up. Our implementation is based on here.
If you find it useful in your research, please consider citing our paper as follows:
@inproceedings{Wu2021TraDeS,
title={Track to Detect and Segment: An Online Multi-Object Tracker},
author={Wu, Jialian and Cao, Jiale and Song, Liangchen and Wang, Yu and Yang, Ming and Yuan, Junsong},
booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2021}}