This page provides basic tutorials about the usage of OVTrack. For installation instructions, please see INSTALL.md.
a. Please follow the TAO download instructions.
b. Please also prepare the LVIS dataset.
c. Please download the COCO annotations and put them in data/lvis/annotations/. You can download the annotations from here.
d. To compare properly with previous methods on TAO, we need to merge the LVIS and COCO annotations following the original TAO instructions. You can directly download the merged LVIS and COCO annotations for the base classes from here. If you want to generate them yourself, you can also use the following command to merge the LVIS and COCO annotations (we use the script from GTR):
python tools/convert_datasets/merge_lvisv1_coco.py
Then remove the novel classes and their annotations:
python tools/convert_datasets/create_base_lvis.py
It is recommended to symlink the dataset root to $OVTrack/data and the model root to $OVTrack/saved_models.
If your folder structure is different, you may need to change the corresponding paths in config files.
Our folder structure follows:
├── ovtrack
├── tools
├── configs
├── data
│   ├── tao
│   │   ├── frames
│   │   │   ├── train
│   │   │   ├── val
│   │   │   ├── test
│   │   ├── annotations
│   │   │   ├── ovtrack  # folder for the generated json annotations
│   │   ├── ovtrack  # folder for the generated images and pkl annotations
│   ├── lvis
│   │   ├── train2017
│   │   ├── annotations
├── saved_models  # folder for the downloaded pretrained models and the models you train
It will be easier if you create the same folder structure.
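For example, if you downloaded the raw datasets to other locations, you can create the layout above with symlinks roughly as follows (the /path/to/... locations are placeholders for wherever TAO and LVIS live on your disk):
# create the expected folders (adjust the placeholder paths to your setup)
mkdir -p data/tao/annotations/ovtrack data/tao/ovtrack data/lvis/annotations saved_models
ln -s /path/to/TAO/frames data/tao/frames
ln -s /path/to/LVIS/train2017 data/lvis/train2017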
For more details about the installation and usage of the TETA metric, please refer to TETA.
a. Convert TAO annotation files.
python tools/convert_datasets/tao2coco.py -t ./data/tao/annotations/
b. Generate the TAO val v1 file:
python tools/convert_datasets/create_tao_v1.py data/tao/annotations/validation_ours.json
If you want to test on the test set split, please download the test set annotation from here. We generate the TAO test v1 file from the BURST dataset. Thanks to the authors for making the annotations available.
c. During training and inference, we use an additional file which saves all class names: lvis_classes_v1.txt.
Please download it and put it in ${LVIS_PATH}/annotations/.
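As a quick sanity check after downloading (assuming the file stores one class name per line), you can inspect it with:
head ${LVIS_PATH}/annotations/lvis_classes_v1.txt
wc -l ${LVIS_PATH}/annotations/lvis_classes_v1.txt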
This codebase is inherited from mmdetection. You can refer to the official instructions, or to the short instructions below. We provide config files in configs.
We follow the ViLD paper to distill the CLIP model. Specifically, we use the implementation from DetPro and also use its pre-learned prompt. Please refer to ViLD and DetPro for more details. You can download the pretrained distillation model from here.
Train with generated images: we upload the pre-generated images and the corresponding annotations from static LVIS images. Please download the files, put the json file (ovtrack_lvis_generated_image_pairs_repeat_1.json) in the data/tao/annotations/ovtrack folder, and put the images (ovtrack_lvis_generated_image_pairs_repeat_1.h5) in the data/tao/ovtrack folder.
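Assuming the two files were downloaded to the current directory, moving them into place looks like:
mv ovtrack_lvis_generated_image_pairs_repeat_1.json data/tao/annotations/ovtrack/
mv ovtrack_lvis_generated_image_pairs_repeat_1.h5 data/tao/ovtrack/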
You can visualize the generated images following the Jupyter notebook here.
Customize image generation: please refer to the instructions here.
Please use the following command to train the tracker from the generated image pairs.
tools/dist_train.sh configs/ovtrack-teta/ovtrack_r50.py 8 23333 --work-dir saved_models/ovtrack_train/ --cfg-options data.train.dataset.dataset.ref_img_sampler.pesudo=True
Note that, in this repo, the evaluation metrics are computed in COCO format, but to report results on BDD100K, evaluation in the BDD100K format is required.
You can launch training in any of the following setups (command sketches are given below):
- a single GPU
- a single node with multiple GPUs
- multiple nodes
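A rough sketch of the corresponding launch commands follows. This codebase follows mmdetection conventions, so tools/train.py as the single-GPU entry point is an assumption, and the GPU count and port number are arbitrary examples.
# single GPU (standard mmdetection-style entry point; assumes tools/train.py exists)
python tools/train.py configs/ovtrack-teta/ovtrack_r50.py --work-dir saved_models/ovtrack_train/
# single node, multiple GPUs (same launcher and argument order, CONFIG GPUS PORT, as the training command shown above)
tools/dist_train.sh configs/ovtrack-teta/ovtrack_r50.py 8 23333 --work-dir saved_models/ovtrack_train/
# multi-node training is typically handled by a cluster launcher (e.g., Slurm); see the mmdetection docs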
Download the following trained models for testing (the expected layout is sketched below):
- OVTrack model: put it at saved_models/ovtrack_detpro_prompt.pth
- DetPro prompt: put it at saved_models/pretrained_models/detpro_prompt.pt
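For example, after downloading, the expected layout is:
mkdir -p saved_models/pretrained_models
# expected paths after downloading:
#   saved_models/ovtrack_detpro_prompt.pth
#   saved_models/pretrained_models/detpro_prompt.pt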
You can use the following commands to test a dataset.
# single-gpu testing
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] [--show] [--cfg-options]
# multi-gpu testing
./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] [--cfg-options]
Optional arguments:
- RESULT_FILE: filename of the output results in pickle format. If not specified, the results will not be saved to a file.
- EVAL_METRICS: items to be evaluated on the results. Allowed values depend on the dataset, e.g., track.
- --cfg-options: if specified, some settings in the used config will be overridden.
An example combining these arguments is shown below.
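For instance, a single-GPU run that saves the raw results and evaluates tracking could look like this (the output path is an arbitrary example):
python tools/test.py configs/ovtrack-teta/ovtrack_r50.py saved_models/ovtrack_detpro_prompt.pth --out results/ovtrack_results.pkl --eval track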
tools/dist_test.sh configs/ovtrack-teta/ovtrack_r50.py saved_models/ovtrack_detpro_prompt.pth 8 25000 --eval track --eval-options resfile_path=results/ovtrack_teta_results/
tools/dist_test.sh configs/ovtrack-tao/ovtrack_r50.py saved_models/ovtrack_detpro_prompt.pth 8 25000 --eval track --eval-options resfile_path=results/ovtrack_tao_results/ use_tao_metric=True
You can download our model trained with prompts generated by the vanilla CLIP text encoder from here.
To test with custom classes, you need to modify the text_input in the class_name.py file.
Please note that the visual tracking results depend heavily on choosing a proper visualization threshold. The proper threshold value varies with the number of test classes due to the softmax function.
Thus, you need to tune these values to get good visual tracking results. The recommended values to tune are: init_score_thr and obj_score_thr in the tracker, score_thr in the rcnn_test_cfg, and the visualization threshold in the visualizer.
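If you prefer not to edit the config file, these thresholds can usually be overridden on the command line via --cfg-options. The key paths below are assumptions based on typical mmdetection config layouts, so verify them against the config you use:
# hypothetical key paths -- check configs/ovtrack-teta/ovtrack_r50.py for the exact names
python tools/test.py configs/ovtrack-teta/ovtrack_r50.py saved_models/ovtrack_detpro_prompt.pth --eval track --show --cfg-options model.tracker.init_score_thr=0.001 model.tracker.obj_score_thr=0.001 model.test_cfg.rcnn.score_thr=0.001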
You can run the following command to test on any video.
python tools/inference.py configs/ovtrack-custom/ovtrack_r50.py saved_models/ovtrack/ovtrack_vanilla_clip_prompt.pth --video YOUR_VIDEO.mp4 --out_frame_dir YOUR_OUTPUT_FRAMES_DIR --out_video_dir YOUR_OUTPUT_VIDEOS_DIR --thickness 1