This repository adapts SAM2 for real-time multi-object tracking. Users specify a fixed number of objects to track, and motion modeling from SAMURAI is integrated for improved tracking in complex scenarios.
SAM2 (Segment Anything Model 2) is designed for object segmentation and tracking but lacks built-in capabilities for performing this in real time.
SAMURAI enhances SAM2 by introducing motion modeling, leveraging temporal motion cues for better tracking accuracy without retraining or fine-tuning.
While this repository integrates SAMURAI's motion modeling, it does not support SAMURAI's motion-aware memory selection mechanism, because conditional operations applied to the memory bank would affect all tracked objects simultaneously.
The core implementation of SAMURAI's motion modeling, originally found in sam2_base.py in SAMURAI's repository, has been relocated to sam2_object_tracker.py in this repository so that the original SAM2 codebase remains unmodified.
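SAMURAI's motion model keeps a per-object Kalman filter over the bounding box and uses the filter's predicted box to re-score SAM2's mask candidates. A minimal sketch of that selection rule is shown below; the function names, the `alpha` weight, and the candidate format are illustrative, not this repository's API:

```python
# Illustrative sketch of SAMURAI-style motion-aware mask selection.
# The actual implementation lives in sam2_object_tracker.py and may differ.
import numpy as np

def box_iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def select_mask(candidates, predicted_box, alpha=0.25):
    """Pick the mask candidate with the best weighted motion + affinity score.

    candidates:     list of (mask_box, affinity_score) pairs from SAM2's decoder.
    predicted_box:  box forecast by the per-object Kalman filter.
    alpha:          weight on the motion (IoU) term; the value is illustrative.
    """
    scores = [alpha * box_iou(box, predicted_box) + (1 - alpha) * aff
              for box, aff in candidates]
    return int(np.argmax(scores))
```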
- Real-Time Tracking: Modified SAM2 to track a fixed number of objects in real time.
- Motion Modeling: Integrates SAMURAI's motion modeling, which leverages temporal motion cues for robust object tracking without retraining or fine-tuning.
- YOLO Integration: Utilizes YOLO for object detection and mask generation as input to SAM2.
The core implementation resides in sam2_object_tracker.py, where the number of objects to track must be specified during instantiation.
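For orientation, the overall flow looks roughly like the sketch below. The class name, constructor arguments, and the per-frame `track()` call are assumptions made for illustration; consult sam2_object_tracker.py and the demo notebook for the actual interface:

```python
# Illustrative wiring of YOLO detections into the tracker; the tracker class,
# import path, and method names below are assumptions, not this repository's exact API.
import cv2
from ultralytics import YOLO

from sam2_object_tracker import SAM2ObjectTracker  # hypothetical import / class name

detector = YOLO("yolov8n.pt")               # any Ultralytics detection model
tracker = SAM2ObjectTracker(num_objects=5)  # fixed object count set at instantiation

cap = cv2.VideoCapture("video.mp4")         # placeholder video source
while True:
    ok, frame = cap.read()
    if not ok:
        break
    boxes = detector(frame)[0].boxes.xyxy.cpu().numpy()  # (N, 4) xyxy detections
    tracker.track(frame, new_detections=boxes)           # hypothetical per-frame call
cap.release()
```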
git clone https://github.com/zdata-inc/sam2_realtime
cd sam2_realtime
conda env create -f environment.yml
pip install -e .
pip install -e ".[notebooks]"
cd checkpoints
./download_ckpts.sh
cd ..
Run the demo notebook to visualize YOLO object detection and SAM2 object tracking in action:
notebooks/realtime_detect_and_track.ipynb
To perform detection and tracking on a video source, use the following script:
python detect_and_track.py \
--source "/data/datasets/SAM2/sav_train/sav_021/sav_021835.mp4" \
--cfg_filepath "./detect_and_track_config.yaml"
detect_and_track_config.yaml provides additional parameters that can be set (an illustrative example follows the list), including options to:
- Enable or disable visualization
- Select the GPU device
- Enable BoxMOT tracking in place of SAM tracking
- Configure YOLO's detection confidence threshold and specify labels to detect
- Set parameters for SAM including the number of objects to track and the IoU threshold for determining whether a YOLO detection is already being tracked
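A sketch of what such a config might contain is shown below. Every key name and default value is an assumption used for illustration, not the file's actual schema; refer to the detect_and_track_config.yaml shipped with the repository for the real keys:

```yaml
# Illustrative only: key names and defaults are assumptions, not the shipped schema.
visualize: true          # enable or disable visualization
device: cuda:0           # GPU device to use
use_boxmot: false        # use BoxMOT tracking instead of SAM tracking
yolo:
  confidence: 0.5        # detection confidence threshold
  labels: [person, car]  # class labels to detect
sam:
  num_objects: 5         # fixed number of objects to track
  iou_threshold: 0.7     # IoU above which a YOLO detection counts as already tracked
```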
SAM2 Realtime extends SAM 2 by Meta FAIR to enable real-time tracking. It integrates motion-aware segmentation techniques developed in SAMURAI by the Information Processing Lab at the University of Washington.
@article{ravi2024sam2,
title={SAM 2: Segment Anything in Images and Videos},
author={Ravi, Nikhila and Gabeur, Valentin and Hu, Yuan-Ting and Hu, Ronghang and Ryali, Chaitanya and Ma,
Tengyu and Khedr, Haitham and R{\"a}dle, Roman and Rolland, Chloe and Gustafson, Laura and Mintun, Eric and Pan,
Junting and Alwala, Kalyan Vasudev and Carion, Nicolas and Wu, Chao-Yuan and Girshick, Ross and Doll{\'a}r,
Piotr and Feichtenhofer, Christoph},
journal={arXiv preprint arXiv:2408.00714},
url={https://arxiv.org/abs/2408.00714},
year={2024}
}
@misc{yang2024samurai,
title={SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory},
author={Cheng-Yen Yang and Hsiang-Wei Huang and Wenhao Chai and Zhongyu Jiang and Jenq-Neng Hwang},
year={2024},
eprint={2411.11922},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2411.11922},
}