
Commit

Dependency and citation updated
kylemin committed Apr 18, 2024
1 parent 087ed13 commit ef6d766
Showing 2 changed files with 23 additions and 12 deletions.
30 changes: 20 additions & 10 deletions README.md
@@ -1,7 +1,7 @@
# GraVi-T
-This repository contains an open-source codebase for Graph-based long-term Video undersTanding (GraVi-T). It is designed to serve as a spatial-temporal graph learning framework for multiple video understanding tasks. In the current version, it supports training and evaluating one of the state-of-the-art models, [SPELL](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136950367.pdf), for the tasks of active speaker detection, action localization, and action segmentation.
+This repository contains an open-source codebase for Graph-based long-term Video undersTanding (GraVi-T). It is designed to serve as a spatial-temporal graph learning framework for multiple video understanding tasks. The current version supports training and evaluating two state-of-the-art models: [SPELL](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136950367.pdf) for the tasks of active speaker detection, action localization, and action segmentation, and [VideoSAGE](https://arxiv.org/abs/2404.10539) for video summarization.

-In the near future, we will release more advanced graph-based approaches (e.g. [STHG](https://arxiv.org/abs/2306.10608)) for other tasks, including video summarization and audio-visual diarization.
+In the near future, we will release more advanced graph-based approaches (e.g. [STHG](https://arxiv.org/abs/2306.10608)) for other tasks, including audio-visual diarization.

![](docs/images/gravit_teaser.jpg?raw=true)
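To illustrate the graph-building idea behind such frameworks (this is not GraVi-T's actual API), video frames can be treated as nodes and connected whenever their timestamps fall within a temporal window; the function name, timestamps, and window size below are all hypothetical:

```python
# Illustrative sketch only: nodes are frames, and directed edges connect
# frame pairs whose timestamps are within a fixed temporal window,
# similar in spirit to the long-term graphs used by SPELL.

def build_temporal_edges(timestamps, window=2.0):
    """Return directed edges (i, j) for frame pairs within `window` seconds."""
    edges = []
    for i, ti in enumerate(timestamps):
        for j, tj in enumerate(timestamps):
            if i != j and abs(ti - tj) <= window:
                edges.append((i, j))
    return edges

frames = [0.0, 1.0, 2.5, 6.0]   # hypothetical frame timestamps in seconds
edges = build_temporal_edges(frames)
print(edges)  # → [(0, 1), (1, 0), (1, 2), (2, 1)]
```

Note that distant frames (e.g. the one at 6.0 s) stay disconnected, which keeps the graph sparse even for long videos.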

@@ -40,10 +40,10 @@ We want to note that our method has recently won many challenges, including the
> Numbers in parentheses indicate the scores without using the suggested graph learning method.
### Video Summarization (Datasets: SumMe & TVSum)
-| Model | Feature | [Kendall's Tau](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kendalltau.html#scipy.stats.kendalltau)* | [Spearman's Rho](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.spearmanr.html#scipy.stats.spearmanr)* |
-|:---------------|:---------------------------------:|:-------------------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------------------------:|
-| SPELL (Ours) | eccv16_dataset_summe_google_pool5 | **0.12** (up from 0.09) | **0.16** (up from 0.12) |
-| SPELL (Ours) | eccv16_dataset_tvsum_google_pool5 | **0.30** (up from 0.27) | **0.42** (up from 0.39) |
+| Model | Feature | [Kendall's Tau](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kendalltau.html#scipy.stats.kendalltau)* | [Spearman's Rho](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.spearmanr.html#scipy.stats.spearmanr)* |
+|:-------------------|:---------------------------------:|:-------------------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------------------------:|
+| VideoSAGE (Ours) | eccv16_dataset_summe_google_pool5 | **0.12** (up from 0.09) | **0.16** (up from 0.12) |
+| VideoSAGE (Ours) | eccv16_dataset_tvsum_google_pool5 | **0.30** (up from 0.27) | **0.42** (up from 0.39) |
> Numbers in parentheses indicate the scores without using the suggested graph learning method.\
> *Correlation metric between predicted frame importance and ground truth.
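Both metrics can be computed with the SciPy functions linked in the table header; the per-frame importance scores below are made-up illustrative values, not model outputs:

```python
from scipy.stats import kendalltau, spearmanr

# Hypothetical per-frame importance scores (illustrative values only):
predicted = [0.1, 0.4, 0.3, 0.8, 0.7]
ground_truth = [0.2, 0.3, 0.4, 0.9, 0.6]

# Both are rank-correlation coefficients: they compare the ordering of
# frames by importance rather than the raw score values.
tau, _ = kendalltau(predicted, ground_truth)
rho, _ = spearmanr(predicted, ground_truth)
print(f"Kendall's Tau: {tau:.2f}, Spearman's Rho: {rho:.2f}")
# → Kendall's Tau: 0.80, Spearman's Rho: 0.90
```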
@@ -132,10 +132,10 @@ Please refer to the instructions in [GETTING_STARTED_AS.md](docs/GETTING_STARTED
Please refer to the instructions in [GETTING_STARTED_VS.md](docs/GETTING_STARTED_VS.md).

## Contributor
-GraVi-T is written and maintained by [Kyle Min](https://github.com/kylemin) (from version 1.0.0 to 1.1.0). Please contact me if you want to become a contributor to this library.
+GraVi-T is written and maintained by [Kyle Min](https://github.com/kylemin) (from version 1.0.0 to 1.1.0) and [Jose Rojas Chaves](https://github.com/joserochh) (version 1.2.0). Please refer to the release notes for details about each version's supported features and applications.

## Citation
-ECCV 2022 paper:
+ECCV 2022 paper about [SPELL](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136950367.pdf):
```bibtex
@inproceedings{min2022learning,
title={Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection},
@@ -147,7 +147,7 @@ ECCV 2022 paper:
}
```

-Technical report for Ego4D challenge 2022:
+Ego4D workshop paper [@ECCV22](https://ego4d-data.org/workshops/eccv22/):
```bibtex
@article{min2022intel,
title={Intel Labs at Ego4D Challenge 2022: A Better Baseline for Audio-Visual Diarization},
@@ -157,7 +157,7 @@ Technical report for Ego4D challenge 2022:
}
```

-Technical report for Ego4D challenge 2023:
+Ego4D workshop paper [@CVPR23](https://ego4d-data.org/workshops/cvpr23/) about [STHG](https://arxiv.org/abs/2306.10608):
```bibtex
@article{min2023sthg,
title={STHG: Spatial-Temporal Heterogeneous Graph Learning for Advanced Audio-Visual Diarization},
@@ -167,6 +167,16 @@ Technical report for Ego4D challenge 2023:
}
```

+SG2RL workshop paper [@CVPR24](https://sites.google.com/view/sg2rl) about [VideoSAGE](https://arxiv.org/abs/2404.10539):
+```bibtex
+@article{chaves2024videosage,
+  title={VideoSAGE: Video Summarization with Graph Representation Learning},
+  author={Jose M. Rojas Chaves and Subarna Tripathi},
+  journal={arXiv preprint arXiv:2404.10539},
+  year={2024}
+}
+```

## Disclaimer

> This “research quality code” is for Non-Commercial purposes and provided by Intel “As Is” without any express or implied warranty of any kind. Please see the dataset's applicable license for terms and conditions. Intel does not own the rights to this data set and does not confer any rights to it. Intel does not warrant or assume responsibility for the accuracy or completeness of any information, text, graphics, links or other items within the code. A thorough security review has not been performed on this code. Additionally, this repository may contain components that are out of date or contain known security vulnerabilities.
5 changes: 3 additions & 2 deletions requirements.txt
@@ -1,10 +1,11 @@
h5py
pyyaml
pandas
scikit-learn
-f https://download.pytorch.org/whl/cu116/torch_stable.html
-f https://data.pyg.org/whl/torch-1.13.1+cu116.html
torch==1.13.1+cu116
torchvision==0.14.1+cu116
-torch-scatter
-torch-sparse
+torch-scatter==2.1.1
+torch-sparse==0.6.17
torch-geometric
