This repository contains the dataset and code for our paper.
Human action co-occurrence in lifestyle vlogs: two actions co-occur if they appear in the same 10-second interval of a video. Actions are represented as nodes in a graph, the co-occurrence relation between two actions is represented as a link between their nodes, and the action co-occurrence identification task is framed as a link prediction task.
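To make the definition concrete, here is a minimal sketch of extracting co-occurring action pairs from timestamped annotations. The `actions` structure, its field layout, and the interpretation of the 10-second interval as a timestamp difference are illustrative assumptions, not the repository's exact implementation:

```python
from itertools import combinations

# Hypothetical input: per-video list of (action_label, start_time_in_seconds).
actions = {
    "video_1": [("chop onion", 3.0), ("stir pan", 8.5), ("wash hands", 41.0)],
}

WINDOW = 10.0  # assumed reading: two actions co-occur if their timestamps are within 10 s

def co_occurring_pairs(video_actions, window=WINDOW):
    """Yield unordered pairs of actions whose timestamps fall within `window` seconds."""
    for (a, t_a), (b, t_b) in combinations(video_actions, 2):
        if abs(t_a - t_b) <= window:
            yield tuple(sorted((a, b)))

edges = {pair for acts in actions.values() for pair in co_occurring_pairs(acts)}
print(edges)  # {('chop onion', 'stir pan')}
```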
- The resources for creating the data are in `data/utils`. More details in `data_processing.py`.
- The graph is saved in `data/graph/edges.csv`.
- The node embeddings are saved in `data/graph/{default_feat_nodes}_nodes.csv`, where `default_feat_nodes` can be: `"txt_action"`, `"txt_transcript"`, `"vis_action"`, `"vis_video"`, `"vis_action_video"`. More details in `link_prediction.py`.
- Sample frames (4 frames per video) and their action labels are in `frames_sample`.
- The visual features are in `data/clip_features.pt`.
- The textual and graph features are in `data/graph` and are computed in `data_processing.py`, in the `save_nodes_edges_df` function. See the loading sketch after this list for a quick check of these files.
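As a quick sanity check that the released files load as expected, here is a hedged sketch. It assumes only the file paths listed above; nothing about the CSV schemas is assumed, the columns are simply printed:

```python
import pandas as pd
import torch

# Co-occurrence graph edges.
edges = pd.read_csv("data/graph/edges.csv")
print("edges:", edges.shape, list(edges.columns))

# Node embeddings for one feature type, e.g. textual action embeddings.
nodes = pd.read_csv("data/graph/txt_action_nodes.csv")
print("nodes:", nodes.shape, list(nodes.columns))

# Visual CLIP features, saved as a PyTorch object.
clip_features = torch.load("data/clip_features.pt")
print("clip features:", type(clip_features))
```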
```bash
conda env create
conda activate action_order
pip install -r requirements.txt
spacy download en_core_web_sm
spacy download en_core_web_trf
```
- Run data collection and processing from `data_processing.py`
- Run action co-occurrence / link prediction models from `link_prediction.py` (a minimal link prediction sketch follows this list)
- Run downstream task experiments from `action_downstream.py` and from the `get_nearest_neighbours` function in `link_prediction.py`
- Run data analyses from `data_analysis.ipynb`
- Run video-related scripts from `utils`
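For orientation, here is a minimal link prediction baseline in the spirit of the task above. It is a sketch, not the pipeline in `link_prediction.py`: the synthetic embeddings, the Hadamard-product edge features, and the logistic regression classifier are all illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the node embeddings; in the repo they would come
# from the data/graph/*_nodes.csv files.
rng = np.random.default_rng(0)
embeddings = {f"action_{i}": rng.normal(size=32) for i in range(50)}

# Toy positive (co-occurring) and negative (non-co-occurring) action pairs.
positive_edges = [(f"action_{i}", f"action_{i + 1}") for i in range(49)]
negative_edges = [(f"action_{i}", f"action_{(i + 25) % 50}") for i in range(49)]

def edge_features(u, v):
    # Hadamard product: a standard way to turn two node vectors into one edge vector.
    return embeddings[u] * embeddings[v]

X = np.stack([edge_features(u, v) for u, v in positive_edges + negative_edges])
y = np.array([1] * len(positive_edges) + [0] * len(negative_edges))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```

With real embeddings in place of the random ones, this edge-feature plus classifier recipe is a common link prediction baseline; the toy accuracy here is near chance because the synthetic embeddings carry no signal.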