This repo contains the README and code snippets for using the Action Genome dataset v1.0.
To use the snippets in this repo, Python 3 and ffmpeg are required.
Download the Charades videos (scaled to 480p) from here and extract (or softlink) them under `dataset/ag/videos`.
Download the Action Genome annotations and place them under `dataset/ag/annotations`.
We are not releasing the frames dumped from the Charades videos. Instead, you can download the Charades videos from here and dump the frames following the instructions below.
After preparing all 480p videos under `dataset/ag/videos`, dump the frames into `dataset/ag/frames`:

```
python tools/dump_frames.py
```
The dumped frames take ~74 GB, and dumping may take half a day to finish. Note that we have only annotated sampled frames (see the sampling strategy in our paper) rather than all frames. If you prefer to dump all frames, run:

```
python tools/dump_frames.py --all_frames
```
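If you need to adapt the dumping step to your own setup, its core can be sketched with ffmpeg. The helper names and the `%06d.jpg` output pattern below are our assumptions, not the actual implementation of `tools/dump_frames.py`:

```python
import os
import subprocess

def build_ffmpeg_cmd(video_path, out_dir, fps=None):
    """Construct an ffmpeg command that dumps every frame of one video as
    numbered JPEGs. The output naming is an assumption; the repo's
    tools/dump_frames.py may use a different pattern and options."""
    cmd = ["ffmpeg", "-i", video_path]
    if fps is not None:
        cmd += ["-vf", f"fps={fps}"]  # optionally subsample frames
    cmd += ["-qscale:v", "2", os.path.join(out_dir, "%06d.jpg")]
    return cmd

def dump_all_frames(video_path, out_dir, fps=None):
    """Run the command above; one output directory per video."""
    os.makedirs(out_dir, exist_ok=True)
    subprocess.run(build_ffmpeg_cmd(video_path, out_dir, fps), check=True)
```

Running this over every file in `dataset/ag/videos` reproduces an all-frames dump; the sampled-frames dump additionally restricts output to the annotated frames.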
The `object_bbox_and_relationship.pkl` file contains a dictionary structured like:
```
{...
 'VIDEO_ID/FRAME_ID': [...
    {
        'class': 'book',
        'bbox': (x, y, w, h),
        'attention_relationship': ['looking_at'],
        'spatial_relationship': ['in_front_of'],
        'contacting_relationship': ['holding', 'touching'],
        'visible': True,
        'metadata': {
            'tag': 'VIDEO_ID/FRAME_ID',
            'set': 'train'
        }
    }
 ...]
...}
```
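A minimal sketch of loading this file and collecting the visible objects for one frame. The `visible_objects` helper is ours, not part of the repo; the sample entry mirrors the layout documented above:

```python
import pickle

def load_annotations(path):
    """Load the per-frame object/relationship dictionary shown above."""
    with open(path, "rb") as f:
        return pickle.load(f)

def visible_objects(frame_annos):
    """Given the list of object dicts for one 'VIDEO_ID/FRAME_ID' key,
    return (class, contacting_relationship) pairs for visible objects."""
    return [(o["class"], o["contacting_relationship"])
            for o in frame_annos if o["visible"]]

# Sample entry mirroring the documented structure:
sample = [{
    "class": "book",
    "bbox": (10.0, 20.0, 50.0, 40.0),
    "attention_relationship": ["looking_at"],
    "spatial_relationship": ["in_front_of"],
    "contacting_relationship": ["holding", "touching"],
    "visible": True,
    "metadata": {"tag": "VIDEO_ID/FRAME_ID", "set": "train"},
}]
```

On the real file, `visible_objects(annos["VIDEO_ID/FRAME_ID"])` applies the same filter per frame key.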
Notably, `'visible'` indicates whether the interacted object is visible in the frame.
The `person_bbox.pkl` file contains the person bounding boxes for each frame. Here we release the Faster R-CNN detected person boxes used in our paper. In the next version of the dataset, we will release manually labeled person boxes.
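Loading the person boxes is a plain pickle read. The per-frame layout sketched here (a dictionary keyed by `'VIDEO_ID/FRAME_ID'`, like the object annotations) is our assumption; inspect one entry after loading to confirm:

```python
import pickle

def load_person_bboxes(path):
    """Load person_bbox.pkl. Keying by 'VIDEO_ID/FRAME_ID' is an
    assumption based on the object annotation file; verify on one entry."""
    with open(path, "rb") as f:
        return pickle.load(f)
```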
The `frame_list.txt` file contains all frames we have labeled.
The `object_classes.txt` file contains all object classes.
The `relationship_classes.txt` file contains all classes of human-object relationships.