
Official source code for "Continual 3D Convolutional Neural Networks for Real-time Processing of Videos" [ECCV2022]


LukasHedegaard/co3d


Continual 3D Convolutional Neural Networks


Continual 3D Convolutional Neural Networks (Co3D CNNs) are a novel computational formulation of spatio-temporal 3D CNNs, in which videos are processed frame-by-frame rather than by clip.

In online processing tasks that demand frame-wise predictions, Co3D CNNs dispense with the computational redundancies of regular 3D CNNs, namely the repeated convolutions over frames that appear in multiple overlapping clips.

Co3D CNNs are weight-compatible with regular 3D CNNs, do not need further training, and reduce the floating point operations for frame-wise computations by more than an order of magnitude!
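The redundancy described above can be quantified for a single layer with a back-of-the-envelope count. The sketch below is an illustration under stated assumptions, not a reproduction of the paper's FLOP measurements: it considers one temporal convolution with kernel size k, valid padding and stride 1, and assumes that a clip-based model reprocesses its full clip of T frames every time a new frame arrives. The function name temporal_conv_macs_per_frame is hypothetical.

```python
from typing import Dict


def temporal_conv_macs_per_frame(clip_len: int, kernel_size: int) -> Dict[str, float]:
    """Multiply-accumulates per incoming frame for ONE temporal convolution.

    Hypothetical illustration. Counts are per input/output channel pair and
    spatial position. Assumes valid padding, stride 1, and a clip-based model
    that reprocesses the whole clip (sliding window, stride 1) for every frame.
    """
    if not 1 <= kernel_size <= clip_len:
        raise ValueError("kernel_size must be between 1 and clip_len")
    outputs_per_clip = clip_len - kernel_size + 1
    # Clip-based: every output position in the clip is recomputed from scratch
    clip_based = outputs_per_clip * kernel_size
    # Continual: the new frame is multiplied once by each kernel weight
    continual = kernel_size
    return {
        "clip_based": clip_based,
        "continual": continual,
        "ratio": clip_based / continual,
    }


print(temporal_conv_macs_per_frame(clip_len=16, kernel_size=3))
# {'clip_based': 42, 'continual': 3, 'ratio': 14.0}
```

The ratio reduces to T - k + 1. Savings for complete networks also depend on padding, pooling, and the remaining layers, so this toy count should not be read as the order-of-magnitude figure quoted above.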

News

Principle


Continual Convolution. An input (green d or e) is convolved with a kernel (blue α, β). The intermediary feature-maps corresponding to all but the last temporal position are stored, while the last feature map and prior memory are summed to produce the resulting output. For a continual stream of inputs, Continual Convolutions produce identical outputs to regular convolutions.
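The following pure-Python sketch illustrates the principle for a single temporal (1D) convolution over a stream of scalar frame values. It assumes valid padding, stride 1, and no kernel flip (cross-correlation, as in common deep learning frameworks), and it ignores spatial dimensions, channels, and bias. ContinualConv1d and regular_conv1d are hypothetical names, not the repository's API. Both use the same weights w, mirroring the weight compatibility described above.

```python
from typing import List, Optional, Sequence


def regular_conv1d(x: Sequence[float], w: Sequence[float]) -> List[float]:
    """Clip-based temporal convolution: valid padding, stride 1, no kernel flip."""
    k = len(w)
    return [sum(w[i] * x[t + i] for i in range(k)) for t in range(len(x) - k + 1)]


class ContinualConv1d:
    """Hypothetical frame-by-frame temporal convolution reusing the same weights.

    memory[j] holds the partial sum of the output that will be emitted j steps
    from now, accumulated from frames that have already been seen.
    """

    def __init__(self, w: Sequence[float]):
        self.w = list(w)
        self.memory = [0.0] * (len(self.w) - 1)
        self.steps = 0

    def step(self, x_t: float) -> Optional[float]:
        k = len(self.w)
        self.steps += 1
        if k == 1:
            return self.w[0] * x_t
        # Output: stored partial sum plus the newest frame times the last weight
        y_t = self.memory[0] + self.w[k - 1] * x_t
        # Shift the memory and store this frame's contributions to future outputs
        self.memory = [
            self.memory[j + 1] + self.w[k - 2 - j] * x_t for j in range(k - 2)
        ] + [self.w[0] * x_t]
        # The first k - 1 outputs still lack earlier frames, so they are withheld
        return y_t if self.steps >= k else None


x = [1, 2, 3, 4, 5, 6]
w = [1, -2, 3]
conv = ContinualConv1d(w)
streamed = [y for y in (conv.step(v) for v in x) if y is not None]
assert streamed == regular_conv1d(x, w)  # both give 6, 8, 10, 12
```

Each step costs k multiplications regardless of clip length, and the first k - 1 outputs are withheld until the memory has been filled by real frames.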

Results


Accuracy/complexity trade-off for Continual X3D (CoX3D) and recent state-of-the-art 3D CNNs on Kinetics-400 using 1-clip/frame testing. For regular 3D CNNs, the FLOPs per clip ■ are noted, while the FLOPs per frame ● are shown for the Continual 3D CNNs. The CoX3D models used the weights from the X3D models without further fine-tuning. The global average pool size for the network is noted at each point. The diagonal and vertical arrows indicate, respectively, a transfer from a regular to a Continual 3D CNN and an extension of the receptive field.


Benchmark of state-of-the-art methods on Kinetics-400. The noted accuracy is the single-clip or single-frame top-1 score using RGB as the only input modality. The performance was evaluated using publicly available pre-trained models without any further fine-tuning. For throughput comparison, evaluations per second denote frames per second for the CoX3D models and clips per second for the remaining models. Throughput results are the mean +/- std of 100 measurements. Pareto-optimal models are marked in bold. Mem. is the maximum memory allocated during inference, in megabytes.
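The throughput protocol in the caption (mean +/- std of 100 measurements) could be approximated along the lines of the sketch below. It is an assumption-laden illustration, not the benchmarking code behind the table (the project relies on Ride for benchmarking): measure_throughput is a hypothetical helper, fn stands in for one forward pass (one frame for CoX3D, one clip for the other models), and the 10 warm-up calls are an assumed choice.

```python
import statistics
import time
from typing import Callable, Tuple


def measure_throughput(
    fn: Callable[[], object], runs: int = 100, warmup: int = 10
) -> Tuple[float, float]:
    """Hypothetical helper: (mean, std) of evaluations per second over `runs` calls.

    fn is a placeholder for one model evaluation; warm-up calls are assumed
    and excluded from the statistics.
    """
    for _ in range(warmup):
        fn()
    rates = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        rates.append(1.0 / (time.perf_counter() - start))
    return statistics.mean(rates), statistics.stdev(rates)


# Dummy CPU workload standing in for a model forward pass
mean, std = measure_throughput(lambda: sum(range(10_000)))
print(f"{mean:.1f} +/- {std:.1f} evaluations/s")
```

When timing on a GPU, fn should call torch.cuda.synchronize() after the forward pass, since CUDA kernels are launched asynchronously and the timer would otherwise stop too early.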

Setup

  1. Clone the project code

    git clone https://github.com/LukasHedegaard/co3d
    cd co3d
  2. Create and activate a conda environment (optional)

    conda create --name co3d python=3.8
    conda activate co3d
  3. Install Python dependencies

    pip install -e .[dev]
  4. Install FFMPEG and UNRAR

  5. Specify your dataset, log, and cache folder paths in the .env file:

    DATASETS_PATH=/path/to/datasets
    LOGS_PATH=/path/to/logs
    CACHE_PATH=.cache
  6. Download the datasets using these instructions
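Before running any scripts, it can help to confirm that the .env file from step 5 parses and points at existing folders. The sketch below is a hypothetical sanity check with a deliberately simple KEY=VALUE parser; it does not reflect how the project itself loads the file, and read_env_file and check_paths are illustrative names.

```python
from pathlib import Path
from typing import Dict


def read_env_file(path: str = ".env") -> Dict[str, str]:
    """Hypothetical parser: simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def check_paths(values: Dict[str, str]) -> Dict[str, bool]:
    """Report whether each expected variable is set AND points to an existing directory."""
    expected = ("DATASETS_PATH", "LOGS_PATH", "CACHE_PATH")
    return {key: key in values and Path(values[key]).is_dir() for key in expected}


if __name__ == "__main__" and Path(".env").exists():
    print(check_paths(read_env_file(".env")))
```

LOGS_PATH and CACHE_PATH may report False simply because those folders have not been created yet.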

Models

CoX3D is the Continual-CNN implementation of X3D. In contrast to regular 3D CNNs, which take a whole video clip as input, Continual CNNs operate frame-by-frame and can thus speed up computation by a significant margin.
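Frame-by-frame operation also applies to the network's temporal pooling. The sketch below is a minimal illustration, assuming the temporal global average pool averages the most recent window frame features (the "global average pool size" noted in the results figure). ContinualAvgPool1d is a hypothetical name, not the repository's module, and features are scalars for brevity, whereas a real network pools feature tensors.

```python
from collections import deque
from typing import Deque, Optional


class ContinualAvgPool1d:
    """Hypothetical continual temporal average pool over the last `window` features."""

    def __init__(self, window: int):
        self.window = window
        self.buffer: Deque[float] = deque()
        self.total = 0.0

    def step(self, feature: float) -> Optional[float]:
        # Add the newest feature and drop the oldest once the window is full
        self.buffer.append(feature)
        self.total += feature
        if len(self.buffer) > self.window:
            self.total -= self.buffer.popleft()
        if len(self.buffer) < self.window:
            return None  # still warming up
        return self.total / self.window


pool = ContinualAvgPool1d(window=3)
print([pool.step(f) for f in [1.0, 2.0, 3.0, 4.0, 5.0]])
# [None, None, 2.0, 3.0, 4.0]
```

The running sum keeps each step O(1); over very long streams, floating-point drift in the sum could be avoided by periodically recomputing it from the buffer.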

CoSlow is the Continual-CNN implementation of Slow.

CoI3D is the Continual-CNN implementation of I3D.

X3D [ArXiv, Repo] is a family of 3D variants of the EfficientNet architecture that produce state-of-the-art results for lightweight human activity recognition.

R(2+1)D [ArXiv, Repo] is a CNN for activity recognition, which separates the 3D convolution into a spatial 2D convolution and a temporal 1D convolution in order to reduce the number of parameters and increase the network efficiency.
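Whether the (2+1)D factorization saves parameters depends on the number of intermediate channels M between the spatial and temporal convolutions. The sketch below counts weights (bias ignored) for a t x d x d 3D convolution and for its factorized counterpart; the helper names are hypothetical. As we understand the R(2+1)D paper, it picks M with the formula in matched_mid_channels so that both counts roughly match, in which case savings only arise if a smaller M is chosen.

```python
def conv3d_params(t: int, d: int, n_in: int, n_out: int) -> int:
    """Weights of a full t x d x d 3D convolution (bias ignored)."""
    return t * d * d * n_in * n_out


def conv2plus1d_params(t: int, d: int, n_in: int, n_out: int, m: int) -> int:
    """Weights of a 1 x d x d spatial conv (n_in -> m) plus a t x 1 x 1 temporal conv (m -> n_out)."""
    return d * d * n_in * m + t * m * n_out


def matched_mid_channels(t: int, d: int, n_in: int, n_out: int) -> int:
    """Intermediate width that approximately equalizes the two parameter counts."""
    return (t * d * d * n_in * n_out) // (d * d * n_in + t * n_out)


t, d, c = 3, 3, 64
print(conv3d_params(t, d, c, c))               # 110592
m = matched_mid_channels(t, d, c, c)
print(m, conv2plus1d_params(t, d, c, c, m))    # 144 110592
print(conv2plus1d_params(t, d, c, c, 64))      # 49152
```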

I3D [ArXiv, Repo] is a 3D CNN for activity recognition that "inflates" the weights of a 2D CNN pretrained on ImageNet to initialise the 3D CNN, thereby improving accuracy and reducing training time.

The implementation here is a port of the one found in the SlowFast Repo.
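The inflation idea can be checked numerically. As described in the I3D paper, a 2D kernel is repeated t times along the time axis and divided by t, so that a "boring" video of one repeated frame yields the same response as the 2D network on that frame. The sketch below verifies this at a single spatio-temporal position using nested lists; the function names are hypothetical, and spatial sliding, channels, and bias are omitted.

```python
from typing import List

Kernel2D = List[List[float]]


def inflate_kernel(w2d: Kernel2D, t: int) -> List[Kernel2D]:
    """Repeat a 2D kernel t times along time and rescale by 1 / t."""
    return [[[v / t for v in row] for row in w2d] for _ in range(t)]


def conv2d_at(patch: Kernel2D, w2d: Kernel2D) -> float:
    """Response of a 2D kernel at one spatial position (element-wise product, summed)."""
    return sum(p * w for prow, wrow in zip(patch, w2d) for p, w in zip(prow, wrow))


def conv3d_at(clip: List[Kernel2D], w3d: List[Kernel2D]) -> float:
    """Response of a 3D kernel at one spatio-temporal position."""
    return sum(conv2d_at(frame, w_slice) for frame, w_slice in zip(clip, w3d))


w2d = [[1.0, 2.0], [3.0, 4.0]]
patch = [[1.0, 0.0], [2.0, 1.0]]
boring_clip = [patch] * 3  # the same frame repeated: a static "boring" video
print(conv2d_at(patch, w2d), conv3d_at(boring_clip, inflate_kernel(w2d, 3)))
```

The two printed responses agree up to floating-point rounding.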

SlowFast [ArXiv, Repo] is a two-stream 3D CNN architecture for video recognition. It comprises two pathways, one of which operates at a lower frame rate than the other.
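The two frame rates can be illustrated by the frame indices each pathway receives. In this sketch, slowfast_frame_indices is a hypothetical helper, and the values num_frames=64, fast_stride=2, alpha=4 are assumptions chosen so that the slow pathway sees 8 frames with a stride of 8, which is one reading of the "8x8" in SlowFast-8x8; the hparams in each model's folder hold the actual settings.

```python
from typing import List, Tuple


def slowfast_frame_indices(
    num_frames: int, fast_stride: int, alpha: int
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: frame indices for the slow and fast pathways of one clip.

    The fast pathway samples every `fast_stride`-th frame; the slow pathway keeps
    every `alpha`-th of those, i.e. it runs at a frame rate alpha times lower.
    """
    fast = list(range(0, num_frames, fast_stride))
    slow = fast[::alpha]
    return slow, fast


slow, fast = slowfast_frame_indices(num_frames=64, fast_stride=2, alpha=4)
print(len(slow), len(fast))  # 8 32
print(slow)                  # [0, 8, 16, 24, 32, 40, 48, 56]
```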

Slow is the "slow" pathway of the SlowFast network [ArXiv, Repo].

Usage

The project code is written in PyTorch and uses Ride to provide implementations of training, evaluation, and benchmarking methods. The many available usage options are best explored in the Ride docs or via the command-line help, e.g.:

    python models/cox3d/main.py --help

This repository contains the implementations of Continual X3D (CoX3D), CoSlow, and CoI3D, as well as a number of 3D CNN baselines.

Each model has its own folder with a self-contained implementation, scripts, weight download utilities, hyperparameters, and profiling results. Overview tables of the scripts used to download weights, evaluate the models, and benchmark throughput are found below:

Download weights

Model          Dataset   Download
I3D-R50        Kinetics  download
R(2+1)D-18     Kinetics  download
SlowFast-8x8   Kinetics  download
SlowFast-4x16  Kinetics  download
Slow-8x8       Kinetics  download
(Co)X3D-XS     Kinetics  download
(Co)X3D-S      Kinetics  download
(Co)X3D-M      Kinetics  download
(Co)X3D-L      Kinetics  download
(Co)Slow-8x8   Charades  download

Evaluate on Kinetics400

Evaluate the 1-clip accuracy of pretrained models. The scripts should be executed from the project root.

Model       Script
I3D-R50     ./models/i3d/scripts/test/kinetics400.sh
R(2+1)D-18  ./models/r2plus1d/scripts/test/kinetics400.sh
SlowFast    ./models/slowfast/scripts/test/kinetics400.sh
Slow        ./models/slow/scripts/test/kinetics400.sh
X3D         ./models/x3d/scripts/test/kinetics400.sh
CoX3D       ./models/cox3d/scripts/test/kinetics400.sh
CoSlow      ./models/coslow/scripts/test/kinetics400.sh
CoI3D       ./models/coi3d/scripts/test/kinetics400.sh
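For reference, the 1-clip accuracy is taken here to mean top-1 accuracy with a single clip per video, matching the "single clip or frame top-1 score" in the benchmark caption. A minimal sketch of that metric follows; top1_accuracy is a hypothetical helper operating on per-video class scores, not part of the repository.

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Hypothetical helper: fraction of videos whose highest-scoring class equals the label."""
    if not labels or len(scores) != len(labels):
        raise ValueError("need one score vector per label")
    correct = 0
    for video_scores, label in zip(scores, labels):
        predicted = max(range(len(video_scores)), key=video_scores.__getitem__)
        correct += int(predicted == label)
    return correct / len(labels)


scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.2, 0.6]]
labels = [1, 2, 2]
print(top1_accuracy(scores, labels))  # 0.6666666666666666
```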

Evaluate on Charades

Evaluate the 1-clip accuracy of pretrained models. The scripts should be executed from the project root.

Model         Script
(Co)Slow-8x8  ./models/coslow/scripts/test/charades.sh

Benchmark FLOPs and throughput

The scripts should be executed from the project root.

Model       Script
I3D-R50     ./models/i3d/scripts/profile/kinetics400.sh
R(2+1)D-18  ./models/r2plus1d/scripts/profile/kinetics400.sh
SlowFast    ./models/slowfast/scripts/profile/kinetics400.sh
Slow        ./models/slow/scripts/profile/kinetics400.sh
X3D         ./models/x3d/scripts/profile/kinetics400.sh
CoX3D       ./models/cox3d/scripts/profile/kinetics400.sh
CoI3D       ./models/coi3d/scripts/profile/kinetics400.sh
CoSlow      ./models/coslow/scripts/profile/kinetics400.sh

Citation

@inproceedings{hedegaard2022continual,
    title={Continual 3D Convolutional Neural Networks for Real-time Processing of Videos},
    author={Lukas Hedegaard and Alexandros Iosifidis},
    booktitle={European Conference on Computer Vision (ECCV)},
    year={2022},
}

Acknowledgement

This work has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 871449 (OpenDR).
