Continual 3D Convolutional Neural Networks

Continual 3D Convolutional Neural Networks (Co3D CNNs) are a novel computational formulation of spatio-temporal 3D CNNs, in which videos are processed frame-by-frame rather than by clip.

In online processing tasks demanding frame-wise predictions, Co3D CNNs dispense with the computational redundancies of regular 3D CNNs, namely the repeated convolutions over frames that appear in multiple clips.

Co3D CNNs are weight-compatible with regular 3D CNNs, do not need further training, and reduce the floating point operations for frame-wise computations by more than an order of magnitude!

News

2022-07-04 Our paper "Continual 3D Convolutional Neural Networks for Real-time Processing of Videos" has been accepted at the European Conference on Computer Vision (ECCV) 2022.

Principle

Continual Convolution. An input (green d or e) is convolved with a kernel (blue α, β). The intermediary feature-maps corresponding to all but the last temporal position are stored, while the last feature map and prior memory are summed to produce the resulting output. For a continual stream of inputs, Continual Convolutions produce identical outputs to regular convolutions.
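The principle above can be sketched in plain Python for a purely temporal kernel over scalar "frames". This is an illustrative toy, not the repository's implementation: the names `regular_conv` and `ContinualConv` are hypothetical, and the first `len(kernel) - 1` continual outputs are treated as warm-up values because the memory starts at zero.

```python
from typing import List, Sequence


def regular_conv(frames: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Sliding-window temporal convolution over a full sequence (no padding)."""
    k = len(kernel)
    return [
        sum(kernel[j] * frames[t - (k - 1) + j] for j in range(k))
        for t in range(k - 1, len(frames))
    ]


class ContinualConv:
    """Hypothetical step-wise layer: one frame in, one output out."""

    def __init__(self, kernel: Sequence[float]) -> None:
        self.kernel = list(kernel)
        # memory[i] holds the partial sum for the output due i steps from now.
        self.memory = [0.0] * (len(self.kernel) - 1)

    def step(self, frame: float) -> float:
        k = len(self.kernel)
        # The last temporal kernel position completes the current output.
        prior = self.memory[0] if self.memory else 0.0
        output = prior + self.kernel[-1] * frame
        # All other positions are stored as contributions to future outputs.
        shifted = self.memory[1:] + [0.0]
        self.memory = [
            shifted[i] + self.kernel[k - 2 - i] * frame for i in range(k - 1)
        ]
        return output


frames = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [1.0, 10.0, 100.0]
layer = ContinualConv(kernel)
continual_outputs = [layer.step(f) for f in frames]
regular_outputs = regular_conv(frames, kernel)
```

Once the warm-up steps have passed, `continual_outputs[len(kernel) - 1:]` matches `regular_outputs`, while each frame is multiplied with each kernel weight only once.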

Results

Accuracy/complexity trade-off for Continual X3D (CoX3D) and recent state-of-the-art 3D CNNs on Kinetics-400 using 1-clip/frame testing. For regular 3D CNNs, the FLOPs per clip ■ are noted, while the FLOPs per frame ● are shown for the Continual 3D CNNs. The CoX3D models used the weights from the X3D models without further fine-tuning. The global average pool size for the network is noted in each point. The diagonal and vertical arrows indicate respectively a transfer from regular to Continual 3D CNN and an extension of receptive field.

Benchmark of state-of-the-art methods on Kinetics-400. The noted accuracy is the single clip or frame top-1 score using RGB as the only input-modality. The performance was evaluated using publicly available pre-trained models without any further fine-tuning. For throughput comparison, evaluations per second denote frames per second for the CoX3D models and clips per second for the remaining models. Throughput results are the mean +/- std of 100 measurements. Pareto-optimal models are marked in bold. Mem. is the maximum allocated memory during inference noted in megabytes.

Setup

Clone the project code

git clone https://github.com/LukasHedegaard/co3d
cd co3d

Create and activate conda environment (optional)

conda create --name co3d python=3.8
conda activate co3d

Install Python dependencies
```
pip install -e .[dev]
```
Install FFMPEG and UNRAR

Fill in the information on your dataset folder path in the .env file:

DATASETS_PATH=/path/to/datasets
LOGS_PATH=/path/to/logs
CACHE_PATH=.cache

Download dataset using these instructions
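For reference, the `.env` file described above is a list of `KEY=VALUE` lines. The snippet below is a minimal sketch of how such a file can be read using only the standard library. The `parse_env` helper is hypothetical, and the project itself may load the file through a different mechanism.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


# Placeholder paths copied from the example above; replace with real ones.
example_env = """DATASETS_PATH=/path/to/datasets
LOGS_PATH=/path/to/logs
CACHE_PATH=.cache
"""

config = parse_env(example_env)
```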

Models

CoX3D

CoX3D is the Continual-CNN implementation of X3D. In contrast to regular 3D CNNs, which take a whole video clip as input, Continual CNNs operate frame-by-frame and can thus speed up computation by a significant margin.
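To give a rough feel for where the saving comes from, the sketch below counts how many frames enter the network when a prediction is produced for every new frame. It is a back-of-the-envelope illustration with hypothetical numbers (a 100-frame stream and a 16-frame clip). It counts input frames only, not FLOPs, and the function names are made up for this example.

```python
def frames_processed_clipwise(num_frames: int, clip_size: int) -> int:
    """Sliding a clip window one frame at a time re-processes overlapping frames."""
    num_predictions = max(num_frames - clip_size + 1, 0)
    return num_predictions * clip_size


def frames_processed_continual(num_frames: int) -> int:
    """Frame-by-frame processing passes each frame through the network once."""
    return num_frames


stream_length = 100  # hypothetical stream length
clip_size = 16       # hypothetical clip size

clipwise = frames_processed_clipwise(stream_length, clip_size)
continual = frames_processed_continual(stream_length)
```

With these assumed values, the clip-wise count is 85 predictions times 16 frames, whereas the continual count equals the stream length.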

CoSlow

CoSlow is the Continual-CNN implementation of Slow.

CoI3D

CoI3D is the Continual-CNN implementation of I3D.

X3D

X3D [ArXiv, Repo] is a family of 3D variants of the EfficientNet architecture, which produce state-of-the-art results for lightweight human activity recognition.

R(2+1)D

R(2+1)D [ArXiv, Repo] is a CNN for activity recognition, which separates the 3D convolution into a spatial 2D convolution and a temporal 1D convolution in order to reduce the number of parameters and increase the network efficiency.
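The effect of the factorisation on parameter count can be illustrated with a small calculation. This is a sketch under stated assumptions: biases are ignored, the channel sizes are hypothetical, and the number of intermediate channels `c_mid` between the spatial and temporal convolutions is a free design choice, here simply set equal to the output channels. A larger `c_mid` would shrink or remove the saving.

```python
def conv3d_params(c_in: int, c_out: int, t: int, d: int) -> int:
    """Weights in a full t x d x d 3D convolution (bias ignored)."""
    return c_in * c_out * t * d * d


def conv2plus1d_params(c_in: int, c_mid: int, c_out: int, t: int, d: int) -> int:
    """Weights in a 1 x d x d spatial conv followed by a t x 1 x 1 temporal conv."""
    spatial = c_in * c_mid * d * d
    temporal = c_mid * c_out * t
    return spatial + temporal


# Hypothetical layer: 64 -> 64 channels, 3 x 3 x 3 kernel, c_mid = 64.
full = conv3d_params(c_in=64, c_out=64, t=3, d=3)
factored = conv2plus1d_params(c_in=64, c_mid=64, c_out=64, t=3, d=3)
```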

I3D

I3D [ArXiv, Repo] is a 3D CNN for activity recognition. It initialises the 3D CNN by "inflating" the weights of a 2D CNN pretrained on ImageNet, thereby improving accuracy and reducing training time.

The implementation here is a port of the one found in the SlowFast Repo.
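As an illustration of weight inflation, the sketch below repeats a 2D kernel along the temporal axis and rescales it by the number of time steps, which is the scheme commonly described for I3D. It is a toy example on nested lists with hypothetical helper names, not the weight-loading code of this repository. On a static clip (the same frame repeated), the inflated kernel gives the same response as the original 2D kernel.

```python
from typing import List

Kernel2D = List[List[float]]
Kernel3D = List[Kernel2D]


def inflate(kernel_2d: Kernel2D, time_steps: int) -> Kernel3D:
    """Repeat a 2D kernel along time and rescale by 1 / time_steps."""
    return [
        [[value / time_steps for value in row] for row in kernel_2d]
        for _ in range(time_steps)
    ]


def response_2d(patch: Kernel2D, kernel: Kernel2D) -> float:
    """Response of a 2D kernel at a single spatial location."""
    return sum(p * k for p_row, k_row in zip(patch, kernel) for p, k in zip(p_row, k_row))


def response_3d(clip: Kernel3D, kernel: Kernel3D) -> float:
    """Response of a 3D kernel at a single spatio-temporal location."""
    return sum(response_2d(frame, k2d) for frame, k2d in zip(clip, kernel))


kernel_2d = [[1.0, 2.0], [3.0, 4.0]]
patch = [[5.0, 6.0], [7.0, 8.0]]
time_steps = 4

kernel_3d = inflate(kernel_2d, time_steps)
static_clip = [patch] * time_steps  # the same frame repeated over time
```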

SlowFast

SlowFast [ArXiv, Repo] is a two-stream 3D CNN architecture for video recognition. The structure includes two pathways, with one pathway operating at a slower frame rate than the other.

Slow

Slow is the "slow" branch of the SlowFast network [ArXiv, Repo].

Usage

The project code is written in PyTorch and uses Ride to provide implementations of training, evaluation, and benchmarking methods. A plethora of usage options are available, which are best explored in the Ride docs or the command-line help, e.g.:

python models/cox3d/main.py --help

This repository contains the implementations of Continual X3D (CoX3D), as well as a number of 3D-CNN baselines.

Each model has its own folder with a self-contained implementation, scripts, weight download utilities, hparams and profiling results. Overview tables for scripts used to download weights, run the model test-sequences, and throughput benchmarks are found below:

Download weights

Model	Dataset	Download
I3D-R50	Kinetics	download
R(2+1)D-18	Kinetics	download
SlowFast-8x8	Kinetics	download
SlowFast-4x16	Kinetics	download
Slow-8x8	Kinetics	download
(Co)X3D-XS	Kinetics	download
(Co)X3D-S	Kinetics	download
(Co)X3D-M	Kinetics	download
(Co)X3D-L	Kinetics	download
(Co)Slow-8x8	Charades	download

Evaluate on Kinetics400

Evaluate the 1-clip accuracy of pretrained models. The scripts should be executed from the project root.

Model	Script
I3D-R50	`./models/i3d/scripts/test/kinetics400.sh`
R(2+1)D-18	`./models/r2plus1d/scripts/test/kinetics400.sh`
SlowFast	`./models/slowfast/scripts/test/kinetics400.sh`
Slow	`./models/slow/scripts/test/kinetics400.sh`
X3D	`./models/x3d/scripts/test/kinetics400.sh`
CoX3D	`./models/cox3d/scripts/test/kinetics400.sh`
CoSlow	`./models/coslow/scripts/test/kinetics400.sh`
CoI3D	`./models/coi3d/scripts/test/kinetics400.sh`

Evaluate on Charades

Evaluate the 1-clip accuracy of pretrained models. The scripts should be executed from the project root.

Model	Script
(Co)Slow-8x8	`./models/coslow/scripts/test/charades.sh`

Benchmark FLOPs and throughput

The scripts should be executed from the project root.

Model	Script
I3D-R50	`./models/i3d/scripts/profile/kinetics400.sh`
R(2+1)D-18	`./models/r2plus1d/scripts/profile/kinetics400.sh`
SlowFast	`./models/slowfast/scripts/profile/kinetics400.sh`
Slow	`./models/slow/scripts/profile/kinetics400.sh`
X3D	`./models/x3d/scripts/profile/kinetics400.sh`
CoX3D	`./models/cox3d/scripts/profile/kinetics400.sh`
CoI3D	`./models/coi3d/scripts/profile/kinetics400.sh`
CoSlow	`./models/coslow/scripts/profile/kinetics400.sh`

Citation

@inproceedings{hedegaard2022continual,
    title={Continual 3D Convolutional Neural Networks for Real-time Processing of Videos},
    author={Lukas Hedegaard and Alexandros Iosifidis},
    booktitle={European Conference on Computer Vision (ECCV)},
    year={2022},
}

Acknowledgement

This work has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 871449 (OpenDR).
