R(2+1)D [ArXiv, Repo] is a CNN for activity recognition that factorizes each 3D convolution into a spatial 2D convolution followed by a temporal 1D convolution, in order to reduce the number of parameters and increase the network efficiency.
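The parameter arithmetic behind this factorization can be sketched in a few lines of plain Python. This is only an illustration: the function names are hypothetical and do not come from this repository, bias terms are ignored, and the intermediate channel width `c_mid` between the two convolutions is an assumed free parameter. How much the factored block saves depends on that width.

```python
def conv3d_params(c_in, c_out, t=3, d=3):
    """Weights in a full t x d x d 3D convolution (bias ignored)."""
    return t * d * d * c_in * c_out


def conv2plus1d_params(c_in, c_out, c_mid, t=3, d=3):
    """Weights in a 1 x d x d spatial convolution (c_in -> c_mid)
    followed by a t x 1 x 1 temporal convolution (c_mid -> c_out)."""
    spatial = d * d * c_in * c_mid
    temporal = t * c_mid * c_out
    return spatial + temporal


def break_even_mid_channels(c_in, c_out, t=3, d=3):
    """Largest c_mid for which the factored block has no more weights
    than the full 3D convolution (obtained by equating the two counts)."""
    return (t * d * d * c_in * c_out) // (d * d * c_in + t * c_out)


if __name__ == "__main__":
    c_in, c_out = 64, 64
    print("full 3D conv:       ", conv3d_params(c_in, c_out))
    print("(2+1)D, c_mid = 64: ", conv2plus1d_params(c_in, c_out, c_mid=64))
    m = break_even_mid_channels(c_in, c_out)
    print("break-even c_mid:   ", m)
    print("(2+1)D at break-even:", conv2plus1d_params(c_in, c_out, c_mid=m))
```

For a 3x3x3 kernel with 64 input and 64 output channels, any `c_mid` below the break-even width gives a factored block with fewer weights than the full 3D convolution; at the break-even width the two counts coincide.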
The code for this model is a port of this implementation.
Pretrained models can be found here.
See the scripts folder.
Other options can be explored using:
python main.py --help