paper | project page | video
Tsun-Hsuan Wang*, Yen-Chi Cheng*, Chieh Hubert Lin, Hwann-Tzong Chen, Min Sun (* indicates equal contribution)
IEEE International Conference on Computer Vision (ICCV), 2019
This repo is the PyTorch implementation of our ICCV 2019 paper "Point-to-Point Video Generation".
Paper: arXiv, CVF Open Access
Point-to-Point (P2P) Video Generation. Given a pair of (orange) start- and (red) end-frames in the video and 3D skeleton domains, our method generates videos with smooth transitional frames of various lengths.
Requirements
- OS: Ubuntu 16.04
- NVIDIA GPU + CUDA
- Python 3.6
- PyTorch 1.0
- TensorFlow (for TensorBoard)
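Before preparing data, it can help to sanity-check the PyTorch/CUDA install (a minimal check, not part of this repo):

```python
import torch

# Verify the installed PyTorch version and that a CUDA-capable GPU is visible.
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```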
Prepare dataset
First clone this repo:
git clone https://github.com/yccyenchicheng/p2pvg.git
cd p2pvg
Then create a directory data_root, and prepare each of the datasets we used as follows:
- MovingMNIST. The testing sequences are created on the fly, so there is no need to preprocess or prepare anything for this dataset.
- Weizmann. We crop each frame based on the bounding boxes from this url. You can download the dataset from that url and preprocess it yourself, or download our preprocessed version from this link. Extract the downloaded .zip file and put it under data_root.
- Human 3.6M. First download the dataset from this url, then put it under data_root/processed/.
- BAIR Robot Pushing. Download the dataset from this url (~30 GB), then follow the steps below (a conversion sketch is given after this list):
  - Create a directory data_root/bair, put the downloaded .tar file under data_root/bair, and extract it:
    tar -xvf data_root/bair/bair_robot_pushing_dataset_v0.tar -C data_root/bair
  - Use the script data/convert_bair.py implemented in this repo to convert the data:
    python data/convert_bair.py --data_dir data_root/bair
    This creates the directory data_root/bair/preprocessed_data, and the training data will be stored under it.
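For reference, the conversion step reads the BAIR TFRecord shards and writes per-frame images under data_root/bair/preprocessed_data. Below is a rough sketch of such a conversion loop, not the actual data/convert_bair.py: the feature key (%d/image_aux1/encoded), the 30-frame 64x64 layout, and the output directory structure are assumptions based on the original BAIR release.

```python
import glob, os
import numpy as np
import tensorflow as tf  # TF1-style API for reading TFRecords
from PIL import Image

data_dir = "data_root/bair/softmotion30_44k/train"  # assumed layout inside the extracted .tar
out_dir = "data_root/bair/preprocessed_data/train"
SEQ_LEN, H, W, C = 30, 64, 64, 3                     # BAIR: 30 frames of 64x64 RGB per sequence

for record in sorted(glob.glob(os.path.join(data_dir, "*.tfrecords"))):
    for i, serialized in enumerate(tf.python_io.tf_record_iterator(record)):
        example = tf.train.Example()
        example.ParseFromString(serialized)
        feats = example.features.feature
        seq_dir = os.path.join(out_dir, os.path.basename(record), str(i))
        os.makedirs(seq_dir, exist_ok=True)
        for t in range(SEQ_LEN):
            # raw uint8 RGB bytes of frame t from the main camera (key name is an assumption)
            raw = feats["%d/image_aux1/encoded" % t].bytes_list.value[0]
            frame = np.frombuffer(raw, dtype=np.uint8).reshape(H, W, C)
            Image.fromarray(frame).save(os.path.join(seq_dir, "%02d.png" % t))
```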
Training
To train with Stochastic MovingMNIST, run

python train.py --dataset mnist --channels 1 --num_digits 2 --max_seq_len 30 --n_past 1 \
  --weight_cpc 100 --weight_align 0.5 --skip_prob 0.5 --batch_size 100 \
  --backbone dcgan --beta 0.0001 --g_dim 128 --z_dim 10 --rnn_size 256
and the results, model checkpoints, and TensorBoard event files will be stored in logs/. To visualize the training, run

tensorboard --logdir logs

and go to 127.0.0.1:6006 in your browser to see the visualization. To train on other datasets, replace --dataset <other_dataset>, set the corresponding --channels <n_channels>, and adjust the other parameters as you like.
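For intuition, the weight flags above suggest how the training objective is composed. The snippet below is only a schematic of that composition implied by --beta, --weight_cpc, and --weight_align (reconstruction + KL + control-point consistency + alignment); the actual loss terms are defined in train.py and may be combined differently.

```python
# Schematic composition of the training objective from the command-line weights above.
# recon, kld, cpc, align are assumed to be scalar losses computed per batch elsewhere.
beta, weight_cpc, weight_align = 0.0001, 100.0, 0.5

def total_loss(recon, kld, cpc, align):
    return (recon                    # frame reconstruction
            + beta * kld             # KL term of the VAE
            + weight_cpc * cpc       # control-point consistency (hit the given end frame)
            + weight_align * align)  # alignment regularizer (see the paper for its definition)
```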
P2P Generate
Given a video and a trained model, perform p2p generation via the following command:
python generate.py --ckpt <model.pth> --video <your_video.mp4>
and the output will be stored at gen_outputs.
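Conceptually, p2p generation conditions the entire rollout on both the start frame and the targeted end frame, so videos of different lengths all land on the same end point. The snippet below is a purely illustrative sketch of such an inference loop; the module names (encoder, frame_predictor, decoder) and the time-counter conditioning are hypothetical stand-ins, and generate.py implements the actual procedure.

```python
import torch

@torch.no_grad()
def p2p_generate(encoder, frame_predictor, decoder, x_start, x_end, length, z_dim=10):
    """Illustrative rollout: each step sees the current frame's features, a random
    latent, the encoded target end frame, and the normalized time remaining, so a
    sequence of exactly `length` frames can end on the desired frame."""
    h_end = encoder(x_end)                           # descriptor of the target end frame
    frames, x_t = [x_start], x_start
    for t in range(1, length):
        h_t = encoder(x_t)
        z_t = torch.randn(x_start.size(0), z_dim)    # stochastic transition latent
        time_left = (length - 1 - t) / (length - 1)  # normalized remaining time
        x_t = decoder(frame_predictor(h_t, z_t, h_end, time_left))
        frames.append(x_t)
    return torch.stack(frames, dim=1)                # (batch, length, C, H, W)
```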
Citation
@article{p2pvg2019,
title={Point-to-Point Video Generation},
author={Wang, Tsun-Hsuan and Cheng, Yen-Chi and Hubert Lin, Chieh and Chen, Hwann-Tzong and Sun, Min},
journal={arXiv preprint},
year={2019}
}
@inproceedings{p2pvg2019,
title={Point-to-Point Video Generation},
author={Wang, Tsun-Hsuan and Cheng, Yen-Chi and Hubert Lin, Chieh and Chen, Hwann-Tzong and Sun, Min},
booktitle={The IEEE International Conference on Computer Vision (ICCV)},
month={October},
year={2019}
}
Acknowledgements
This code borrows heavily from SVG. We also adapt code from VideoPose3D for the preprocessing of Human 3.6M. A huge thanks to them! :D