Predicting Personalized Head Movement from Short Video and Speech Signal

We provide a PyTorch implementation of our TMM paper "Predicting Personalized Head Movement from Short Video and Speech Signal" (https://ieeexplore.ieee.org/document/9894719).

Note that this code is protected under patent. It is for research purposes at your university (research institution) only. If you are interested in business or for-profit use, please contact Prof. Liu (the corresponding author, email: [email protected]).

We provide a demo video here.

Our Proposed Framework

Prerequisites

Linux or macOS
NVIDIA GPU
Python 3
MATLAB

Getting Started

Installation

You can create a virtual environment and install all the dependencies with:

pip install -r requirements.txt

Download pre-trained models

This includes the pre-trained general models.
Download them from BaiduYun (extract code: r24f) and copy them to the corresponding subfolders:
- Put latest_iddNet.pth and latest_cttMotionNet.pth under Audio/model/Motion846_contraloss4_autogradhidden_hn_conti_10epochs.
- Put atcnet_lstm_199.pth under Audio/model/atcnet_pose01.
- Put 0_net_G.pth under render-to-video/checkpoints/seq_p2p.
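
Before training or testing, it can help to confirm that the downloaded files ended up where the list above says. The following is a minimal sketch, not part of the repository: the helper name `missing_checkpoints` and the constant `EXPECTED_CHECKPOINTS` are hypothetical, and the paths are simply the ones listed above, taken relative to the repository root.

```python
from pathlib import Path

# Locations copied from the list above, relative to the repository root.
EXPECTED_CHECKPOINTS = [
    "Audio/model/Motion846_contraloss4_autogradhidden_hn_conti_10epochs/latest_iddNet.pth",
    "Audio/model/Motion846_contraloss4_autogradhidden_hn_conti_10epochs/latest_cttMotionNet.pth",
    "Audio/model/atcnet_pose01/atcnet_lstm_199.pth",
    "render-to-video/checkpoints/seq_p2p/0_net_G.pth",
]


def missing_checkpoints(repo_root="."):
    """Return the expected checkpoint paths that are not files under repo_root."""
    root = Path(repo_root)
    return [p for p in EXPECTED_CHECKPOINTS if not (root / p).is_file()]


if __name__ == "__main__":
    # Hypothetical usage: run from the repository root.
    missing = missing_checkpoints(".")
    if missing:
        for path in missing:
            print("missing:", path)
    else:
        print("all pre-trained model files found")
```

The check only looks at file presence; it does not verify that a download is complete or uncorrupted.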

Download face model for 3D face reconstruction

We use the code in WM3DR for 3D face reconstruction.
Download the face reconstruction model final.pth and put it under WM3DR/model.
The 3DMM model used in this repo is from Deep3dPytorch. Generate the mSEmTFK68etc.chj file and put it under WM3DR/BFM.
Download shape_predictor_68_face_landmarks.dat.bz2, decompress it, and put it under Deep3DFaceReconstruction
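
If no command-line bzip2 tool is at hand, the landmark file can be decompressed with Python's standard `bz2` module. This is a small sketch under the assumption that the archive has already been downloaded to the current directory; the function name `decompress_bz2` is hypothetical and not part of this repository.

```python
import bz2
import shutil
from pathlib import Path


def decompress_bz2(archive_path, dest_dir):
    """Decompress a .bz2 file into dest_dir and return the path of the result."""
    archive = Path(archive_path)
    if archive.suffix != ".bz2":
        raise ValueError(f"expected a .bz2 file, got {archive.name}")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Path.stem drops only the final suffix: "x.dat.bz2" becomes "x.dat".
    out_path = dest / archive.stem
    # Stream the data so the whole file is never held in memory at once.
    with bz2.open(archive, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path
```

For example, `decompress_bz2("shape_predictor_68_face_landmarks.dat.bz2", "Deep3DFaceReconstruction")` would write shape_predictor_68_face_landmarks.dat into the Deep3DFaceReconstruction folder.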

Train on a target person's short video

1. Prepare a talking face video that satisfies: 1) contains a single person, 2) 25 fps, 3) longer than 12 seconds, 4) without large body translation (e.g. moving from the left to the right of the screen). Rename the video to [person_id].mp4 (e.g. 1.mp4) and copy it to the Data subfolder.

Note: You can convert a video to 25 fps with the command below. The output file must have a different name from the input, because ffmpeg cannot overwrite the file it is reading.

ffmpeg -i xxx.mp4 -r 25 xxx_25fps.mp4

2. Preprocess and train

python train.py --id [person_id] --gpu_id [gpu_id]
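
The frame rate and duration requirements from step 1 can be checked automatically before starting a training run. The sketch below assumes that `ffprobe` (installed together with ffmpeg) is on the PATH; the function names `probe_video` and `video_problems` are hypothetical and not part of this repository. It reads the average frame rate of the first video stream, which is an assumption about which rate matters here. The single-person and body-translation requirements still need a visual check.

```python
import json
import subprocess
from fractions import Fraction


def probe_video(path):
    """Return (fps, duration_seconds) as reported by ffprobe for the first video stream."""
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=avg_frame_rate:format=duration",
        "-of", "json",
        str(path),
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    info = json.loads(result.stdout)
    # ffprobe reports the frame rate as a fraction string such as "25/1".
    fps = Fraction(info["streams"][0]["avg_frame_rate"])
    duration = float(info["format"]["duration"])
    return fps, duration


def video_problems(fps, duration, target_fps=25, min_seconds=12.0):
    """Return a list of human-readable problems; an empty list means both checks pass."""
    problems = []
    if fps != target_fps:
        problems.append(f"frame rate is {float(fps):.3f} fps, expected {target_fps}")
    if duration <= min_seconds:
        problems.append(f"duration is {duration:.2f} s, must be longer than {min_seconds} s")
    return problems
```

A typical call would be `video_problems(*probe_video("Data/1.mp4"))`, where Data/1.mp4 follows the naming example from step 1.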

Test on a target person

Place the test audio file (.wav or .mp3) under Audio/audio/. Then run (with generated poses):

python test.py --id [person_id] --audio [audio_file_name (e.g., 4_00003)] --gpu_id [gpu_id]

The program prints 'saved to xxx.mov' when the videos are generated successfully. It outputs two .mov files: a video with the face only (_full9.mov) and a video with the background (_transbigbg.mov).
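
When testing several audio files in a row, the output paths can be collected from the program's console output. The sketch below assumes that each success message appears in standard output as 'saved to' followed by a path ending in .mov and containing no spaces, as described above; the exact message format has not been verified against the code. The names `run_test`, `saved_videos`, and `split_outputs` are hypothetical.

```python
import re
import subprocess
import sys

# Assumed message format: "saved to <path>.mov", with no spaces in the path.
SAVED_RE = re.compile(r"saved to\s+(\S+\.mov)")


def run_test(person_id, audio_name, gpu_id=0):
    """Run test.py with the arguments shown above and return its standard output."""
    cmd = [
        sys.executable, "test.py",
        "--id", str(person_id),
        "--audio", audio_name,
        "--gpu_id", str(gpu_id),
    ]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def saved_videos(log_text):
    """Extract every .mov path that follows a 'saved to' message."""
    return SAVED_RE.findall(log_text)


def split_outputs(paths):
    """Group paths into face-only and with-background videos by filename suffix."""
    return {
        "face_only": [p for p in paths if p.endswith("_full9.mov")],
        "with_background": [p for p in paths if p.endswith("_transbigbg.mov")],
    }
```

For example, `split_outputs(saved_videos(run_test(1, "4_00003")))` would return the two groups of paths for one test run, provided the script is started from the repository root.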

Acknowledgments

The face reconstruction code is from Deep3DFaceReconstruction and WM3DR, the ArcFace code is from insightface, and the GAN code is based on pytorch-CycleGAN-and-pix2pix.
