An optimized pipeline for DINet that reduces inference latency by up to 60% 🚀. Kudos to the authors of the original repo for this amazing work.

KeyStrokeVII/DINet_optimized_Win

DINet: Deformation Inpainting Network for Realistic Face Visually Dubbing on High Resolution Video (AAAI2023)

Paper | Demo video | Supplementary materials

🤔 How is this reduction in inference latency achieved?

To achieve this, several changes were implemented:

  • Replaced DeepSpeech with wav2vec for feature extraction, which runs directly in torch and is much faster.
  • Trained a lightweight model that maps wav2vec features to DeepSpeech features, so the rest of the existing pipeline is unchanged.
  • Sped up frame extraction.

Together, these changes reduce inference latency by up to 60% compared to the original implementation while maintaining quality.
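The idea of mapping wav2vec features to DeepSpeech features can be illustrated with a minimal sketch. This is not the model shipped in the resources archive: the class name `FeatureMapper`, the hidden size, and both feature dimensions (768 for a base wav2vec 2.0 encoder, 29 for per-frame DeepSpeech outputs) are assumptions made for illustration, and a random tensor stands in for real wav2vec output.

```python
import torch
from torch import nn

# Assumed dimensions, not taken from this repository.
WAV2VEC_DIM = 768     # hidden size of a base wav2vec 2.0 encoder (assumption)
DEEPSPEECH_DIM = 29   # per-frame DeepSpeech character logits (assumption)


class FeatureMapper(nn.Module):
    """Hypothetical per-frame mapper from wav2vec features to DeepSpeech-like features."""

    def __init__(self, in_dim=WAV2VEC_DIM, out_dim=DEEPSPEECH_DIM, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, feats):
        # feats: (batch, frames, in_dim) -> (batch, frames, out_dim)
        return self.net(feats)


mapper = FeatureMapper().eval()

# Random tensor standing in for real wav2vec output: 1 clip, 50 frames.
wav2vec_feats = torch.randn(1, 50, WAV2VEC_DIM)
with torch.no_grad():
    deepspeech_like = mapper(wav2vec_feats)

print(deepspeech_like.shape)
```

Because the mapper emits features in the shape the downstream network already expects, the rest of the pipeline does not need to be retrained, which is the point of the second change listed above.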

Additionally, Docker has been introduced to make facial landmark extraction faster, simpler, and more automated.

Tested on:

  • Windows 11
  • Python version >= 3.9

📖 Prerequisites

To get started, follow these steps:

  • Download the resources (asserts.zip) from Google Drive. Unzip the file and place the extracted directory in the current directory (./). The archive includes the model that maps wav2vec features to DeepSpeech features, along with all other models.
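If you prefer to script the unzip step, a minimal sketch using only the Python standard library might look like this. The helper name `extract_assets` is hypothetical, and the sketch assumes the archive has already been downloaded as `asserts.zip` next to the script.

```python
import zipfile
from pathlib import Path


def extract_assets(zip_path="asserts.zip", dest="."):
    """Extract the downloaded archive into dest and return its top-level names."""
    zip_path = Path(zip_path)
    if not zip_path.is_file():
        raise FileNotFoundError(f"{zip_path} not found; download it first")
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        # Archive member names always use "/" as the separator.
        top_level = sorted({name.split("/")[0] for name in archive.namelist()})
    return top_level


# Example call (run from the repository root after downloading the archive):
# print(extract_assets())
```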

Install Instructions

Set up a Conda environment by executing the following commands.

  conda create -n dinet python=3.9
  conda activate dinet

Clone repository

  git clone https://github.com/illeng/DINet_optimized_Win.git
  cd DINet_optimized_Win

Install Dependencies

  pip install -r requirements.txt

Install torch 1.11.0

  pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 -f https://download.pytorch.org/whl/torch_stable.html

Install tensorflow 2.5.0

  pip install tensorflow==2.5.0

Install pysoundfile

  conda install -c conda-forge pysoundfile
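After installing, it can help to confirm that the pinned versions are the ones actually present in the environment. The following is a small sketch using `importlib.metadata` from the standard library; the helper name `check_versions` is hypothetical, and the version table simply repeats the pins from the commands above.

```python
from importlib import metadata

# Versions pinned by the install commands above.
PINNED = {
    "torch": "1.11.0",
    "torchvision": "0.12.0",
    "torchaudio": "0.11.0",
    "tensorflow": "2.5.0",
}


def check_versions(pinned=PINNED):
    """Return {package: problem} for every package that is missing or mismatched."""
    problems = {}
    for name, wanted in pinned.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        # Ignore local build tags such as "+cu113" when comparing.
        if found.split("+")[0] != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems


print(check_versions() or "All pinned versions match.")
```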

🚀 Inference

Run inference with example videos:

  python inference.py --mouth_region_size=256 --source_video_path=./asserts/examples/testxxx.mp4 --source_openface_landmark_path=./asserts/examples/testxxx.csv --driving_audio_path=./asserts/examples/driving_audio_xxx.wav --pretrained_clip_DINet_path=./asserts/clip_training_DINet_256mouth.pth

To run inference on your own video, first use OpenFace to detect smooth facial landmarks for it.
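When running on custom files, the long command line is easy to mistype. As a sketch, the arguments can be assembled in Python and checked before launching. The helper name `build_inference_command` is hypothetical; the flags and the default checkpoint path are the ones from the example command above.

```python
import sys
from pathlib import Path


def build_inference_command(video, landmarks_csv, audio,
                            checkpoint="./asserts/clip_training_DINet_256mouth.pth",
                            mouth_region_size=256):
    """Return the inference.py argument list after checking that each input file exists."""
    inputs = {
        "--source_video_path": video,
        "--source_openface_landmark_path": landmarks_csv,
        "--driving_audio_path": audio,
        "--pretrained_clip_DINet_path": checkpoint,
    }
    missing = [str(p) for p in inputs.values() if not Path(p).is_file()]
    if missing:
        raise FileNotFoundError("missing input files: " + ", ".join(missing))
    command = [sys.executable, "inference.py",
               f"--mouth_region_size={mouth_region_size}"]
    command += [f"{flag}={path}" for flag, path in inputs.items()]
    return command
```

The returned list can be passed to `subprocess.run(command, check=True)` from the repository root. Failing early on a missing video, landmark CSV, audio file, or checkpoint avoids waiting for the models to load before the error appears.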

Acknowledgements

The AdaAT operator is borrowed from AdaAT, the DeepSpeech feature extraction from AD-NeRF, and the basic modules from first-order. Thanks to the authors for releasing their code.
