Current Search Keywords: Talking Face, Talking Head, Visual Dubbing, Face Generation, Lip Sync, Talker, Portrait, Talking Video, Head Synthesis, Face Reenactment, Wav2Lip, Talking Avatar, Lip Generation, Lip-Synchronization, Portrait Animation, Facial Animation, Lip Expert
If you have any other keywords, please feel free to let us know :)
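As a rough illustration of how a keyword list like this can be applied, here is a minimal sketch that checks a paper title against a subset of the keywords. It is not the collection script behind this list: the function names `normalize` and `matched_keywords` are hypothetical, the keyword subset is chosen for illustration, and matching on titles only is an assumption (several listed papers have no keyword in their title, so the real search presumably also looks at other fields such as the abstract).

```python
import re
from typing import List

# Illustrative subset of the search keywords listed above (assumption: the
# real search uses the full list and may also inspect abstracts).
KEYWORDS = [
    "Talking Face",
    "Talking Head",
    "Visual Dubbing",
    "Lip Sync",
    "Lip-Synchronization",
    "Portrait Animation",
    "Facial Animation",
]


def normalize(text: str) -> str:
    """Lowercase and turn runs of hyphens/whitespace into a single space,
    so that 'Lip-Sync' and 'lip sync' compare equal."""
    return re.sub(r"[\s\-]+", " ", text.lower()).strip()


def matched_keywords(title: str, keywords: List[str] = KEYWORDS) -> List[str]:
    """Return the keywords that occur as substrings of the normalized title,
    in the order they appear in `keywords`."""
    norm_title = normalize(title)
    return [kw for kw in keywords if normalize(kw) in norm_title]


if __name__ == "__main__":
    for title in [
        "LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync",
        "Exposing Lip-syncing Deepfakes from Mouth Inconsistencies",
        "Deformable 3D Shape Diffusion Model",
    ]:
        print(title, "->", matched_keywords(title))
```

Substring matching is deliberately loose: broad keywords such as "Talker" or "Portrait" would match many unrelated titles, which may explain why a few off-topic papers appear in the table.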
We now offer support for article analysis through large language models. You can view this feature by clicking the Paper Analysis link below. Currently, we are experimenting with Claude.ai or Moonshot AI. This is to help everyone quickly skim through the latest research papers.
Recent Trends (by AI)
- Based on the provided snippets, I have identified the top five prominent keywords and synthesized the key themes, methodologies, findings, and shifts in perspective from the papers:
1. One-shot Talking Face Generation: The concept of generating realistic talking faces from a single image is a recurring theme across multiple papers. Techniques like NeRFFaceSpeech and AniTalker emphasize creating lifelike animations using minimal input data. These methods leverage generative models and audio-driven dynamics to produce natural-looking facial movements. The key challenge addressed is achieving high-quality synthesis while preserving identity and visual details.
2. Lip Synchronization and Audio-Visual Correlation: Ensuring accurate lip synchronization with corresponding audio is critical in talking face generation. Papers like "Audio-Visual Speech Representation Expert" and SwapTalk focus on synchronizing lip movements with audio while maintaining the visual quality of the generated faces. The methodologies involve advanced neural networks and latent space manipulation to enhance synchronization and minimize artifacts.
3. Real-time Rendering and Efficiency: The need for fast and efficient rendering is highlighted in works such as GSTalker. This model utilizes deformable Gaussian splatting to enable real-time audio-driven face generation. The emphasis is on reducing training time and improving rendering speeds without compromising the quality of the generated faces. This shift towards real-time applications reflects the growing demand for practical and scalable solutions in various domains.
4. Multimodal Emotion Representation: EMOPortraits introduces the integration of emotional expressions into talking face avatars. This approach enhances the realism and expressiveness of generated faces by incorporating emotion-driven dynamics. The methodology involves multimodal inputs and cross-driving synthesis, where avatars are animated with different emotional states, addressing the challenge of creating more engaging and lifelike digital avatars.
5. Identity Preservation and Customization: Maintaining the unique identity of the subject while generating talking faces is a crucial aspect explored in SwapTalk and AniTalker. These papers propose innovative solutions for identity-decoupled motion encoding and one-shot customization. The goal is to create personalized talking faces that retain the distinct features of the original subject, enabling applications in personalized media and communication.
Overall, the interconnectedness among these papers highlights a trend towards achieving higher realism, efficiency, and customization in talking face generation. The field is moving towards developing more practical and scalable solutions that can be applied in real-time scenarios, with an increasing focus on emotional expressiveness and identity preservation. Researchers are exploring advanced neural network architectures, generative models, and multimodal approaches to push the boundaries of what's possible in this rapidly evolving domain.
>>>> Each Paper Analysis (by AI) <<<<
Table of Contents
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2024-12-23 | FaceLift: Single Image to 3D Head with View Generation and GS-LRM | Weijie Lyu et.al. | 2412.17812 | null |
2024-12-22 | FADA: Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation | Tianyun Zhong et.al. | 2412.16915 | null |
2024-12-18 | Joint Co-Speech Gesture and Expressive Talking Face Generation using Diffusion with Adapters | Steven Hogue et.al. | 2412.14333 | link |
2024-12-18 | GLCF: A Global-Local Multimodal Coherence Analysis Framework for Talking Face Generation Detection | Xiaocan Chen et.al. | 2412.13656 | null |
2024-12-18 | Learning to Control an Android Robot Head for Facial Animation | Marcel Heisler et.al. | 2412.13641 | null |
2024-12-18 | Real-time One-Step Diffusion-based Expressive Portrait Videos Generation | Hanzhong Guo et.al. | 2412.13479 | link |
2024-12-18 | VQTalker: Towards Multilingual Talking Avatars through Facial Motion Tokenization | Tao Liu et.al. | 2412.09892 | null |
2024-12-16 | Towards a Universal Synthetic Video Detector: From Face or Background Manipulations to Fully AI-Generated Content | Rohit Kundu et.al. | 2412.12278 | null |
2024-12-13 | GoHD: Gaze-oriented and Highly Disentangled Portrait Animation with Rhythmic Poses and Realistic Expression | Ziqi Zhou et.al. | 2412.09296 | link |
2024-12-12 | LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync | Chunyu Li et.al. | 2412.09262 | link |
2024-12-12 | EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing | Gaoxiang Cong et.al. | 2412.08988 | null |
2024-12-11 | PointTalk: Audio-Driven Dynamic Lip Point Cloud for 3D Gaussian-based Talking Head Synthesis | Yifan Xie et.al. | 2412.08504 | null |
2024-12-10 | PortraitTalk: Towards Customizable One-Shot Audio-to-Talking Face Generation | Fatemeh Nazarieh et.al. | 2412.07754 | null |
2024-12-10 | IF-MDM: Implicit Face Motion Diffusion Model for High-Fidelity Realtime Talking Head Generation | Sejong Yang et.al. | 2412.04000 | null |
2024-12-05 | MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation | Longtao Zheng et.al. | 2412.04448 | null |
2024-12-05 | Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Diffusion Transformer Networks | Jiahao Cui et.al. | 2412.00733 | link |
2024-12-04 | SINGER: Vivid Audio-driven Singing Video Generation with Multi-scale Spectral Diffusion Model | Yan Li et.al. | 2412.03430 | null |
2024-12-02 | One Shot, One Talk: Whole-body Talking Avatar from a Single Image | Jun Xiang et.al. | 2412.01106 | null |
2024-12-01 | Synergizing Motion and Appearance: Multi-Scale Compensatory Codebooks for Talking Head Video Generation | Shuling Zhao et.al. | 2412.00719 | null |
2024-11-29 | LokiTalk: Learning Fine-Grained and Generalizable Correspondences to Enhance NeRF-based Talking Head Synthesis | Tianqi Li et.al. | 2411.19525 | null |
2024-11-29 | Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis | Tianqi Li et.al. | 2411.19509 | null |
2024-11-29 | V2SFlow: Video-to-Speech Generation with Speech Decomposition and Rectified Flow | Jeongsoo Choi et.al. | 2411.19486 | null |
2024-11-26 | Passive Deepfake Detection Across Multi-modalities: A Comprehensive Survey | Hong-Hanh Nguyen-Le et.al. | 2411.17911 | null |
2024-11-25 | Sonic: Shifting Focus to Global Audio Perception in Portrait Animation | Xiaozhong Ji et.al. | 2411.16331 | null |
2024-11-25 | ESARM: 3D Emotional Speech-to-Animation via Reward Model from Automatically-Ranked Demonstrations | Xulong Zhang et.al. | 2411.13089 | null |
2024-11-24 | LetsTalk: Latent Diffusion Transformer for Talking Video Synthesis | Haojie Zhang et.al. | 2411.16748 | null |
2024-11-23 | EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion | Haotian Wang et.al. | 2411.16726 | null |
2024-11-23 | ConsistentAvatar: Learning to Diffuse Fully Consistent Talking Head Avatar with Temporal Guidance | Haijie Yang et.al. | 2411.15436 | null |
2024-11-20 | Comparative Analysis of Audio Feature Extraction for Real-Time Talking Portrait Synthesis | Pegah Salehi et.al. | 2411.13209 | link |
2024-11-20 | JoyVASA: Portrait and Animal Image Animation with Diffusion-Based Audio-Driven Facial Dynamics and Head Motion Generation | Xuyang Cao et.al. | 2411.09209 | link |
2024-11-14 | LES-Talker: Fine-Grained Emotion Editing for Talking Head Generation in Linear Emotion Space | Guanwen Feng et.al. | 2411.09268 | null |
2024-11-06 | Large Generative Model-assisted Talking-face Semantic Communication System | Feibo Jiang et.al. | 2411.03876 | null |
2024-10-31 | Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts | Xiang Deng et.al. | 2410.23836 | null |
2024-10-29 | Multimodal Semantic Communication for Generative Audio-Driven Video Conferencing | Haonan Tong et.al. | 2410.22112 | null |
2024-10-24 | Real-time 3D-aware Portrait Video Relighting | Ziqi Cai et.al. | 2410.18355 | link |
2024-10-21 | Joker: Conditional 3D Head Synthesis with Extreme Facial Expressions | Malte Prinzler et.al. | 2410.16395 | null |
2024-10-18 | Takin-ADA: Emotion Controllable Audio-Driven Animation with Canonical and Landmark Loss Optimization | Bin Lin et.al. | 2410.14283 | null |
2024-10-18 | DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation | Hanbo Cheng et.al. | 2410.13726 | link |
2024-10-16 | MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting | Yue Zhang et.al. | 2410.10122 | link |
2024-10-15 | Titanic Calling: Low Bandwidth Video Conference from the Titanic Wreck | Fevziye Irem Eyiokur et.al. | 2410.11434 | null |
2024-10-15 | MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes | Zhenhui Ye et.al. | 2410.06734 | null |
2024-10-14 | Character-aware audio-visual subtitling in context | Jaesung Huh et.al. | 2410.11068 | null |
2024-10-14 | Beyond Fixed Topologies: Unregistered Training and Comprehensive Evaluation Metrics for 3D Talking Heads | Federico Nocentini et.al. | 2410.11041 | null |
2024-10-14 | TALK-Act: Enhance Textural-Awareness for 2D Speaking Avatar Reenactment with Diffusion Model | Jiazhi Guan et.al. | 2410.10696 | null |
2024-10-14 | Generative Human Video Compression with Multi-granularity Temporal Trajectory Factorization | Shanzhi Yin et.al. | 2410.10171 | null |
2024-10-10 | MMHead: Towards Fine-grained Multi-modal 3D Facial Animation | Sijing Wu et.al. | 2410.07757 | null |
2024-10-09 | FreeAvatar: Robust 3D Facial Animation Transfer by Learning an Expression Foundation Model | Feng Qiu et.al. | 2409.13180 | null |
2024-10-01 | LaDTalk: Latent Denoising for Synthesizing Talking Head Videos with High Frequency Details | Jian Yang et.al. | 2410.00990 | null |
2024-09-29 | Learning Frame-Wise Emotion Intensity for Audio-Driven Talking-Head Generation | Jingyi Xu et.al. | 2409.19501 | null |
2024-09-27 | Diverse Code Query Learning for Speech-Driven Facial Animation | Chunzhi Gu et.al. | 2409.19143 | null |
2024-09-26 | Stable Video Portraits | Mirela Ostrek et.al. | 2409.18083 | null |
2024-09-25 | ProbTalk3D: Non-Deterministic Emotion Controllable Speech-Driven 3D Facial Animation Synthesis Using VQ-VAE | Sichun Wu et.al. | 2409.07966 | link |
2024-09-24 | FastTalker: Jointly Generating Speech and Conversational Gestures from Text | Zixin Guo et.al. | 2409.16404 | null |
2024-09-23 | FaceVid-1K: A Large-Scale High-Quality Multiracial Human Face Video Dataset | Donglin Di et.al. | 2410.07151 | null |
2024-09-23 | MIMAFace: Face Animation via Motion-Identity Modulated Appearance Feature Learning | Yue Han et.al. | 2409.15179 | null |
2024-09-18 | JEAN: Joint Expression and Audio-guided NeRF-based Talking Face Generation | Sai Tanmay Reddy Chakkera et.al. | 2409.12156 | null |
2024-09-18 | GaussianHeads: End-to-End Learning of Drivable Gaussian Head Avatars from Coarse-to-fine Representations | Kartik Teotia et.al. | 2409.11951 | null |
2024-09-17 | 3DFacePolicy: Speech-Driven 3D Facial Animation with Diffusion Policy | Xuanmeng Sha et.al. | 2409.10848 | null |
2024-09-16 | DreamHead: Learning Spatial-Temporal Correspondence via Hierarchical Diffusion for Audio-driven Talking Head Synthesis | Fa-Ting Hong et.al. | 2409.10281 | null |
2024-09-14 | StyleTalk++: A Unified Framework for Controlling the Speaking Styles of Talking Heads | Suzhen Wang et.al. | 2409.09292 | null |
2024-09-11 | DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures | Steven Hogue et.al. | 2409.07649 | null |
2024-09-11 | EMOdiffhead: Continuously Emotional Control in Talking Head Generation via Diffusion | Jian Zhang et.al. | 2409.07255 | null |
2024-09-09 | PersonaTalk: Bring Attention to Your Persona in Visual Dubbing | Longhao Zhang et.al. | 2409.05379 | null |
2024-09-09 | KAN-Based Fusion of Dual-Domain for Audio-Driven Facial Landmarks Generation | Hoang-Son Vo-Thanh et.al. | 2409.05330 | link |
2024-09-05 | SegTalker: Segmentation-based Talking Face Generation with Mask-guided Local Editing | Lingyu Xiong et.al. | 2409.03605 | null |
2024-09-05 | SVP: Style-Enhanced Vivid Portrait Talking Head Diffusion Model | Weipeng Tan et.al. | 2409.03270 | null |
2024-09-04 | PoseTalk: Text-and-Audio-based Pose Control and Motion Refinement for One-Shot Talking Head Generation | Jun Ling et.al. | 2409.02657 | null |
2024-09-02 | KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding | Zhihao Xu et.al. | 2409.01113 | link |
2024-08-28 | Micro and macro facial expressions by driven animations in realistic Virtual Humans | Rubens Halbig Montanha et.al. | 2408.16110 | null |
2024-08-27 | MegActor- | Shurong Yang et.al. | 2408.14975 | null |
2024-08-25 | TalkLoRA: Low-Rank Adaptation for Speech-Driven Animation | Jack Saunders et.al. | 2408.13714 | null |
2024-08-23 | G3FA: Geometry-guided GAN for Face Animation | Alireza Javanmardi et.al. | 2408.13049 | null |
2024-08-21 | AutoDirector: Online Auto-scheduling Agents for Multi-sensory Composition | Minheng Ni et.al. | 2408.11564 | null |
2024-08-21 | EmoFace: Emotion-Content Disentangled Speech-Driven 3D Talking Face with Mesh Attention | Yihong Lin et.al. | 2408.11518 | null |
2024-08-20 | DEGAS: Detailed Expressions on Full-Body Gaussian Avatars | Zhijing Shao et.al. | 2408.10588 | null |
2024-08-18 | FD2Talk: Towards Generalized Talking Head Generation with Facial Decoupled Diffusion Model | Ziyu Yao et.al. | 2408.09384 | null |
2024-08-18 | Meta-Learning Empowered Meta-Face: Personalized Speaking Style Adaptation for Audio-Driven 3D Talking Face Animation | Xukun Zhou et.al. | 2408.09357 | null |
2024-08-18 | S^3D-NeRF: Single-Shot Speech-Driven Neural Radiance Field for High Fidelity Talking Head Synthesis | Dongze Li et.al. | 2408.09347 | null |
2024-08-16 | GLDiTalker: Speech-Driven 3D Facial Animation with Graph Latent Diffusion Transformer | Yihong Lin et.al. | 2408.01826 | null |
2024-08-14 | Content and Style Aware Audio-Driven Facial Animation | Qingju Liu et.al. | 2408.07005 | null |
2024-08-12 | DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation | Jisoo Kim et.al. | 2408.06010 | null |
2024-08-10 | High-fidelity and Lip-synced Talking Face Synthesis via Landmark-based Diffusion Model | Weizhi Zhong et.al. | 2408.05416 | null |
2024-08-10 | Style-Preserving Lip Sync via Audio-Aware Style Reference | Weizhi Zhong et.al. | 2408.05412 | null |
2024-08-09 | DeepSpeak Dataset v1.0 | Sarah Barrington et.al. | 2408.05366 | null |
2024-08-06 | ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer | Jiazhi Guan et.al. | 2408.03284 | null |
2024-08-03 | Landmark-guided Diffusion Model for High-fidelity and Temporally Coherent Talking Head Generation | Jintao Tan et.al. | 2408.01732 | null |
2024-08-03 | JambaTalk: Speech-Driven 3D Talking Head Generation Based on Hybrid Transformer-Mamba Model | Farzaneh Jafari et.al. | 2408.01627 | null |
2024-08-01 | UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model | Xiangyu Fan et.al. | 2408.00762 | null |
2024-08-01 | Reenact Anything: Semantic Video Motion Transfer Using Motion-Textual Inversion | Manuel Kansy et.al. | 2408.00458 | null |
2024-08-01 | EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head | Qianyun He et.al. | 2408.00297 | null |
2024-07-31 | Deformable 3D Shape Diffusion Model | Dengsheng Chen et.al. | 2407.21428 | null |
2024-07-26 | LinguaLinker: Audio-Driven Portraits Animation with Implicit Facial Control Enhancement | Rui Zhang et.al. | 2407.18595 | null |
2024-07-24 | A Comprehensive Review and Taxonomy of Audio-Visual Synchronization Techniques for Realistic Speech Animation | Jose Geraldo Fernandes et.al. | 2407.17430 | null |
2024-07-24 | The impact of differences in facial features between real speakers and 3D face models on synthesized lip motions | Rabab Algadhy et.al. | 2407.17253 | null |
2024-07-22 | PAV: Personalized Head Avatar from Unstructured Video Collection | Akin Caliskan et.al. | 2407.21047 | null |
2024-07-21 | Anchored Diffusion for Video Face Reenactment | Idan Kligvasser et.al. | 2407.15153 | null |
2024-07-20 | Text-based Talking Video Editing with Cascaded Conditional Diffusion | Bo Han et.al. | 2407.14841 | null |
2024-07-17 | Universal Facial Encoding of Codec Avatars from VR Headsets | Shaojie Bai et.al. | 2407.13038 | null |
2024-07-17 | EmoFace: Audio-driven Emotional 3D Face Animation | Chang Liu et.al. | 2407.12501 | link |
2024-07-13 | Learning Online Scale Transformation for Talking Head Video Generation | Fa-Ting Hong et.al. | 2407.09965 | null |
2024-07-12 | Real Face Video Animation Platform | Xiaokai Chen et.al. | 2407.18955 | null |
2024-07-12 | One-Shot Pose-Driving Face Animation Platform | He Feng et.al. | 2407.08949 | null |
2024-07-12 | EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions | Zhiyuan Chen et.al. | 2407.08136 | null |
2024-07-08 | MobilePortrait: Real-Time One-Shot Neural Head Avatars on Mobile Devices | Jianwen Jiang et.al. | 2407.05712 | null |
2024-07-08 | Audio-driven High-resolution Seamless Talking Head Video Editing via StyleGAN | Jiacheng Su et.al. | 2407.05577 | null |
2024-07-04 | Compressed Skinning for Facial Blendshapes | Ladislav Kavan et.al. | 2406.11597 | null |
2024-07-03 | LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control | Jianzhu Guo et.al. | 2407.03168 | link |
2024-07-01 | Enhancing Speech-Driven 3D Facial Animation with Audio-Visual Guidance from Lip Reading Expert | Han EunGi et.al. | 2407.01034 | null |
2024-06-26 | RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network | Xiaozhong Ji et.al. | 2406.18284 | null |
2024-06-24 | The Effects of Embodiment and Personality Expression on Learning in LLM-based Educational Agents | Sinan Sonlu et.al. | 2407.10993 | null |
2024-06-21 | EmpathyEar: An Open-source Avatar Multimodal Empathetic Chatbot | Hao Fei et.al. | 2406.15177 | link |
2024-06-20 | MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset | Kim Sung-Bin et.al. | 2406.14272 | null |
2024-06-19 | DF40: Toward Next-Generation Deepfake Detection | Zhiyuan Yan et.al. | 2406.13495 | null |
2024-06-19 | AniFaceDiff: High-Fidelity Face Reenactment via Facial Parametric Conditioned Diffusion Models | Ken Chen et.al. | 2406.13272 | null |
2024-06-18 | RITA: A Real-time Interactive Talking Avatars Framework | Wuxinlin Cheng et.al. | 2406.13093 | null |
2024-06-18 | A Comprehensive Taxonomy and Analysis of Talking Head Synthesis: Techniques for Portrait Generation, Driving Mechanisms, and Editing | Ming Meng et.al. | 2406.10553 | null |
2024-06-17 | NLDF: Neural Light Dynamic Fields for Efficient 3D Talking Head Generation | Niu Guanchen et.al. | 2406.11259 | null |
2024-06-17 | Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement | Runyi Yu et.al. | 2406.08096 | null |
2024-06-16 | Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation | Mingwang Xu et.al. | 2406.08801 | null |
2024-06-14 | DNPM: A Neural Parametric Model for the Synthesis of Facial Geometric Details | Haitao Cao et.al. | 2405.19688 | null |
2024-06-13 | Talking Heads: Understanding Inter-layer Communication in Transformer Language Models | Jack Merullo et.al. | 2406.09519 | null |
2024-06-13 | DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing | Neha Sahipjohn et.al. | 2406.08802 | null |
2024-06-12 | Emotional Conversation: Empowering Talking Faces with Cohesive Expression, Gaze and Pose Generation | Jiadong Liang et.al. | 2406.07895 | null |
2024-06-07 | Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation | Yue Ma et.al. | 2406.01900 | null |
2024-06-05 | Controllable Talking Face Generation by Implicit Facial Keypoints Editing | Dong Zhao et.al. | 2406.02880 | null |
2024-05-31 | MunchSonic: Tracking Fine-grained Dietary Actions through Active Acoustic Sensing on Eyeglasses | Saif Mahmud et.al. | 2405.21004 | null |
2024-05-31 | MegActor: Harness the Power of Raw Video for Vivid Portrait Animation | Shurong Yang et.al. | 2405.20851 | link |
2024-05-30 | Audio2Rig: Artist-oriented deep learning tool for facial animation | Bastien Arcelin et.al. | 2405.20412 | null |
2024-05-28 | OpFlowTalker: Realistic and Natural Talking Face Generation via Optical Flow Guidance | Shuheng Ge et.al. | 2405.14709 | null |
2024-05-24 | InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation | Yuchi Wang et.al. | 2405.15758 | link |
2024-05-22 | Metabook: An Automatically Generated Augmented Reality Storybook Interaction System to Improve Children's Engagement in Storytelling | Yibo Wang et.al. | 2405.13701 | null |
2024-05-21 | Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control | Yue Han et.al. | 2405.12970 | null |
2024-05-16 | Faces that Speak: Jointly Synthesising Talking Face and Speech from Text | Youngjoon Jang et.al. | 2405.10272 | null |
2024-05-14 | PolyGlotFake: A Novel Multilingual and Multimodal DeepFake Dataset | Yang Hou et.al. | 2405.08838 | link |
2024-05-12 | Listen, Disentangle, and Control: Controllable Speech-Driven Talking Head Generation | Changpeng Cai et.al. | 2405.07257 | null |
2024-05-10 | NeRFFaceSpeech: One-shot Audio-driven 3D Talking Head Synthesis via Generative Prior | Gihoon Kim et.al. | 2405.05749 | null |
2024-05-09 | SwapTalk: Audio-Driven Talking Face Generation with One-Shot Customization in Latent Space | Zeren Zhang et.al. | 2405.05636 | null |
2024-05-08 | Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention | Ruijie Tao et.al. | 2404.18501 | null |
2024-05-07 | Audio-Visual Speech Representation Expert for Enhanced Talking Face Video Generation and Evaluation | Dogucan Yaman et.al. | 2405.04327 | null |
2024-05-06 | AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding | Tao Liu et.al. | 2405.03121 | link |
2024-04-29 | EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars | Nikita Drobyshev et.al. | 2404.19110 | null |
2024-04-29 | GSTalker: Real-time Audio-Driven Talking Face Generation via Deformable Gaussian Splatting | Bo Chen et.al. | 2404.19040 | null |
2024-04-29 | Embedded Representation Learning Network for Animating Styled Video Portrait | Tianyong Wang et.al. | 2404.19038 | null |
2024-04-29 | CSTalk: Correlation Supervised Speech-driven 3D Emotional Facial Animation Generation | Xiangyu Liang et.al. | 2404.18604 | null |
2024-04-28 | GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting | Hongyun Yu et.al. | 2404.14037 | null |
2024-04-25 | GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting | Kyusun Cho et.al. | 2404.16012 | link |
2024-04-23 | TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting | Jiahe Li et.al. | 2404.15264 | null |
2024-04-19 | Learn2Talk: 3D Talking Face Learns from 2D Talking Face | Yixiang Zhuang et.al. | 2404.12888 | null |
2024-04-16 | VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time | Sicheng Xu et.al. | 2404.10667 | null |
2024-04-15 | FSRT: Facial Scene Representation Transformer for Face Reenactment from Factorized Appearance, Head-pose, and Facial Expression Features | Andre Rochow et.al. | 2404.09736 | null |
2024-04-13 | THQA: A Perceptual Quality Assessment Database for Talking Heads | Yingjie Zhou et.al. | 2404.09003 | link |
2024-04-11 | EFHQ: Multi-purpose ExtremePose-Face-HQ dataset | Trung Tuan Dao et.al. | 2312.17205 | null |
2024-04-09 | Deepfake Generation and Detection: A Benchmark and Survey | Gan Pei et.al. | 2403.17881 | link |
2024-04-08 | SphereHead: Stable 3D Full-head Synthesis with Spherical Tri-plane Representation | Heyuan Li et.al. | 2404.05680 | null |
2024-04-07 | GvT: A Graph-based Vision Transformer with Talking-Heads Utilizing Sparsity, Trained from Scratch on Small Datasets | Dongjing Shan et.al. | 2404.04924 | null |
2024-04-07 | Towards a Simultaneous and Granular Identity-Expression Control in Personalized Face Generation | Renshuai Liu et.al. | 2401.01207 | null |
2024-04-03 | MI-NeRF: Learning a Single Face NeRF from Multiple Identities | Aggelina Chatziagapi et.al. | 2403.19920 | null |
2024-04-02 | EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis | Shuai Tan et.al. | 2404.01647 | null |
2024-04-02 | Learning to Generate Conditional Tri-plane for 3D-aware Expression Controllable Portrait Animation | Taekyung Ki et.al. | 2404.00636 | null |
2024-04-01 | FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio | Chao Xu et.al. | 2403.01901 | link |
2024-04-01 | Exploring Phonetic Context-Aware Lip-Sync For Talking Face Generation | Se Jin Park et.al. | 2305.19556 | null |
2024-03-29 | Talk3D: High-Fidelity Talking Portrait Synthesis via Personalized 3D Generative Prior | Jaehoon Ko et.al. | 2403.20153 | link |
2024-03-28 | MoDiTalker: Motion-Disentangled Diffusion Model for High-Fidelity Talking Head Generation | Seyeon Kim et.al. | 2403.19144 | link |
2024-03-28 | GOTCHA: Real-Time Video Deepfake Detection via Challenge-Response | Govind Mittal et.al. | 2210.06186 | link |
2024-03-27 | X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention | You Xie et.al. | 2403.15931 | null |
2024-03-26 | Superior and Pragmatic Talking Face Generation with Teacher-Student Framework | Chao Liang et.al. | 2403.17883 | null |
2024-03-26 | AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation | Huawei Wei et.al. | 2403.17694 | link |
2024-03-25 | DiffusionAct: Controllable Diffusion Autoencoder for One-shot Face Reenactment | Stella Bounareli et.al. | 2403.17217 | null |
2024-03-25 | AnimateMe: 4D Facial Expressions via Diffusion Models | Dimitrios Gerogiannis et.al. | 2403.17213 | null |
2024-03-25 | Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework | Ziyao Huang et.al. | 2403.16510 | link |
2024-03-23 | Adaptive Super Resolution For One-Shot Talking-Head Generation | Luchuan Song et.al. | 2403.15944 | link |
2024-03-23 | Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis | Zhenhui Ye et.al. | 2401.08503 | link |
2024-03-22 | LeGO: Leveraging a Surface Deformation Network for Animatable Stylized Face Generation with One Example | Soyeon Yoon et.al. | 2403.15227 | link |
2024-03-22 | Virbo: Multimodal Multilingual Avatar Video Generation in Digital Marketing | Juan Zhang et.al. | 2403.11700 | null |
2024-03-19 | EmoVOCA: Speech-Driven Emotional 3D Talking Heads | Federico Nocentini et.al. | 2403.12886 | null |
2024-03-19 | ScanTalk: 3D Talking Heads from Unregistered Scans | Federico Nocentini et.al. | 2403.10942 | null |
2024-03-15 | StyleTalker: One-shot Style-based Audio-driven Talking Head Video Generation | Dongchan Min et.al. | 2208.10922 | null |
2024-03-14 | GAIA: Zero-shot Talking Avatar Generation | Tianyu He et.al. | 2311.15230 | null |
2024-03-13 | Say Anything with Any Style | Shuai Tan et.al. | 2403.06363 | null |
2024-03-12 | FlowVQTalker: High-Quality Emotional Talking Face Generation through Normalizing Flow and Quantization | Shuai Tan et.al. | 2403.06375 | null |
2024-03-12 | Style2Talker: High-Resolution Talking Head Generation with Emotion Style and Art Style | Shuai Tan et.al. | 2403.06365 | null |
2024-03-11 | A Comparative Study of Perceptual Quality Metrics for Audio-driven Talking Head Videos | Weixia Zhang et.al. | 2403.06421 | link |
2024-03-05 | Memories are One-to-Many Mapping Alleviators in Talking Face Generation | Anni Tang et.al. | 2212.05005 | null |
2024-03-02 | G4G: A Generic Framework for High Fidelity Talking Face Generation with Fine-grained Intra-modal Alignment | Juan Zhang et.al. | 2402.18122 | null |
2024-03-01 | DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder | Chenpeng Du et.al. | 2303.17550 | null |
2024-02-29 | Learning a Generalized Physical Face Model From Data | Lingchen Yang et.al. | 2402.19477 | null |
2024-02-28 | Context-aware Talking Face Video Generation | Meidai Xuanyuan et.al. | 2402.18092 | null |
2024-02-27 | EMO: Emote Portrait Alive -- Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions | Linrui Tian et.al. | 2402.17485 | null |
2024-02-27 | Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis | Zicheng Zhang et.al. | 2402.17364 | link |
2024-02-26 | Resolution-Agnostic Neural Compression for High-Fidelity Portrait Video Conferencing via Implicit Radiance Fields | Yifei Li et.al. | 2402.16599 | null |
2024-02-25 | AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D Talking Face Generation | Yasheng Sun et.al. | 2402.16124 | null |
2024-02-21 | Bring Your Own Character: A Holistic Solution for Automatic Facial Animation Generation of Customized Characters | Zechen Bai et.al. | 2402.13724 | link |
2024-02-21 | StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing | Gaoxiang Cong et.al. | 2402.12636 | null |
2024-02-12 | StyleLipSync: Style-based Personalized Lip-sync Video Generation | Taekyung Ki et.al. | 2305.00521 | null |
2024-02-08 | DiffSpeaker: Speech-Driven 3D Facial Animation with Diffusion Transformer | Zhiyuan Ma et.al. | 2402.05712 | link |
2024-02-05 | One-shot Neural Face Reenactment via Finding Directions in GAN's Latent Space | Stella Bounareli et.al. | 2402.03553 | null |
2024-02-02 | EmoSpeaker: One-shot Fine-grained Emotion-Controlled Talking Face Generation | Guanwen Feng et.al. | 2402.01422 | null |
2024-01-31 | MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis | Wenhao Guan et.al. | 2312.10687 | null |
2024-01-30 | Media2Face: Co-speech Facial Animation Generation With Multi-Modality Guidance | Qingcheng Zhao et.al. | 2401.15687 | null |
2024-01-28 | Lips Are Lying: Spotting the Temporal Inconsistency between Audio and Visual in Lip-Syncing DeepFakes | Weifeng Liu et.al. | 2401.15668 | link |
2024-01-27 | An Implicit Physical Face Model Driven by Expression and Style | Lingchen Yang et.al. | 2401.15414 | null |
2024-01-26 | Implicit Neural Representation for Physics-driven Actuated Soft Bodies | Lingchen Yang et.al. | 2401.14861 | null |
2024-01-25 | SAiD: Speech-driven Blendshape Facial Animation with Diffusion | Inkyu Park et.al. | 2401.08655 | link |
2024-01-23 | NeRF-AD: Neural Radiance Field with Attention-based Disentanglement for Talking Face Synthesis | Chongke Bi et.al. | 2401.12568 | null |
2024-01-19 | Fast Registration of Photorealistic Avatars for VR Facial Animation | Chaitanya Patel et.al. | 2401.11002 | null |
2024-01-18 | Exposing Lip-syncing Deepfakes from Mouth Inconsistencies | Soumyya Kanti Datta et.al. | 2401.10113 | null |
2024-01-18 | Text-driven Talking Face Synthesis by Reprogramming Audio-driven Models | Jeongsoo Choi et.al. | 2306.16003 | null |
2024-01-16 | EmoTalker: Emotionally Editable Talking Face Generation via Diffusion Model | Bingyuan Zhang et.al. | 2401.08049 | null |
2024-01-12 | DiffDub: Person-generic Visual Dubbing Using Inpainting Renderer with Diffusion Auto-encoder | Tao Liu et.al. | 2311.01811 | null |
2024-01-11 | Dubbing for Everyone: Data-Efficient Visual Dubbing using Neural Rendering Priors | Jack Saunders et.al. | 2401.06126 | null |
2024-01-11 | Jump Cut Smoothing for Talking Heads | Xiaojuan Wang et.al. | 2401.04718 | null |
2024-01-08 | AdaMesh: Personalized Facial Expressions and Head Poses for Adaptive Speech-Driven 3D Facial Animation | Liyang Chen et.al. | 2310.07236 | null |
2024-01-07 | Freetalker: Controllable Speech and Text-Driven Gesture Generation Based on Diffusion Models for Enhanced Speaker Naturalness | Sicheng Yang et.al. | 2401.03476 | null |
2024-01-04 | Expressive Speech-driven Facial Animation with controllable emotions | Yutong Chen et.al. | 2301.02008 | link |
2023-12-23 | TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation | Xize Cheng et.al. | 2312.15197 | null |
2023-12-21 | DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation | Chenxu Zhang et.al. | 2312.13578 | null |
2023-12-20 | FAAC: Facial Animation Generation with Anchor Frame and Conditional Control for Superior Fidelity and Editability | Linze Li et.al. | 2312.03775 | null |
2023-12-19 | Learning Dense Correspondence for NeRF-Based Face Reenactment | Songlin Yang et.al. | 2312.10422 | null |
2023-12-19 | Gaussian3Diff: 3D Gaussian Diffusion for 3D Full Head Synthesis and Editing | Yushi Lan et.al. | 2312.03763 | null |
2023-12-18 | VectorTalker: SVG Talking Face Generation with Progressive Vectorisation | Hao Hu et.al. | 2312.11568 | null |
2023-12-18 | AE-NeRF: Audio Enhanced Neural Radiance Field for Few Shot Talking Head Synthesis | Dongze Li et.al. | 2312.10921 | null |
2023-12-18 | Mimic: Speaking Style Disentanglement for Speech-Driven 3D Facial Animation | Hui Fu et.al. | 2312.10877 | null |
2023-12-15 | DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models | Yifeng Ma et.al. | 2312.09767 | null |
2023-12-15 | Attention-Based VR Facial Animation with Visual Mouth Camera Guidance for Immersive Telepresence Avatars | Andre Rochow et.al. | 2312.09750 | null |
2023-12-13 | uTalk: Bridging the Gap Between Humans and AI | Hussam Azzuni et.al. | 2310.02739 | null |
2023-12-13 | MMFace4D: A Large-Scale Multi-Modal 4D Face Dataset for Audio-Driven 3D Face Animation | Haozhe Wu et.al. | 2303.09797 | null |
2023-12-12 | GMTalker: Gaussian Mixture based Emotional talking video Portraits | Yibo Xia et.al. | 2312.07669 | null |
2023-12-12 | GSmoothFace: Generalized Smooth Talking Face Generation via Fine Grained 3D Face Guidance | Haiming Zhang et.al. | 2312.07385 | null |
2023-12-11 | Neural Text to Articulate Talk: Deep Text to Audiovisual Speech Synthesis achieving both Auditory and Photo-realism | Georgios Milis et.al. | 2312.06613 | link |
2023-12-11 | Study of Non-Verbal Behavior in Conversational Agents | Camila Vicari Maccari et.al. | 2312.06530 | null |
2023-12-11 | DiT-Head: High-Resolution Talking Head Synthesis using Diffusion Transformers | Aaron Mir et.al. | 2312.06400 | null |
2023-12-11 | Audio-driven Talking Face Generation by Overcoming Unintended Information Flow | Dogucan Yaman et.al. | 2307.09368 | null |
2023-12-10 | DaGAN++: Depth-Aware Generative Adversarial Network for Talking Head Video Generation | Fa-Ting Hong et.al. | 2305.06225 | link |
2023-12-09 | R2-Talker: Realistic Real-Time Talking Head Synthesis with Hash Grid Landmarks Encoding and Progressive Multilayer Conditioning | Zhiling Ye et.al. | 2312.05572 | null |
2023-12-09 | FT2TF: First-Person Statement Text-To-Talking Face Generation | Xingjian Diao et.al. | 2312.05430 | null |
2023-12-08 | SingingHead: A Large-scale 4D Dataset for Singing Head Animation | Sijing Wu et.al. | 2312.04369 | null |
2023-12-07 | VividTalk: One-Shot Audio-Driven Talking Head Generation Based on 3D Hybrid Prior | Xusen Sun et.al. | 2312.01841 | null |
2023-12-05 | PMMTalk: Speech-Driven 3D Facial Animation from Complementary Pseudo Multi-modal Features | Tianshun Han et.al. | 2312.02781 | null |
2023-12-05 | MyPortrait: Morphable Prior-Guided Personalized Portrait Generation | Bo Ding et.al. | 2312.02703 | null |
2023-12-02 | DiffusionTalker: Personalization and Acceleration for Speech-Driven 3D Face Diffuser | Peng Chen et.al. | 2311.16565 | null |
2023-12-01 | 3DiFACE: Diffusion-based Speech-driven 3D Facial Animation and Editing | Balamurugan Thambiraja et.al. | 2312.00870 | null |
2023-11-30 | Learning One-Shot 4D Head Avatar Synthesis using Synthetic Data | Yu Deng et.al. | 2311.18729 | null |
2023-11-30 | Talking Head(?) Anime from a Single Image 4: Improved Model and Its Distillation | Pramook Khungurn et.al. | 2311.17409 | null |
2023-11-29 | SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis | Ziqiao Peng et.al. | 2311.17590 | link |
2023-11-28 | THInImg: Cross-modal Steganography for Presenting Talking Heads in Images | Lin Zhao et.al. | 2311.17177 | null |
2023-11-28 | BakedAvatar: Baking Neural Fields for Real-Time Head Avatar Synthesis | Hao-Bin Duan et.al. | 2311.05521 | link |
2023-11-28 | Continuously Controllable Facial Expression Editing in Talking Face Videos | Zhiyao Sun et.al. | 2209.08289 | null |
2023-11-20 | MemoryCompanion: A Smart Healthcare Solution to Empower Efficient Alzheimer's Care Via Unleashing Generative AI | Lifei Zheng et.al. | 2311.14730 | null |
2023-11-15 | CP-EB: Talking Face Generation with Controllable Pose and Eye Blinking Embedding | Jianzong Wang et.al. | 2311.08673 | null |
2023-11-13 | DualTalker: A Cross-Modal Dual Learning Approach for Speech-Driven 3D Facial Animation | Guinan Su et.al. | 2311.04766 | null |
2023-11-12 | ChatAnything: Facetime Chat with LLM-Enhanced Personas | Yilin Zhao et.al. | 2311.06772 | null |
2023-11-08 | Synthetic Speaking Children -- Why We Need Them and How to Make Them | Muhammad Ali Farooq et.al. | 2311.06307 | null |
2023-11-06 | RADIO: Reference-Agnostic Dubbing Video Synthesis | Dongyeun Lee et.al. | 2309.01950 | null |
2023-11-05 | 3D-Aware Talking-Head Video Motion Transfer | Haomiao Ni et.al. | 2311.02549 | null |
2023-11-03 | Learning Separable Hidden Unit Contributions for Speaker-Adaptive Lip-Reading | Songtao Luo et.al. | 2310.05058 | link |
2023-11-02 | LaughTalk: Expressive 3D Talking Head Generation with Laughter | Kim Sung-Bin et.al. | 2311.00994 | null |
2023-11-02 | High-Fidelity and Freely Controllable Talking Head Video Generation | Yue Gao et.al. | 2304.10168 | null |
2023-10-31 | Breathing Life into Faces: Speech-driven 3D Facial Animation with Natural Head Pose and Detailed Shape | Wei Zhao et.al. | 2310.20240 | null |
2023-10-29 | On the Vulnerability of DeepFake Detectors to Attacks Generated by Denoising Diffusion Models | Marija Ivanovska et.al. | 2307.05397 | null |
2023-10-25 | Personalized Speech-driven Expressive 3D Facial Animation Synthesis with Style Control | Elif Bozkurt et.al. | 2310.17011 | null |
2023-10-23 | The Self 2.0: How AI-Enhanced Self-Clones Transform Self-Perception and Improve Presentation Skills | Qingxiao Zheng et.al. | 2310.15112 | null |
2023-10-19 | Gemino: Practical and Robust Neural Compression for Video Conferencing | Vibhaalakshmi Sivaraman et.al. | 2209.10507 | null |
2023-10-17 | CorrTalk: Correlation Between Hierarchical Speech and Facial Activity Variances for 3D Animation | Zhaojie Chu et.al. | 2310.11295 | null |
2023-10-15 | HyperLips: Hyper Control Lips with High Resolution Decoder for Talking Face Generation | Yaosen Chen et.al. | 2310.05720 | link |
2023-10-12 | CleftGAN: Adapting A Style-Based Generative Adversarial Network To Create Images Depicting Cleft Lip Deformity | Abdullah Hayajneh et.al. | 2310.07969 | link |
2023-10-12 | Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation | Yuan Gan et.al. | 2309.04946 | link |
2023-10-08 | GestSync: Determining who is speaking without a talking head | Sindhu B Hegde et.al. | 2310.05304 | link |
2023-09-30 | DiffPoseTalk: Speech-Driven Stylistic 3D Facial Animation and Head Pose Generation via Diffusion Models | Zhiyao Sun et.al. | 2310.00434 | null |
2023-09-28 | OSM-Net: One-to-Many One-shot Talking Head Generation with Spontaneous Head Motions | Jin Liu et.al. | 2309.16148 | null |
2023-09-26 | Emotional Speech-Driven Animation with Content-Emotion Disentanglement | Radek Daněček et.al. | 2306.08990 | null |
2023-09-20 | FaceDiffuser: Speech-Driven 3D Facial Animation Synthesis Using Diffusion | Stefan Stan et.al. | 2309.11306 | link |
2023-09-20 | Context-Aware Talking-Head Video Editing | Songlin Yang et.al. | 2308.00462 | null |
2023-09-18 | That's What I Said: Fully-Controllable Talking Face Generation | Youngjoon Jang et.al. | 2304.03275 | null |
2023-09-15 | Audio-Visual Active Speaker Extraction for Sparsely Overlapped Multi-talker Speech | Junjie Li et.al. | 2309.08408 | link |
2023-09-14 | DT-NeRF: Decomposed Triplane-Hash Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis | Yaoyu Su et.al. | 2309.07752 | null |
2023-09-14 | DiffTalker: Co-driven audio-image diffusion for talking faces via intermediate landmarks | Zipeng Qi et.al. | 2309.07509 | null |
2023-09-14 | HDTR-Net: A Real-Time High-Definition Teeth Restoration Network for Arbitrary Talking Face Generation Methods | Yongyuan Li et.al. | 2309.07495 | link |
2023-09-13 | PIAVE: A Pose-Invariant Audio-Visual Speaker Extraction Network | Qinghua Liu et.al. | 2309.06723 | null |
2023-09-12 | DF-TransFusion: Multimodal Deepfake Detection via Lip-Audio Cross-Attention and Facial Self-Attention | Aaditya Kharel et.al. | 2309.06511 | null |
2023-09-12 | Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos | Ekta Prashnani et.al. | 2305.03713 | null |
2023-09-11 | ExpCLIP: Bridging Text and Facial Expressions via Semantic Alignment | Yicheng Zhong et.al. | 2308.14448 | null |
2023-09-10 | MaskRenderer: 3D-Infused Multi-Mask Realistic Face Reenactment | Tina Behrouzi et.al. | 2309.05095 | null |
2023-09-09 | Speech2Lip: High-fidelity Speech to Lip Generation by Learning from a Short Video | Xiuzhe Wu et.al. | 2309.04814 | link |
2023-09-01 | Unsupervised Learning of Style-Aware Facial Animation from Real Acting Performances | Wolfgang Paier et.al. | 2306.10006 | null |
2023-08-30 | From Pixels to Portraits: A Comprehensive Survey of Talking Head Generation Techniques and Applications | Shreyank N Gowda et.al. | 2308.16041 | null |
2023-08-30 | SelfTalk: A Self-Supervised Commutative Training Diagram to Comprehend 3D Talking Faces | Ziqiao Peng et.al. | 2306.10799 | link |
2023-08-30 | Laughing Matters: Introducing Laughing-Face Generation using Diffusion Models | Antoni Bigata Casademunt et.al. | 2305.08854 | link |
2023-08-29 | Papeos: Augmenting Research Papers with Talk Videos | Tae Soo Kim et.al. | 2308.15224 | null |
2023-08-25 | EmoTalk: Speech-Driven Emotional Disentanglement for 3D Face Animation | Ziqiao Peng et.al. | 2303.11089 | link |
2023-08-24 | ToonTalker: Cross-Domain Face Reenactment | Yuan Gong et.al. | 2308.12866 | null |
2023-08-24 | Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis | Jiahe Li et.al. | 2307.09323 | link |
2023-08-23 | DF-3DFace: One-to-Many Speech Synchronized 3D Face Animation with Diffusion | Se Jin Park et.al. | 2310.05934 | null |
2023-08-21 | Deep Person Generation: A Survey from the Perspective of Face, Pose and Cloth Synthesis | Tong Sha et.al. | 2109.02081 | null |
2023-08-18 | Diff2Lip: Audio Conditioned Diffusion Models for Lip-Synchronization | Soumik Mukhopadhyay et.al. | 2308.09716 | link |
2023-08-18 | Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head video Generation | Fa-Ting Hong et.al. | 2307.09906 | link |
2023-08-17 | A Survey on Deep Multi-modal Learning for Body Language Recognition and Generation | Li Liu et.al. | 2308.08849 | link |
2023-08-16 | Instruct-NeuralTalker: Editing Audio-Driven Talking Radiance Fields with Instructions | Yuqi Sun et.al. | 2306.10813 | null |
2023-08-12 | Text-to-Video: a Two-stage Framework for Zero-shot Identity-agnostic Talking-head Generation | Zhichao Wang et.al. | 2308.06457 | link |
2023-08-12 | DialogueNeRF: Towards Realistic Avatar Face-to-Face Conversation Video Generation | Yichao Yan et.al. | 2203.07931 | null |
2023-08-11 | Versatile Face Animator: Driving Arbitrary 3D Facial Avatar in RGBD Space | Haoyu Wang et.al. | 2308.06076 | link |
2023-08-11 | VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer | Liyang Chen et.al. | 2308.04830 | null |
2023-08-10 | Near-realtime Facial Animation by Deep 3D Simulation Super-Resolution | Hyojoon Park et.al. | 2305.03216 | null |
2023-08-02 | Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis | Zhenhui Ye et.al. | 2306.03504 | null |
2023-07-29 | Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation | Michał Stypułkowski et.al. | 2301.03396 | null |
2023-07-26 | Learning Landmarks Motion from Speech for Speaker-Agnostic 3D Talking Heads Generation | Federico Nocentini et.al. | 2306.01415 | link |
2023-07-20 | HyperReenact: One-Shot Reenactment via Jointly Learning to Refine and Retarget Faces | Stella Bounareli et.al. | 2307.10797 | link |
2023-07-19 | MODA: Mapping-Once Audio-driven Portrait Animation with Dual Attentions | Yunfei Liu et.al. | 2307.10008 | null |
2023-07-19 | Hierarchical Semantic Perceptual Listener Head Video Generation: A High-performance Pipeline | Zhigang Chang et.al. | 2307.09821 | null |
2023-07-19 | OPHAvatars: One-shot Photo-realistic Head Avatars | Shaoxu Li et.al. | 2307.09153 | link |
2023-07-18 | FACTS: Facial Animation Creation using the Transfer of Styles | Jack Saunders et.al. | 2307.09480 | null |
2023-07-09 | Predictive Coding For Animation-Based Video Compression | Goluck Konuko et.al. | 2307.04187 | null |
2023-07-08 | FTFDNet: Learning to Detect Talking Face Video Manipulation with Tri-Modality Interaction | Ganglai Wang et.al. | 2307.03990 | null |
2023-07-05 | Interactive Conversational Head Generation | Mohan Zhou et.al. | 2307.02090 | null |
2023-07-04 | A Comprehensive Multi-scale Approach for Speech and Dynamics Synchrony in Talking Head Generation | Louis Airale et.al. | 2307.03270 | link |
2023-07-04 | Generating Animatable 3D Cartoon Faces from Single Portraits | Chuanyu Pan et.al. | 2307.01468 | null |
2023-07-03 | RobustL2S: Speaker-Specific Lip-to-Speech Synthesis exploiting Self-Supervised Representations | Neha Sahipjohn et.al. | 2307.01233 | null |
2023-06-20 | Audio-Driven 3D Facial Animation from In-the-Wild Videos | Liying Lu et.al. | 2306.11541 | null |
2023-06-13 | Parametric Implicit Face Representation for Audio-Driven Facial Reenactment | Ricong Huang et.al. | 2306.07579 | null |
2023-06-13 | AniFaceDrawing: Anime Portrait Exploration during Your Sketching | Zhengyu Huang et.al. | 2306.07476 | null |
2023-06-12 | NPVForensics: Jointing Non-critical Phonemes and Visemes for Deepfake Detection | Yu Chen et.al. | 2306.06885 | null |
2023-06-10 | StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles | Yifeng Ma et.al. | 2301.01081 | link |
2023-06-08 | ReliableSwap: Boosting General Face Swapping Via Reliable Supervision | Ge Yuan et.al. | 2306.05356 | link |
2023-06-06 | Emotional Talking Head Generation based on Memory-Sharing and Attention-Augmented Networks | Jianrong Wang et.al. | 2306.03594 | null |
2023-06-05 | Instruct-Video2Avatar: Video-to-Avatar Generation with Instructions | Shaoxu Li et.al. | 2306.02903 | link |
2023-05-31 | High-fidelity Generalized Emotional Talking Face Generation with Multi-modal Emotion Space Learning | Chao Xu et.al. | 2305.02572 | null |
2023-05-23 | CPNet: Exploiting CLIP-based Attention Condenser and Probability Map Guidance for High-fidelity Talking Face Generation | Jingning Xu et.al. | 2305.13962 | null |
2023-05-22 | RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars | Dongwei Pan et.al. | 2305.13353 | link |
2023-05-19 | UniFLG: Unified Facial Landmark Generator from Text or Speech | Kentaro Mitsui et.al. | 2302.14337 | null |
2023-05-18 | An Android Robot Head as Embodied Conversational Agent | Marcel Heisler et.al. | 2305.10945 | null |
2023-05-18 | Audio-Visual Person-of-Interest DeepFake Detection | Davide Cozzolino et.al. | 2204.03083 | link |
2023-05-17 | INCLG: Inpainting for Non-Cleft Lip Generation with a Multi-Task Image Processing Network | Shuang Chen et.al. | 2305.10589 | null |
2023-05-17 | LPMM: Intuitive Pose Control for Neural Talking-Head Model via Landmark-Parameter Morphable Model | Kwangho Lee et.al. | 2305.10456 | null |
2023-05-15 | Identity-Preserving Talking Face Generation with Landmark and Appearance Priors | Weizhi Zhong et.al. | 2305.08293 | link |
2023-05-09 | Zero-shot personalized lip-to-speech synthesis with face image based voice control | Zheng-Yan Sheng et.al. | 2305.14359 | null |
2023-05-09 | StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator | Jiazhi Guan et.al. | 2305.05445 | null |
2023-05-09 | Multimodal-driven Talking Face Generation via a Unified Diffusion-based Generator | Chao Xu et.al. | 2305.02594 | null |
2023-05-01 | StyleAvatar: Real-time Photo-realistic Portrait Avatar from a Single Video | Lizhen Wang et.al. | 2305.00942 | link |
2023-05-01 | GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation | Zhenhui Ye et.al. | 2305.00787 | null |
2023-04-28 | A Unified Compression Framework for Efficient Speech-Driven Talking-Face Generation | Bo-Kyeong Kim et.al. | 2304.00471 | null |
2023-04-27 | Controllable One-Shot Face Video Synthesis With Semantic Aware Prior | Kangning Liu et.al. | 2304.14471 | null |
2023-04-25 | AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head | Rongjie Huang et.al. | 2304.12995 | link |
2023-04-24 | VR Facial Animation for Immersive Telepresence Avatars | Andre Rochow et.al. | 2304.12051 | null |
2023-04-21 | Implicit Neural Head Synthesis via Controllable Local Deformation Fields | Chuhan Chen et.al. | 2304.11113 | null |
2023-04-20 | DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation | Shuai Shen et.al. | 2301.03786 | link |
2023-04-18 | Audio-Driven Talking Face Generation with Diverse yet Realistic Facial Animations | Rongliang Wu et.al. | 2304.08945 | null |
2023-04-17 | Autoregressive GAN for Semantic Unconditional Head Motion Generation | Louis Airale et.al. | 2211.00987 | link |
2023-04-11 | One-Shot High-Fidelity Talking-Head Synthesis with Deformable Neural Radiance Field | Weichuang Li et.al. | 2304.05097 | null |
2023-04-06 | Face Animation with an Attribute-Guided Diffusion Model | Bohan Zeng et.al. | 2304.03199 | link |
2023-04-06 | 4D Agnostic Real-Time Facial Animation Pipeline for Desktop Scenarios | Wei Chen et.al. | 2304.02814 | null |
2023-04-03 | CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior | Jinbo Xing et.al. | 2301.02379 | link |
2023-04-01 | DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance | Longwen Zhang et.al. | 2304.03117 | null |
2023-04-01 | TalkCLIP: Talking Head Generation with Text-Guided Expressive Speaking Styles | Yifeng Ma et.al. | 2304.00334 | null |
2023-03-31 | FONT: Flow-guided One-shot Talking Head Generation with Natural Head Motions | Jin Liu et.al. | 2303.17789 | null |
2023-03-29 | Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert | Jiadong Wang et.al. | 2303.17480 | link |
2023-03-27 | OmniAvatar: Geometry-Guided Controllable 3D Head Synthesis | Hongyi Xu et.al. | 2303.15539 | null |
2023-03-27 | Accurate and Interpretable Solution of the Inverse Rig for Realistic Blendshape Models with Quadratic Corrective Terms | Stevo Racković et.al. | 2302.04843 | null |
2023-03-27 | MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation | Bowen Zhang et.al. | 2212.08062 | link |
2023-03-27 | A Majorization-Minimization Based Method for Nonconvex Inverse Rig Problems in Facial Animation: Algorithm Derivation | Stevo Racković et.al. | 2205.04289 | null |
2023-03-26 | OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering | Zhiyuan Ma et.al. | 2303.14662 | link |
2023-03-26 | Emotionally Enhanced Talking Face Generation | Sahil Goyal et.al. | 2303.11548 | link |
2023-03-26 | Distributed Solution of the Inverse Rig Problem in Blendshape Facial Animation | Stevo Racković et.al. | 2303.06370 | null |
2023-03-24 | Synthesizing Photorealistic Virtual Humans Through Cross-modal Disentanglement | Siddarth Ravichandran et.al. | 2209.01320 | null |
2023-03-23 | PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360° | Sizhe An et.al. | 2303.13071 | null |
2023-03-22 | Style Transfer for 2D Talking Head Animation | Trong-Thang Pham et.al. | 2303.09799 | link |
2023-03-22 | MARLIN: Masked Autoencoder for facial video Representation LearnINg | Zhixi Cai et.al. | 2211.06627 | link |
2023-03-14 | DisCoHead: Audio-and-Video-Driven Talking Head Generation by Disentangled Control of Head Pose and Facial Expressions | Geumbyeol Hwang et.al. | 2303.07697 | link |
2023-03-13 | SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation | Wenxuan Zhang et.al. | 2211.12194 | link |
2023-03-09 | FaceXHuBERT: Text-less Speech-driven E(X)pressive 3D Facial Animation Synthesis Using Self-Supervised Speech Representation Learning | Kazi Injamamul Haque et.al. | 2303.05416 | link |
2023-03-09 | Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation | Qi Chen et.al. | 2303.05322 | link |
2023-03-07 | DINet: Deformation Inpainting Network for Realistic Face Visually Dubbing on High Resolution Video | Zhimeng Zhang et.al. | 2303.03988 | link |
2023-03-05 | Cyber Vaccine for Deepfake Immunity | Ching-Chun Chang et.al. | 2303.02659 | null |
2023-03-04 | High-fidelity Facial Avatar Reconstruction from Monocular Video with Generative Priors | Yunpeng Bai et.al. | 2211.15064 | null |
2023-03-01 | DPE: Disentanglement of Pose and Expression for General Video Portrait Editing | Youxin Pang et.al. | 2301.06281 | link |
2023-02-27 | Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video | Minsu Kim et.al. | 2303.08670 | null |
2023-02-27 | Memory-augmented Contrastive Learning for Talking Head Generation | Jianrong Wang et.al. | 2302.13469 | link |
2023-02-24 | Pose-Controllable 3D Facial Animation Synthesis using Hierarchical Audio-Vertex Attention | Bin Liu et.al. | 2302.12532 | null |
2023-02-16 | OPT: One-shot Pose-Controllable Talking Head Generation | Jin Liu et.al. | 2302.08197 | null |
2023-02-14 | Expressive Talking Head Video Encoding in StyleGAN2 Latent-Space | Trevine Oorloff et.al. | 2203.14512 | link |
2023-01-31 | GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis | Zhenhui Ye et.al. | 2301.13430 | null |
2023-01-23 | Data standardization for robust lip sync | Chun Wang et.al. | 2202.06198 | null |
2023-01-20 | Neural Volumetric Blendshapes: Computationally Efficient Physics-Based Facial Blendshapes | Nicolas Wagner et.al. | 2212.14784 | null |
2023-01-15 | Learning Audio-Driven Viseme Dynamics for 3D Face Animation | Linchao Bao et.al. | 2301.06059 | null |
2022-12-30 | Imitator: Personalized Speech-driven 3D Facial Animation | Balamurugan Thambiraja et.al. | 2301.00023 | null |
2022-12-28 | All's well that FID's well? Result quality and metric scores in GAN models for lip-sychronization tasks | Carina Geldhauser et.al. | 2212.13810 | null |
2022-12-23 | Dubbing in Practice: A Large Scale Study of Human Localization With Insights for Automatic Dubbing | William Brannon et.al. | 2212.12137 | null |
2022-12-09 | Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers | Yasheng Sun et.al. | 2212.04970 | null |
2022-12-07 | Talking Head Generation with Probabilistic Audio-to-Visual Diffusion Priors | Zhentao Yu et.al. | 2212.04248 | null |
2022-12-07 | SPACE: Speech-driven Portrait Animation with Controllable Expression | Siddharth Gururani et.al. | 2211.09809 | null |
2022-11-30 | Extracting Semantic Knowledge from GANs with Unsupervised Learning | Jianjin Xu et.al. | 2211.16710 | null |
2022-11-27 | VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild | Kun Cheng et.al. | 2211.14758 | null |
2022-11-26 | Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis | Duomin Wang et.al. | 2211.14506 | link |
2022-11-22 | Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition | Jiaxiang Tang et.al. | 2211.12368 | null |
2022-11-10 | On the role of Lip Articulation in Visual Speech Perception | Zakaria Aldeneh et.al. | 2203.10117 | null |
2022-11-03 | SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory | Se Jin Park et.al. | 2211.00924 | null |
2022-10-21 | Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection | Alexandros Haliassos et.al. | 2201.07131 | link |
2022-10-13 | Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors | Vladimir Iashin et.al. | 2210.07055 | link |
2022-10-13 | Pre-Avatar: An Automatic Presentation Generation Framework Leveraging Talking Avatar | Aolan Sun et.al. | 2210.06877 | null |
2022-10-07 | Compressing Video Calls using Synthetic Talking Heads | Madhav Agarwal et.al. | 2210.03692 | null |
2022-10-07 | A Keypoint Based Enhancement Method for Audio Driven Free View Talking Head Synthesis | Yichen Han et.al. | 2210.03335 | null |
2022-10-06 | Audio-Visual Face Reenactment | Madhav Agarwal et.al. | 2210.02755 | link |
2022-10-06 | Finding Directions in GAN's Latent Space for Neural Face Reenactment | Stella Bounareli et.al. | 2202.00046 | link |
2022-10-04 | Towards MOOCs for Lipreading: Using Synthetic Talking Heads to Train Humans in Lipreading at Scale | Aditya Agarwal et.al. | 2208.09796 | null |
2022-09-29 | Facial Landmark Predictions with Applications to Metaverse | Qiao Han et.al. | 2209.14698 | link |
2022-09-27 | StyleMask: Disentangling the Style Space of StyleGAN2 for Neural Face Reenactment | Stella Bounareli et.al. | 2209.13375 | link |
2022-09-23 | EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model | Xinya Ji et.al. | 2205.15278 | null |
2022-09-21 | FNeVR: Neural Volume Rendering for Face Animation | Bohan Zeng et.al. | 2209.10340 | link |
2022-09-19 | AutoLV: Automatic Lecture Video Generator | Wenbin Wang et.al. | 2209.08795 | null |
2022-09-09 | Talking Head from Speech Audio using a Pre-trained Image Generator | Mohammed M. Alghamdi et.al. | 2209.04252 | null |
2022-09-07 | Restructurable Activation Networks | Kartikeya Bhardwaj et.al. | 2208.08562 | link |
2022-08-29 | StableFace: Analyzing and Improving Motion Stability for Talking Face Generation | Jun Ling et.al. | 2208.13717 | null |
2022-08-17 | Extreme-scale Talking-Face Video Upsampling with Audio-Visual Priors | Sindhu B Hegde et.al. | 2208.08118 | link |
2022-08-03 | Free-HeadGAN: Neural Talking Head Synthesis with Explicit Gaze Control | Michail Christos Doukas et.al. | 2208.02210 | null |
2022-08-02 | Perceptual Conversational Head Generation with Regularized Driver and Enhanced Renderer | Ailin Huang et.al. | 2206.12837 | link |
2022-08-01 | A Feasibility Study on Image Inpainting for Non-cleft Lip Generation from Patients with Cleft Lip | Shuang Chen et.al. | 2208.01149 | link |
2022-07-27 | A Hybrid Deep Animation Codec for Low-bitrate Video Conferencing | Goluck Konuko et.al. | 2207.13530 | null |
2022-07-24 | Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis | Shuai Shen et.al. | 2207.11770 | link |
2022-07-22 | Visual Speech-Aware Perceptual 3D Facial Expression Reconstruction from Videos | Panagiotis P. Filntisis et.al. | 2207.11094 | link |
2022-07-20 | NARRATE: A Normal Assisted Free-View Portrait Stylizer | Youjia Wang et.al. | 2207.00974 | null |
2022-07-20 | VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection | Joanna Hong et.al. | 2206.07458 | null |
2022-07-20 | Responsive Listening Head Generation: A Benchmark Dataset and Baseline | Mohan Zhou et.al. | 2112.13548 | null |
2022-07-13 | FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis | Yongqi Wang et.al. | 2207.03800 | null |
2022-06-29 | Cut Inner Layers: A Structured Pruning Strategy for Efficient U-Net GANs | Bo-Kyeong Kim et.al. | 2206.14658 | null |
2022-06-09 | Face-Dubbing++: Lip-Synchronous, Voice Preserving Translation of Videos | Alexander Waibel et.al. | 2206.04523 | null |
2022-05-31 | Text/Speech-Driven Full-Body Animation | Wenlin Zhuang et.al. | 2205.15573 | null |
2022-05-27 | Unsupervised Voice-Face Representation Learning by Cross-Modal Prototype Contrast | Boqing Zhu et.al. | 2204.14057 | link |
2022-05-26 | One-Shot Face Reenactment on Megapixels | Wonjun Kang et.al. | 2205.13368 | null |
2022-05-24 | Merkel Podcast Corpus: A Multimodal Dataset Compiled from 16 Years of Angela Merkel's Weekly Video Podcasts | Debjoy Saha et.al. | 2205.12194 | link |
2022-05-20 | MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement | Alexander Richard et.al. | 2104.08223 | link |
2022-05-13 | Talking Face Generation with Multilingual TTS | Hyoung-Kyu Song et.al. | 2205.06421 | null |
2022-05-02 | Emotion-Controllable Generalized Talking Face Generation | Sanjana Sinha et.al. | 2205.01155 | null |
2022-05-02 | A Novel Speech-Driven Lip-Sync Model with CNN and LSTM | Xiaohong Li et.al. | 2205.00916 | null |
2022-04-27 | Talking Head Generation Driven by Speech-Related Facial Action Units and Audio- Based on Multimodal Representation Fusion | Sen Chen et.al. | 2204.12756 | null |
2022-04-25 | Fast Facial Landmark Detection and Applications: A Survey | Kostiantyn Khabarlak et.al. | 2101.10808 | null |
2022-04-13 | Dynamic Neural Textures: Generating Talking-Face Videos with Continuously Controllable Expressions | Zipeng Ye et.al. | 2204.06180 | null |
2022-04-06 | Transformer-S2A: Robust and Efficient Speech-to-Animation | Liyang Chen et.al. | 2111.09771 | null |
2022-04-03 | Txt2Vid: Ultra-Low Bitrate Compression of Talking-Head Videos via Text | Pulkit Tandon et.al. | 2106.14014 | link |
2022-03-30 | End to End Lip Synchronization with a Temporal AutoEncoder | Yoav Shalev et.al. | 2203.16224 | link |
2022-03-29 | Thin-Plate Spline Motion Model for Image Animation | Jian Zhao et.al. | 2203.14367 | link |
2022-03-17 | StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN | Fei Yin et.al. | 2203.04036 | link |
2022-03-17 | FaceFormer: Speech-Driven 3D Facial Animation with Transformers | Yingruo Fan et.al. | 2112.05329 | link |
2022-03-16 | Efficient conditioned face animation using frontally-viewed embedding | Maxime Oquab et.al. | 2203.08765 | null |
2022-03-15 | Depth-Aware Generative Adversarial Network for Talking Head Video Generation | Fa-Ting Hong et.al. | 2203.06605 | link |
2022-03-10 | An Audio-Visual Attention Based Multimodal Network for Fake Talking Face Videos Detection | Ganglai Wang et.al. | 2203.05178 | null |
2022-03-08 | Attention-Based Lip Audio-Visual Synthesis for Talking Face Generation in the Wild | Ganglai Wang et.al. | 2203.03984 | null |
2022-03-04 | Multi-modality Deep Restoration of Extremely Compressed Face Videos | Xi Zhang et.al. | 2107.05548 | null |
2022-03-01 | FakeAVCeleb: A Novel Audio-Video Multimodal Deepfake Dataset | Hasam Khalid et.al. | 2108.05080 | link |
2022-02-25 | FSGANv2: Improved Subject Agnostic Face Swapping and Reenactment | Yuval Nirkin et.al. | 2202.12972 | null |
2022-02-22 | Thinking the Fusion Strategy of Multi-reference Face Reenactment | Takuya Yashima et.al. | 2202.10758 | null |
2022-01-24 | Selective Listening by Synchronizing Speech with Lips | Zexu Pan et.al. | 2106.07150 | link |
2022-01-22 | Text2Video: Text-driven Talking-head Video Synthesis with Personalized Phoneme-Pose Dictionary | Sibo Zhang et.al. | 2104.14631 | null |
2022-01-21 | Stitch it in Time: GAN-Based Facial Editing of Real Videos | Rotem Tzaban et.al. | 2201.08361 | link |
2022-01-17 | Towards Realistic Visual Dubbing with Heterogeneous Sources | Tianyi Xie et.al. | 2201.06260 | null |
2022-01-16 | Audio-Driven Talking Face Video Generation with Dynamic Convolution Kernels | Zipeng Ye et.al. | 2201.05986 | null |
2022-01-03 | DFA-NeRF: Personalized Talking Head Generation via Disentangled Face Attributes Neural Rendering | Shunyu Yao et.al. | 2201.00791 | null |
2021-12-20 | Parallel and High-Fidelity Text-to-Lip Generation | Jinglin Liu et.al. | 2107.06831 | link |
2021-12-19 | Initiative Defense against Facial Manipulation | Qidong Huang et.al. | 2112.10098 | link |
2021-12-07 | Joint Audio-Text Model for Expressive Speech-Driven 3D Facial Animation | Yingruo Fan et.al. | 2112.02214 | null |
2021-12-06 | One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning | Suzhen Wang et.al. | 2112.02749 | null |
2021-11-29 | Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates | Shenhan Qian et.al. | 2108.08020 | link |
2021-11-04 | FEAFA+: An Extended Well-Annotated Dataset for Facial Expression Analysis and 3D Facial Animation | Wei Gan et.al. | 2111.02751 | null |
2021-11-02 | BiosecurID: a multimodal biometric database | Julian Fierrez et.al. | 2111.03472 | null |
2021-10-30 | Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis | Haozhe Wu et.al. | 2111.00203 | link |
2021-10-26 | Emotion recognition in talking-face videos using persistent entropy and neural networks | Eduardo Paluzo-Hidalgo et.al. | 2110.13571 | link |
2021-10-26 | ViDA-MAN: Visual Dialog with Digital Humans | Tong Shen et.al. | 2110.13384 | null |
2021-10-22 | Invertible Frowns: Video-to-Video Facial Emotion Translation | Ian Magnusson et.al. | 2109.08061 | null |
2021-10-19 | Talking Head Generation with Audio and Speech Related Facial Action Units | Sen Chen et.al. | 2110.09951 | null |
2021-10-16 | Intelligent Video Editing: Incorporating Modern Talking Face Generation Algorithms in a Video Editor | Anchit Gupta et.al. | 2110.08580 | null |
2021-10-12 | Fine-grained Identity Preserving Landmark Synthesis for Face Reenactment | Haichao Zhang et.al. | 2110.04708 | null |
2021-10-07 | Streaming Transformer Transducer Based Speech Recognition Using Non-Causal Convolution | Yangyang Shi et.al. | 2110.05241 | null |
2021-09-24 | Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation | Yuanxun Lu et.al. | 2109.10595 | null |
2021-09-20 | Accurate, Interpretable, and Fast Animation: An Iterative, Sparse, and Nonconvex Approach | Stevo Rackovic et.al. | 2109.08356 | null |
2021-09-17 | Detection of GAN-synthesized street videos | Omran Alamayreh et.al. | 2109.04991 | null |
2021-08-30 | Audiovisual Speech Synthesis using Tacotron2 | Ahmed Hussen Abdelaziz et.al. | 2008.00620 | null |
2021-08-23 | KoDF: A Large-scale Korean DeepFake Detection Dataset | Patrick Kwon et.al. | 2103.10094 | null |
2021-08-23 | HeadGAN: One-shot Neural Head Synthesis and Editing | Michail Christos Doukas et.al. | 2012.08261 | null |
2021-08-19 | AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis | Yudong Guo et.al. | 2103.11078 | link |
2021-08-18 | DeepFake MNIST+: A DeepFake Facial Animation Dataset | Jiajun Huang et.al. | 2108.07949 | link |
2021-08-18 | FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning | Chenxu Zhang et.al. | 2108.07938 | link |
2021-08-12 | UniFaceGAN: A Unified Framework for Temporally Consistent Facial Video Editing | Meng Cao et.al. | 2108.05650 | null |
2021-08-11 | AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person | Xinsheng Wang et.al. | 2108.04325 | null |
2021-08-06 | SofGAN: A Portrait Image Generator with Dynamic Styling | Anpei Chen et.al. | 2007.03780 | link |
2021-07-27 | Beyond Voice Identity Conversion: Manipulating Voice Attributes by Adversarial Learning of Structured Disentangled Representations | Laurent Benaroya et.al. | 2107.12346 | null |
2021-07-21 | Speech Driven Talking Face Generation from a Single Image and an Emotion Condition | Sefik Emre Eskimez et.al. | 2008.03592 | link |
2021-07-20 | Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion | Suzhen Wang et.al. | 2107.09293 | link |
2021-07-10 | Speech2Video: Cross-Modal Distillation for Speech to Video Generation | Shijing Si et.al. | 2107.04806 | null |
2021-07-07 | Egocentric Videoconferencing | Mohamed Elgharib et.al. | 2107.03109 | null |
2021-06-08 | LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization | Avisek Lahiri et.al. | 2106.04185 | null |
2021-05-20 | Audio-Driven Emotional Video Portraits | Xinya Ji et.al. | 2104.07452 | null |
2021-05-07 | Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation | Lincheng Li et.al. | 2104.07995 | link |
2021-05-05 | A Neural Lip-Sync Framework for Synthesizing Photorealistic Virtual News Anchors | Ruobing Zheng et.al. | 2002.08700 | null |
2021-04-29 | Learned Spatial Representations for Few-shot Talking-Head Synthesis | Moustafa Meshry et.al. | 2104.14557 | null |
2021-04-26 | One-shot Face Reenactment Using Appearance Adaptive Normalization | Guangming Yao et.al. | 2102.03984 | null |
2021-04-25 | 3D-TalkEmo: Learning to Synthesize 3D Emotional Talking Head | Qianyun Wang et.al. | 2104.12051 | null |
2021-04-22 | Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation | Hang Zhou et.al. | 2104.11116 | link |
2021-04-07 | Single Source One Shot Reenactment using Weighted motion From Paired Feature Points | Soumya Tripathy et.al. | 2104.03117 | null |
2021-04-07 | Everything's Talkin': Pareidolia Face Reenactment | Linsen Song et.al. | 2104.03061 | link |
2021-04-07 | LI-Net: Large-Pose Identity-Preserving Face Reenactment Network | Jin Liu et.al. | 2104.02850 | null |
2021-04-02 | One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing | Ting-Chun Wang et.al. | 2011.15126 | null |
2021-03-20 | Not made for each other- Audio-Visual Dissonance-based Deepfake Detection and Localization | Komal Chugh et.al. | 2005.14405 | link |
2021-03-19 | End-to-End Lip Synchronisation Based on Pattern Classification | You Jin Kim et.al. | 2005.08606 | null |
2021-03-05 | Real-time RGBD-based Extended Body Pose Estimation | Renat Bashirov et.al. | 2103.03663 | link |
2021-03-03 | Estimating Uniqueness of I-Vector Representation of Human Voice | Erkam Sinan Tandogan et.al. | 2008.11985 | null |
2021-02-25 | MakeItTalk: Speaker-Aware Talking-Head Animation | Yang Zhou et.al. | 2004.12992 | null |
2021-02-19 | One Shot Audio to Animated Video Generation | Neeraj Kumar et.al. | 2102.09737 | null |
2021-02-18 | AudioVisual Speech Synthesis: A brief literature review | Efthymios Georgiou et.al. | 2103.03927 | null |
2020-12-14 | Robust One Shot Audio to Video Generation | Neeraj Kumar et.al. | 2012.07842 | null |
2020-12-14 | Multi Modal Adaptive Normalization for Audio to Video Generation | Neeraj Kumar et.al. | 2012.07304 | null |
2020-11-30 | Adaptive Compact Attention For Few-shot Video-to-video Translation | Risheng Huang et.al. | 2011.14695 | null |
2020-11-21 | Stochastic Talking Face Generation Using Latent Distribution Matching | Ravindra Yadav et.al. | 2011.10727 | link |
2020-11-21 | Iterative Text-based Editing of Talking-heads Using Neural Retargeting | Xinwei Yao et.al. | 2011.10688 | null |
2020-11-09 | FACEGAN: Facial Attribute Controllable rEenactment GAN | Soumya Tripathy et.al. | 2011.04439 | null |
2020-11-06 | Large-scale multilingual audio visual dubbing | Yi Yang et.al. | 2011.03530 | null |
2020-11-02 | Facial Keypoint Sequence Generation from Audio | Prateek Manocha et.al. | 2011.01114 | null |
2020-10-25 | APB2FaceV2: Real-Time Audio-Guided Multi-Face Reenactment | Jiangning Zhang et.al. | 2010.13017 | link |
2020-10-12 | Intuitive Facial Animation Editing Based On A Generative RNN Framework | Eloïse Berson et.al. | 2010.05655 | null |
2020-10-05 | SMILE: Semantically-guided Multi-attribute Image and Layout Editing | Andrés Romero et.al. | 2010.02315 | link |
2020-10-05 | Dynamic Facial Asset and Rig Generation from a Single Scan | Jiaman Li et.al. | 2010.00560 | null |
2020-09-20 | An Improved Approach of Intention Discovery with Machine Learning for POMDP-based Dialogue Management | Ruturaj Raval et.al. | 2009.09354 | null |
2020-09-18 | Mesh Guided One-shot Face Reenactment using Graph Convolutional Networks | Guangming Yao et.al. | 2008.07783 | null |
2020-09-12 | DualLip: A System for Joint Lip Reading and Generation | Weicong Chen et.al. | 2009.05784 | null |
2020-09-02 | Seeing wake words: Audio-visual Keyword Spotting | Liliane Momeni et.al. | 2009.01225 | null |
2020-08-29 | "It took me almost 30 minutes to practice this". Performance and Production Practices in Dance Challenge Videos on TikTok | Daniel Klug et.al. | 2008.13040 | null |
2020-08-23 | A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild | K R Prajwal et.al. | 2008.10010 | link |
2020-08-11 | Audio- and Gaze-driven Facial Animation of Codec Avatars | Alexander Richard et.al. | 2008.05023 | null |
2020-08-04 | Speaker dependent acoustic-to-articulatory inversion using real-time MRI of the vocal tract | Tamás Gábor Csapó et.al. | 2008.02098 | link |
2020-08-04 | Real-Time Cleaning and Refinement of Facial Animation Signals | Eloïse Berson et.al. | 2008.01332 | null |
2020-08-02 | Deep Multi-modality Soft-decoding of Very Low Bit-rate Face Videos | Yanhui Guo et.al. | 2008.01652 | null |
2020-07-29 | Neural Voice Puppetry: Audio-driven Facial Reenactment | Justus Thies et.al. | 1912.05566 | link |
2020-07-20 | Deformable Style Transfer | Sunnie S. Y. Kim et.al. | 2003.11038 | link |
2020-07-18 | A Robust Interactive Facial Animation Editing System | Eloïse Berson et.al. | 2007.09367 | null |
2020-07-16 | Talking-head Generation with Rhythmic Head Motion | Lele Chen et.al. | 2007.08547 | link |
2020-07-08 | Learning Speech Representations from Raw Audio by Joint Audiovisual Self-Supervision | Abhinav Shukla et.al. | 2007.04134 | null |
2020-06-20 | Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams | Huirong Huang et.al. | 2006.11610 | null |
2020-05-27 | Modality Dropout for Improved Performance-driven Talking Faces | Ahmed Hussen Abdelaziz et.al. | 2005.13616 | null |
2020-05-25 | Identity-Preserving Realistic Talking Face Generation | Sanjana Sinha et.al. | 2005.12318 | null |
2020-05-22 | Head2Head: Video-based Neural Head Synthesis | Mohammad Rami Koujan et.al. | 2005.10954 | null |
2020-05-16 | FReeNet: Multi-Identity Face Reenactment | Jiangning Zhang et.al. | 1905.11805 | null |
2020-05-13 | FaR-GAN for One-Shot Face Reenactment | Hanxiang Hao et.al. | 2005.06402 | null |
2020-05-13 | Arbitrary Talking Face Generation via Attentional Audio-Visual Coherence Learning | Hao Zhu et.al. | 1812.06589 | null |
2020-05-11 | Dancing to the Partisan Beat: A First Analysis of Political Communication on TikTok | Juan Carlos Medina Serrano et.al. | 2004.05478 | link |
2020-05-07 | What comprises a good talking-head video generation?: A Survey and Benchmark | Lele Chen et.al. | 2005.03201 | link |
2020-05-04 | Disentangled Speech Embeddings using Cross-modal Self-supervision | Arsha Nagrani et.al. | 2002.08742 | null |
2020-04-30 | APB2Face: Audio-guided face reenactment with auxiliary pose and blink signals | Jiangning Zhang et.al. | 2004.14569 | null |
2020-03-30 | ActGAN: Flexible and Efficient One-shot Face Reenactment | Ivan Kosarevych et.al. | 2003.13840 | null |
2020-03-29 | Realistic Face Reenactment via Self-Supervised Disentangling of Identity and Pose | Xianfang Zeng et.al. | 2003.12957 | null |
2020-03-26 | High-Accuracy Facial Depth Models derived from 3D Synthetic Data | Faisal Khan et.al. | 2003.06211 | null |
2020-03-05 | Talking-Heads Attention | Noam Shazeer et.al. | 2003.02436 | link |
2020-03-05 | Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose | Ran Yi et.al. | 2002.10137 | link |
2020-03-01 | Towards Automatic Face-to-Face Translation | Prajwal K R et.al. | 2003.00418 | link |
2020-02-19 | Speech-driven facial animation using polynomial fusion of features | Triantafyllos Kefalas et.al. | 1912.05833 | null |
2020-01-17 | ICface: Interpretable and Controllable Face Reenactment Using GANs | Soumya Tripathy et.al. | 1904.01909 | null |
2019-12-20 | Disentangling Style and Content in Anime Illustrations | Sitao Xiang et.al. | 1905.10742 | null |
2019-11-21 | FLNet: Landmark Driven Fetching and Learning Network for Faithful Talking Facial Animation Synthesis | Kuangxiao Gu et.al. | 1911.09224 | null |
2019-11-19 | MarioNETte: Few-shot Face Reenactment Preserving Identity of Unseen Targets | Sungjoo Ha et.al. | 1911.08139 | null |
2019-10-28 | Few-shot Video-to-Video Synthesis | Ting-Chun Wang et.al. | 1910.12713 | null |
2019-10-19 | Real-Time Lip Sync for Live 2D Animation | Deepali Aneja et.al. | 1910.08685 | link |
2019-10-16 | Designing Style Matching Conversational Agents | Deepali Aneja et.al. | 1910.07514 | null |
2019-10-15 | A High-Fidelity Open Embodied Avatar with Lip Syncing and Expression Capabilities | Deepali Aneja et.al. | 1909.08766 | link |
2019-10-09 | EmoCo: Visual Analysis of Emotion Coherence in Presentation Videos | Haipeng Zeng et.al. | 1907.12918 | null |
2019-10-02 | Animating Face using Disentangled Audio Representations | Gaurav Mittal et.al. | 1910.00726 | null |
2019-09-25 | Few-Shot Adversarial Learning of Realistic Neural Talking Head Models | Egor Zakharov et.al. | 1905.08233 | null |
2019-09-06 | Neural Style-Preserving Visual Dubbing | Hyeongwoo Kim et.al. | 1909.02518 | null |
2019-08-29 | 3D Face Pose and Animation Tracking via Eigen-Decomposition based Bayesian Approach | Ngoc-Trung Tran et.al. | 1908.11039 | null |
2019-08-20 | Prosodic Phrase Alignment for Machine Dubbing | Alp Öktem et.al. | 1908.07226 | link |
2019-08-16 | FSGAN: Subject Agnostic Face Swapping and Reenactment | Yuval Nirkin et.al. | 1908.05932 | link |
2019-08-11 | Emotion Dependent Facial Animation from Affective Speech | Rizwan Sadiq et.al. | 1908.03904 | null |
2019-08-05 | One-shot Face Reenactment | Yunxuan Zhang et.al. | 1908.03251 | link |
2019-07-25 | Talking Face Generation by Conditional Recurrent Adversarial Network | Yang Song et.al. | 1804.04786 | link |
2019-07-24 | Data-Driven Physical Face Inversion | Yeara Kozlov et.al. | 1907.10402 | null |
2019-07-23 | A system for efficient 3D printed stop-motion face animation | Rinat Abdrashitov et.al. | 1907.10163 | null |
2019-06-14 | Realistic Speech-Driven Facial Animation with GANs | Konstantinos Vougioukas et.al. | 1906.06337 | null |
2019-06-04 | Text-based Editing of Talking-head Video | Ohad Fried et.al. | 1906.01524 | null |
2019-05-27 | Audio2Face: Generating Speech/Face Animation from Single Audio with Attention-Based Bidirectional LSTM Networks | Guanzhong Tian et.al. | 1905.11142 | null |
2019-05-09 | Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss | Lele Chen et.al. | 1905.03820 | link |
2019-05-08 | Capture, Learning, and Synthesis of 3D Speaking Styles | Daniel Cudeiro et.al. | 1905.03079 | link |
2019-04-23 | Talking Face Generation by Adversarially Disentangled Audio-Visual Representation | Hang Zhou et.al. | 1807.07860 | null |
2019-04-02 | FEAFA: A Well-Annotated Dataset for Facial Expression Analysis and 3D Facial Animation | Yanfu Yan et.al. | 1904.01509 | null |
2019-03-13 | Animating an Autonomous 3D Talking Avatar | Dominik Borer et.al. | 1903.05448 | null |
2018-12-22 | Deep Audio-Visual Speech Recognition | Triantafyllos Afouras et.al. | 1809.02108 | null |
2018-12-20 | DeepFakes: a New Threat to Face Recognition? Assessment and Detection | Pavel Korshunov et.al. | 1812.08685 | null |
2018-11-22 | Towards Highly Accurate and Stable Face Alignment for High-Resolution Videos | Ying Tai et.al. | 1811.00342 | link |
2018-11-16 | Influence of visual cues on head and eye movements during listening tasks in multi-talker audiovisual environments with animated characters | Maartje M. E. Hendrikse et.al. | 1812.02088 | null |
2018-08-28 | GANimation: Anatomically-aware Facial Animation from a Single Image | Albert Pumarola et.al. | 1807.09251 | link |
2018-08-19 | Dynamic Temporal Alignment of Speech to Lips | Tavi Halperin et.al. | 1808.06250 | link |
2018-07-29 | ReenactGAN: Learning to Reenact Faces via Boundary Transfer | Wayne Wu et.al. | 1807.11079 | link |
2018-07-26 | Learnable PINs: Cross-Modal Embeddings for Person Identity | Arsha Nagrani et.al. | 1805.00833 | null |
2018-07-19 | End-to-End Speech-Driven Facial Animation with Temporal GANs | Konstantinos Vougioukas et.al. | 1805.09313 | null |
2018-05-29 | Deep Video Portraits | Hyeongwoo Kim et.al. | 1805.11714 | null |
2018-05-24 | VisemeNet: Audio-Driven Animator-Centric Speech Animation | Yang Zhou et.al. | 1805.09488 | null |
2018-05-21 | Anime Style Space Exploration Using Metric Learning and Generative Adversarial Networks | Sitao Xiang et.al. | 1805.07997 | null |
2018-04-23 | Generating Talking Face Landmarks from Speech | Sefik Emre Eskimez et.al. | 1803.09803 | null |
2018-03-28 | Generative Adversarial Talking Head: Bringing Portraits to Life with a Weakly Supervised Neural Network | Hai X. Pham et.al. | 1803.07716 | null |
2018-03-20 | Speech-Driven Facial Reenactment Using Conditional Generative Adversarial Networks | Seyed Ali Jalalifar et.al. | 1803.07461 | null |
2017-12-07 | End-to-end Learning for 3D Facial Animation from Raw Waveforms of Speech | Hai X. Pham et.al. | 1710.00920 | null |
2017-12-06 | ObamaNet: Photo-realistic lip-sync from text | Rithesh Kumar et.al. | 1801.01442 | null |
2017-07-30 | Kernel Projection of Latent Structures Regression for Facial Animation Retargeting | Christos Ouzounis et.al. | 1707.09629 | null |
2017-07-26 | Fast Deep Matting for Portrait Animation on Mobile Phone | Bingke Zhu et.al. | 1707.08289 | null |
2017-07-21 | Multichannel Attention Network for Analyzing Visual Behavior in Public Speaking | Rahul Sharma et.al. | 1707.06830 | null |
2017-07-18 | You said that? | Joon Son Chung et.al. | 1705.02966 | null |
2017-01-30 | Lip Reading Sentences in the Wild | Joon Son Chung et.al. | 1611.05358 | link |
2016-10-28 | Galaxy gas as obscurer: II. Separating the galaxy-scale and nuclear obscurers of Active Galactic Nuclei | Johannes Buchner et.al. | 1610.09380 | link |
2016-07-11 | Large-Scale MIMO is Capable of Eliminating Power-Thirsty Channel Coding for Wireless Transmission of HEVC/H.265 Video | Shaoshi Yang et.al. | 1601.06684 | null |
2016-05-22 | Improving Facial Analysis and Performance Driven Animation through Disentangling Identity and Expression | David Rim et.al. | 1512.08212 | null |
2016-02-08 | Automatic Face Reenactment | Pablo Garrido et.al. | 1602.02651 | null |
2015-11-20 | ExpressionBot: An Emotive Lifelike Robotic Face for Face-to-Face Communication | Ali Mollahosseini et.al. | 1511.06502 | null |
2014-09-03 | Visual Speech Recognition | Ahmad B. A. Hassanat et.al. | 1409.1411 | null |
2012-09-22 | Using multimodal speech production data to evaluate articulatory animation for audiovisual speech synthesis | Ingmar Steiner et.al. | 1209.4982 | null |
2012-03-30 | Face Expression Recognition and Analysis: The State of the Art | Vinay Bettadapura et.al. | 1203.6722 | null |
2012-01-19 | Progress in animation of an EMA-controlled tongue model for acoustic-visual speech synthesis | Ingmar Steiner et.al. | 1201.4080 | null |
2010-03-01 | Re-verification of a Lip Synchronization Protocol using Robust Reachability | Piotr Kordy et.al. | 1003.0431 | null |
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2024-12-20 | MotiF: Making Text Count in Image Animation with Motion Focal Loss | Shijie Wang et.al. | 2412.16153 | null |
2024-12-13 | DisPose: Disentangling Pose Guidance for Controllable Human Image Animation | Hongxiang Li et.al. | 2412.09349 | link |
2024-12-11 | Animate-X: Universal Character Image Animation with Enhanced Motion Representation | Shuai Tan et.al. | 2410.10306 | null |
2024-12-05 | Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Diffusion Transformer Networks | Jiahao Cui et.al. | 2412.00733 | link |
2024-12-04 | FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait | Taekyung Ki et.al. | 2412.01064 | null |
2024-11-30 | DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses | Yatian Pang et.al. | 2412.00397 | null |
2024-11-28 | JoyVASA: Portrait and Animal Image Animation with Diffusion-Based Audio-Driven Facial Dynamics and Head Motion Generation | Xuyang Cao et.al. | 2411.09209 | link |
2024-11-27 | StableAnimator: High-Quality Identity-Preserving Human Image Animation | Shuyuan Tu et.al. | 2411.17697 | link |
2024-11-24 | LetsTalk: Latent Diffusion Transformer for Talking Video Synthesis | Haojie Zhang et.al. | 2411.16748 | null |
2024-11-21 | HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation | Zhenzhi Wang et.al. | 2407.17438 | link |
2024-10-31 | TPC: Test-time Procrustes Calibration for Diffusion-based Human Image Animation | Sunjae Yoon et.al. | 2410.24037 | null |
2024-10-20 | FrameBridge: Improving Image-to-Video Generation with Bridge Models | Yuji Wang et.al. | 2410.15371 | null |
2024-10-14 | Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation | Jiahao Cui et.al. | 2410.07718 | link |
2024-09-30 | Illustrious: an Open Advanced Illustration Model | Sang Hyun Park et.al. | 2409.19946 | null |
2024-09-29 | High Quality Human Image Animation using Regional Supervision and Motion Blur Condition | Zhongcong Xu et.al. | 2409.19580 | null |
2024-09-22 | Dormant: Defending against Pose-driven Human Image Animation | Jiachen Zhou et.al. | 2409.14424 | link |
2024-07-23 | Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models | Xin Ma et.al. | 2407.15642 | link |
2024-07-12 | TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models | Jeongho Kim et.al. | 2407.09012 | null |
2024-07-12 | EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions | Zhiyuan Chen et.al. | 2407.08136 | null |
2024-07-11 | MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model | Muyao Niu et.al. | 2405.20222 | link |
2024-06-16 | Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation | Mingwang Xu et.al. | 2406.08801 | null |
2024-06-13 | Follow-Your-Pose v2: Multiple-Condition Guided Character Image Animation for Stable Pose Control | Jingyun Xue et.al. | 2406.03035 | null |
2024-06-03 | UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation | Xiang Wang et.al. | 2406.01188 | null |
2024-06-01 | Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance | Shenhao Zhu et.al. | 2403.14781 | link |
2024-05-29 | Evaluating the effectiveness of sonification in science education using Edukoi | Lucrezia Guiotto Nai Fovino et.al. | 2405.18908 | null |
2024-05-28 | VividPose: Advancing Stable Video Diffusion for Realistic Human Image Animation | Qilin Wang et.al. | 2405.18156 | null |
2024-05-28 | Controllable Longer Image Animation with Diffusion Models | Qiang Wang et.al. | 2405.17306 | null |
2024-03-25 | PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models | Yiming Zhang et.al. | 2312.13964 | link |
2024-03-13 | Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts | Yue Ma et.al. | 2403.08268 | link |
2024-03-08 | Audio-Synchronized Visual Animation | Lin Zhang et.al. | 2403.05659 | null |
2024-03-05 | Tuning-Free Noise Rectification for High Fidelity Image-to-Video Generation | Weijie Li et.al. | 2403.02827 | null |
2024-01-17 | Continuous Piecewise-Affine Based Motion Model for Image Animation | Hexiang Wang et.al. | 2401.09146 | link |
2024-01-03 | Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions | David Junhao Zhang et.al. | 2401.01827 | link |
2023-12-06 | AnimateZero: Video Diffusion Models are Zero-Shot Image Animators | Jiwen Yu et.al. | 2312.03793 | link |
2023-12-05 | LivePhoto: Real Image Animation with Text-guided Motion Control | Xi Chen et.al. | 2312.02928 | null |
2023-12-04 | AnimateAnything: Fine-Grained Open Domain Image Animation with Motion Guidance | Zuozhuo Dai et.al. | 2311.12886 | link |
2023-11-30 | Motion-Conditioned Image Animation for Video Editing | Wilson Yan et.al. | 2311.18827 | null |
2023-11-27 | MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model | Zhongcong Xu et.al. | 2311.16498 | null |
2023-11-27 | DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors | Jinbo Xing et.al. | 2310.12190 | link |
2023-11-19 | Differential Motion Evolution for Fine-Grained Motion Deformation in Unsupervised Image Animation | Peirong Liu et.al. | 2110.04658 | null |
2023-10-16 | LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation | Ruiqi Wu et.al. | 2310.10769 | link |
2023-10-11 | LEO: Generative Latent Image Animator for Human Video Synthesis | Yaohui Wang et.al. | 2305.03989 | link |
2023-09-26 | Text-Guided Synthesis of Eulerian Cinemagraphs | Aniruddha Mahapatra et.al. | 2307.03190 | link |
2023-09-25 | Automatic Animation of Hair Blowing in Still Portrait Photos | Wenpeng Xiao et.al. | 2309.14207 | null |
2023-07-10 | AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning | Yuwei Guo et.al. | 2307.04725 | link |
2023-07-09 | Predictive Coding For Animation-Based Video Compression | Goluck Konuko et.al. | 2307.04187 | null |
2023-04-12 | VidStyleODE: Disentangled Video Editing via StyleGAN and NeuralODEs | Moayed Haji Ali et.al. | 2304.06020 | null |
2023-03-10 | 3D Cinemagraphy from a Single Image | Xingyi Li et.al. | 2303.05724 | null |
2023-02-02 | Dreamix: Video Diffusion Models are General Video Editors | Eyal Molad et.al. | 2302.01329 | null |
2023-01-14 | Continuous odor profile monitoring to study olfactory navigation in small animals | Kevin S. Chen et.al. | 2301.05905 | null |
2022-11-30 | NeRFInvertor: High Fidelity NeRF-GAN Inversion for Single-shot Real Image Animation | Yu Yin et.al. | 2211.17235 | null |
2022-10-04 | Implicit Warping for Animation with Image Sets | Arun Mallya et.al. | 2210.01794 | null |
2022-09-28 | Motion Transformer for Unsupervised Image Animation | Jiale Tao et.al. | 2209.14024 | link |
2022-07-19 | Single Stage Virtual Try-on via Deformable Attention Flows | Shuai Bai et.al. | 2207.09161 | link |
2022-07-08 | Jointly Harnessing Prior Structures and Temporal Consistency for Sign Language Video Generation | Yucheng Suo et.al. | 2207.03714 | null |
2022-06-11 | Bayesian Statistics Guided Label Refurbishment Mechanism: Mitigating Label Noise in Medical Image Classification | Mengdi Gao et.al. | 2106.12284 | link |
2022-04-05 | Neural Fields in Visual Computing and Beyond | Yiheng Xie et.al. | 2111.11426 | null |
2022-03-29 | Thin-Plate Spline Motion Model for Image Animation | Jian Zhao et.al. | 2203.14367 | link |
2022-03-29 | Image Animation with Perturbed Masks | Yoav Shalev et.al. | 2011.06922 | link |
2022-03-25 | 3D GAN Inversion for Controllable Portrait Image Animation | Connor Z. Lin et.al. | 2203.13441 | null |
2022-03-17 | Latent Image Animator: Learning to Animate Images via Latent Space Navigation | Yaohui Wang et.al. | 2203.09043 | null |
2021-12-21 | Image Animation with Keypoint Mask | Or Toledano et.al. | 2112.10457 | link |
2021-12-19 | Move As You Like: Image Animation in E-Commerce Scenario | Borun Xu et.al. | 2112.13647 | null |
2021-12-17 | AI-Empowered Persuasive Video Generation: A Survey | Chang Liu et.al. | 2112.09401 | null |
2021-10-26 | Incremental Learning for Animal Pose Estimation using RBF k-DPP | Gaurav Kumar Nayak et.al. | 2110.13598 | null |
2021-09-03 | Sparse to Dense Motion Transfer for Face Image Animation | Ruiqi Zhao et.al. | 2109.00471 | null |
2021-08-18 | DeepFake MNIST+: A DeepFake Facial Animation Dataset | Jiajun Huang et.al. | 2108.07949 | link |
2021-06-23 | Analisis Kualitas Layanan Website E-Commerce Bukalapak Terhadap Kepuasan Pengguna Mahasiswa Universitas Bina Darma Menggunakan Metode Webqual 4.0 | Adellia et.al. | 2106.15342 | null |
2021-04-07 | Single Source One Shot Reenactment using Weighted motion From Paired Feature Points | Soumya Tripathy et.al. | 2104.03117 | null |
2021-03-22 | PriorityCut: Occlusion-guided Regularization for Warp-based Image Animation | Wai Ting Cheung et.al. | 2103.11600 | null |
2020-12-01 | Ultra-low bitrate video conferencing using deep image animation | Goluck Konuko et.al. | 2012.00346 | null |
2020-10-01 | First Order Motion Model for Image Animation | Aliaksandr Siarohin et.al. | 2003.00196 | link |
2020-08-27 | Deep Spatial Transformation for Pose-Guided Person Image Generation and Animation | Yurui Ren et.al. | 2008.12606 | link |
2019-08-30 | Animating Arbitrary Objects via Deep Motion Transfer | Aliaksandr Siarohin et.al. | 1812.08861 | link |
2018-10-09 | 3D model silhouette-based tracking in depth images for puppet suit dynamic video-mapping | Guillaume Caron et.al. | 1810.03956 | null |
2018-06-24 | A Design of FPGA Based Small Animal PET Real Time Digital Signal Processing and Correction Logic | Jiaming Lu et.al. | 1806.09117 | null |
2018-01-31 | RAPTOR I: Time-dependent radiative transfer in arbitrary spacetimes | Thomas Bronzwaer et.al. | 1801.10452 | null |
2016-06-23 | Gender and Interest Targeting for Sponsored Post Advertising at Tumblr | Mihajlo Grbovic et.al. | 1606.07189 | null |
2015-03-16 | Use of Effective Audio in E-learning Courseware | Kisor Ray et.al. | 1503.04837 | null |
2015-02-04 | Multimedia-Video for Learning | Kah Hean Chua et.al. | 1502.01090 | null |
2013-01-25 | Measurements of Martian Dust Devil Winds with HiRISE | David S. Choi et.al. | 1301.6130 | null |
2010-01-04 | Tutoring System for Dance Learning | Rajkumar Kannan et.al. | 1001.0440 | null |
Notes:
- We have modified the sorting rule of the above table to prioritize papers by the time of their latest update rather than their initial publication date. If an article has been recently modified, it appears earlier in the list.
- However, recent trends are still based on ten papers sorted by initial publication date.
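The two orderings described in the notes can be illustrated with a minimal Python sketch. The record fields `published` and `updated`, the sample entries, and the helper names `table_order` and `trend_sample` are all hypothetical, chosen only for illustration; they are not taken from this project's actual code. The first helper mirrors the table rule (latest update first), which is why an entry whose arXiv identifier begins with 2110 (an October 2021 submission) can appear under a 2023 date. The second mirrors the trend rule (newest initial publication first, capped at ten).

```python
from datetime import date

# Hypothetical records: the field names "published" and "updated" are
# assumptions for illustration, not the project's actual schema.
papers = [
    {"title": "Paper A", "published": date(2021, 10, 4), "updated": date(2023, 11, 19)},
    {"title": "Paper B", "published": date(2023, 7, 10), "updated": date(2023, 7, 10)},
    {"title": "Paper C", "published": date(2022, 3, 29), "updated": date(2022, 3, 29)},
]


def table_order(records):
    """Order rows for the table: most recently updated first."""
    return sorted(records, key=lambda p: p["updated"], reverse=True)


def trend_sample(records, n=10):
    """Pick the n newest papers by initial publication date."""
    return sorted(records, key=lambda p: p["published"], reverse=True)[:n]


table_titles = [p["title"] for p in table_order(papers)]
trend_titles = [p["title"] for p in trend_sample(papers)]
```

In this sketch, "Paper A" leads the table because of its late revision, yet it ranks last in the trend sample because it was first published earliest.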
Function added: