
[CVPR2023] The implementation for "DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation"

Results: 37 DiffTalk issues

https://github.com/yxdydgithub/difftalk_preprocess — tested, it works.

Thanks for your great work. I am confused about one thing in the preprocessing stage. When we extract images, landmarks, and audio features from a video, do we need to have the...

I use deepspeech==0.9.3; however, it raises an error: graph_def.ParseFromString(f.read()) google.protobuf.message.DecodeError: Error parsing message with type 'tensorflow.GraphDef'

As we all know, the driven-audio feature a and the landmark representation l are each just a single vector, not a batch of vectors, so how can they be used in cross-attention...
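One way to resolve the shape question above (a minimal sketch, not DiffTalk's actual code; all names, dimensions, and projections here are assumptions): each per-frame conditioning vector can be projected to the model width and treated as a single context token, so the stacked frames form a batch of short token sequences for cross-attention.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions (assumptions for illustration only)
B, d_a, d_l, d_model = 4, 128, 64, 256

a = torch.randn(B, d_a)  # one audio feature vector per frame
l = torch.randn(B, d_l)  # one landmark vector per frame

# Project each conditioning vector to the model width,
# then treat each as one token in a length-2 context sequence.
proj_a = nn.Linear(d_a, d_model)
proj_l = nn.Linear(d_l, d_model)
context = torch.stack([proj_a(a), proj_l(l)], dim=1)  # (B, 2, d_model)

# Latent image tokens act as queries; the conditioning tokens are keys/values.
attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
query = torch.randn(B, 16, d_model)  # e.g. 16 flattened latent positions
out, _ = attn(query, context, context)
print(out.shape)  # torch.Size([4, 16, 256])
```

So "a vector" per frame becomes a batch of one-token (or few-token) sequences once the frame dimension is used as the batch dimension.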

After preprocessing the HDTF dataset, I got 415 videos. 249 videos (60%) were randomly selected as the training set; the others (40%) were the test set. The first 1500 frames of each...
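The split described above can be sketched as follows (a hedged illustration; the video identifiers, seed, and exact rounding are assumptions, not the poster's script):

```python
import random

# 415 preprocessed HDTF videos (placeholder IDs; real names differ)
video_ids = [f"video_{i:03d}" for i in range(415)]

rng = random.Random(42)  # fixed seed so the random split is reproducible
shuffled = video_ids[:]
rng.shuffle(shuffled)

n_train = round(len(shuffled) * 0.6)  # 60% of 415 -> 249 videos
train_set, test_set = shuffled[:n_train], shuffled[n_train:]
print(len(train_set), len(test_set))  # 249 166
```

Recording the seed (or the resulting file lists) is what makes such a split comparable across reruns.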

![error](https://github.com/sstzal/DiffTalk/assets/78424820/a04daa28-66a1-4779-bb89-ac351d3cfd9e) I encountered a problem with the package 'ldm'; my env has ldm==0.1.3, python==3.7, pytorch==1.12.1

How to test with my own ref_image and audio to generate an audio-driven video?

What does every line in data_test.txt mean? I guess the first part before '_' is the video ID, and the later part is the frame number within that video. But some of them don't have...

Can anyone share a usable requirements.txt? The provided one has many conflicts and errors.