DiffTalk
[CVPR2023] The implementation for "DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation"
https://github.com/yxdydgithub/difftalk_preprocess — tested and working
Thanks for your great work. I am confused about one thing in the preprocessing stage. When we extract images, landmarks, and audio features from a video, do we need to have the...
I use deepspeech==0.9.3, but it fails with an error: graph_def.ParseFromString(f.read()) google.protobuf.message.DecodeError: Error parsing message with type 'tensorflow.GraphDef'
As I understand it, the driving-audio feature a and the landmark representation l are each a single vector, not a batch of vectors, so how can they be used in cross-attention...
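One common answer (a sketch of standard cross-attention, not necessarily the paper's exact implementation, and omitting the learned K/V projections for brevity) is to treat the conditioning vector as a length-1 token sequence: unsqueeze it to shape (B, 1, d) and let it serve as keys and values, while the image features supply the queries.

```python
import numpy as np

def cross_attention(q, kv):
    """q: (B, N, d) image tokens; kv: (B, M, d) condition tokens (M may be 1)."""
    d = q.shape[-1]
    scores = q @ kv.transpose(0, 2, 1) / np.sqrt(d)        # (B, N, M)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)              # softmax over the M condition tokens
    return weights @ kv                                    # (B, N, d)

B, N, d = 2, 16, 64
image_tokens = np.random.randn(B, N, d)
audio_vec = np.random.randn(B, d)           # one conditioning vector per sample
cond = audio_vec[:, None, :]                # -> (B, 1, d): a length-1 "sequence"
out = cross_attention(image_tokens, cond)   # (B, N, d)
```

With M = 1 the softmax over keys is trivially 1, so every query attends fully to the single condition token; in practice the learned query/key/value projections make this injection non-trivial.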
After preprocessing the HDTF dataset, I got 415 videos. 249 videos (60%) were randomly selected as the training set; the remaining videos (40%) formed the test set. The first 1500 frames of each...
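The split described above can be reproduced along these lines (a sketch; the seed and ID naming are assumptions, so a different seed yields a different partition of the same sizes):

```python
import random

def split_videos(video_ids, train_frac=0.6, seed=0):
    """Randomly split video IDs into train/test sets, ~60/40 as described."""
    rng = random.Random(seed)          # fixed seed for a reproducible split
    ids = list(video_ids)
    rng.shuffle(ids)
    n_train = round(len(ids) * train_frac)
    return ids[:n_train], ids[n_train:]

videos = [f"video{i:03d}" for i in range(415)]   # hypothetical IDs
train, test = split_videos(videos)
print(len(train), len(test))  # 249 166
```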
I encountered a problem with the package 'ldm'; my environment has ldm==0.1.3, python==3.7, pytorch==1.12.1
How do I run inference with my own reference image and audio to generate an audio-driven video?
What does each line in data_test.txt mean? I guess the part before the '_' is the video ID and the part after is the frame number within that video, but some of them don't have...
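Assuming each line has the form "<videoID>_<frameIndex>", splitting on the last underscore handles video IDs that themselves contain underscores (a sketch; the example ID is hypothetical, and lines without a trailing numeric field are reported rather than guessed at):

```python
def parse_line(line):
    """Split '<videoID>_<frameIndex>' on the LAST underscore."""
    line = line.strip()
    vid, _, frame = line.rpartition("_")
    if not vid or not frame.isdigit():
        return None  # line does not match the expected pattern
    return vid, int(frame)

print(parse_line("WDA_BarackObama_000_0153"))  # ('WDA_BarackObama_000', 153)
```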
Can anyone share a usable requirements.txt? The provided one has many conflicts and errors.