syncnet_trainer
Disentangled Speech Embeddings using Cross-Modal Self-Supervision
Hi, I'm a little confused about the meaning of "offset" in the txt file. Could anyone please explain what it means? Thank you.
https://www.robots.ox.ac.uk/~vgg/software/lipsync/data/voxsrc2020_baseline.model
Hi, I'd like to know how I can add the disentanglement loss to the training process, so as to see the true effect of disentangling. It seems adding the disentanglement loss into...
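As a point of reference for this question, below is a minimal sketch of one common way to express a disentanglement ("confusion") term: cross-entropy between the identity classifier's prediction on the content embedding and a uniform distribution. The function name, the loss weight `lam`, and the way it is combined with the sync and identity losses are assumptions for illustration, not necessarily how this repo implements it.

```python
import torch
import torch.nn.functional as F

def confusion_loss(identity_logits):
    """Cross-entropy against a uniform target over the identity classes.

    Minimised when the identity classifier is maximally confused, i.e. the
    content embedding carries no speaker information.
    """
    log_probs = F.log_softmax(identity_logits, dim=1)
    return -log_probs.mean()

# Hypothetical combined objective (names and weight are placeholders):
# total_loss = sync_loss + identity_loss + lam * confusion_loss(content_id_logits)
logits = torch.randn(8, 100)  # batch of 8, 100 speaker classes
print(confusion_loss(logits).item())
```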
Hi, I am looking through this repo and I am confused about the choice of loss function. I am using SyncNet to measure lip-sync error, and considering that this...
Hi joonson, 1. When I run python makeFileList.py, files are always skipped with "audio and video lengths different". I extract the wav from the mp4 with ffmpeg; how do you get the wav from the m4a or from...
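For reference, here is one way to extract audio so that it matches the video clip; the exact paths, sample rate, and channel count are assumptions, not necessarily what makeFileList.py expects.

```python
import subprocess

# Hypothetical preprocessing step: extract 16 kHz mono wav from an .m4a
# (or directly from the .mp4) using ffmpeg; filenames are placeholders.
subprocess.run(
    ["ffmpeg", "-y", "-i", "clip.m4a",
     "-vn",              # drop any video stream
     "-ac", "1",         # mono
     "-ar", "16000",     # 16 kHz sample rate
     "clip.wav"],
    check=True,
)
```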
It is 00048.txt instead. Is there anything wrong with the dataset?
I am trying the repo for the first time. While preparing the data, I find that we need the text annotations for the VoxCeleb files, but the [dataset](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox2.html)...
Where are the negative audio samples generated for the M-way matching problem? I just see that the load_wav function samples the audio corresponding to the starting frame index of the video. I only...
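For context, a common SyncNet-style way to form the M-way candidates is to take one audio window aligned with the video and M-1 temporally shifted windows as negatives. The sketch below illustrates that idea only; the function name, window units, and shift range are assumptions and may not match how this repo builds its batches.

```python
import random

def sample_audio_starts(num_frames, pos_start, win_len, n_way=5, max_shift=15):
    """Hypothetical N-way candidate sampler.

    Index 0 is the audio window aligned with the video (positive); the rest
    are temporally shifted windows that act as negatives for the matching loss.
    Assumes enough frames exist around pos_start to draw n_way - 1 shifts.
    """
    starts = [pos_start]
    while len(starts) < n_way:
        shift = random.randint(-max_shift, max_shift)
        cand = pos_start + shift
        if shift != 0 and 0 <= cand <= num_frames - win_len:
            starts.append(cand)
    return starts
```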
Hello, I have a couple of questions regarding the 75.8% synchronization accuracy (perfect match) reported in https://ieeexplore.ieee.org/abstract/document/9067055/. Evaluation protocol: the task is to determine the correct synchronisation within a ±15...
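To make the question concrete, here is one plausible reading of a ±15-frame search: slide the audio features against the video features, pick the offset with the smallest mean distance, and count the clip as correct when that offset matches the ground truth. This is a sketch of my understanding, not the exact evaluation code used for the reported number.

```python
import numpy as np

def predict_offset(video_feats, audio_feats, max_offset=15):
    """Hypothetical ±15-frame search: return the offset whose mean L2
    distance between paired video/audio features is smallest."""
    best_offset, best_dist = 0, np.inf
    for off in range(-max_offset, max_offset + 1):
        dists = [np.linalg.norm(v - audio_feats[t + off])
                 for t, v in enumerate(video_feats)
                 if 0 <= t + off < len(audio_feats)]
        if dists and np.mean(dists) < best_dist:
            best_offset, best_dist = off, float(np.mean(dists))
    return best_offset

# accuracy = fraction of test clips where predict_offset(...) equals the true offset
```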