A little question about wavlm and seed mask

Open RobinWitch opened this issue 2 years ago • 6 comments

Hello, thanks for sharing the code! In DiffuseStyleGesture, the model uses only one audio feature, WavLM. But when extracting the WavLM feature from the raw wav, the code doesn't normalize the wav before extraction, while DiffuseStyleGesture++ does. I think DiffuseStyleGesture++ is right, yet DiffuseStyleGesture, which uses WavLM features extracted from an unnormalized wav, still generates good results in the samples. This confuses me a lot.

In DiffuseStyleGesture++, Figure 1 of the paper "The DiffuseStyleGesture+ entry to the GENEA Challenge 2023" indicates that the seed gesture should use a mask, while the code does not apply one.

RobinWitch avatar Nov 03 '23 07:11 RobinWitch

Yeah, you're right. For the first question, the wav should preferably be normalized before extracting WavLM features; for the second question, this RM is not quite the same as in the paper, but it should have little impact on the results, since stylization and controllability are not considered in the challenge. I have updated the Readme. Thanks for the correction.

YoungSeng avatar Nov 04 '23 03:11 YoungSeng

Maybe I didn't explain my first question well. I would have expected that feeding an unnormalized wav to WavLM is a fatal mistake, so why does it work in the end? It's so strange... That said, I haven't tested using a normalized wav to get WavLM features in DiffuseStyleGesture.

RobinWitch avatar Nov 04 '23 06:11 RobinWitch

I'm guessing WavLM is just a feature extractor here, and the bias is the same if none of the inputs are normalized. This may only work on this particular speaker.

YoungSeng avatar Nov 05 '23 03:11 YoungSeng

Hmm... that still sounds strange :( Maybe we need to run a simple test.

RobinWitch avatar Nov 05 '23 07:11 RobinWitch

Hi, I think the wav is normalized before the WavLM feature is extracted; the code is cmd = ['ffmpeg-normalize', wav_file, '-o', normalize_wav_path, '-ar', '16000'] in here. But it is different from the code, so I don't know if it works.

jiaqiAA avatar Nov 06 '23 06:11 jiaqiAA

Well, I found that the mean and std of the resulting wav are not 0 and 1. The command ffmpeg-normalize does not play the same role as layer_norm; you can refer to it.

RobinWitch avatar Nov 06 '23 08:11 RobinWitch
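
The distinction raised in the last comment can be illustrated with a small, self-contained sketch. It is not the repository's code: the test tone, the helper names (scale_to_peak, standardize, mean_std) and the peak value are hypothetical, and the gain-only function is only a stand-in for level-based normalization of the kind ffmpeg-normalize performs, assuming that such normalization mainly rescales amplitude. The standardize function mimics what a layer_norm over the whole waveform computes (zero mean, unit variance, with a small eps).

```python
import math

def mean_std(samples):
    """Return the mean and (population) standard deviation of a waveform."""
    m = sum(samples) / len(samples)
    v = sum((s - m) ** 2 for s in samples) / len(samples)
    return m, math.sqrt(v)

def scale_to_peak(samples, peak=0.5):
    """Gain-only normalization (hypothetical stand-in for level normalization):
    multiply by one constant so that max |x| equals `peak`."""
    gain = peak / max(abs(s) for s in samples)
    return [s * gain for s in samples]

def standardize(samples, eps=1e-5):
    """Zero-mean, unit-variance normalization over the whole waveform,
    the same arithmetic as a layer_norm without learned parameters."""
    m, sd = mean_std(samples)
    return [(s - m) / math.sqrt(sd * sd + eps) for s in samples]

# Hypothetical 1-second, 16 kHz test tone with a small DC offset.
sr = 16000
wav = [0.05 + 0.2 * math.sin(2 * math.pi * 220 * n / sr) for n in range(sr)]

gain_only = scale_to_peak(wav)
standardized = standardize(wav)

for name, x in [("raw", wav), ("gain-only", gain_only), ("standardized", standardized)]:
    m, sd = mean_std(x)
    print(f"{name:>12}: mean={m:+.4f} std={sd:.4f}")
```

Under these assumptions, only the standardized signal ends up with mean close to 0 and std close to 1; a pure gain change leaves the mean nonzero and the std dependent on the recording level, which is consistent with the observation above that the output of ffmpeg-normalize does not have mean 0 and std 1.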