
Textless NLP / GSLM: Speech resynthesis produces incorrect results

Open KaushalNaresh opened this issue 2 years ago • 0 comments

What is your question?

When I resynthesize audio using the HuBERT, TTS, and WaveGlow checkpoints, I do not get correct results. It seems like my generated audio is truncated.

Code

```python
import torch
import librosa
import soundfile as sf

speech1_path = 'original.flac'

# Load the original audio at its native sample rate.
speech1, sr = librosa.load(speech1_path, sr=None, mono=True)
speech1_tensor = torch.tensor(speech1)
audio = speech1_tensor.unsqueeze(0).cuda()

# Encode to discrete units, then decode back to a waveform.
a, b, c = gslm.encode(audio)
m, audi = gslm.decode(c, True)

audio = audi[0]
new_audio = audio.squeeze().cpu().numpy()
new_audio = new_audio.astype('float32')
sf.write('output.wav', new_audio, 22050)
```

Link to the generated and original audio: Link

What's your environment?

  • fairseq Version (e.g., 1.0 or main): v0.12.2
  • PyTorch Version (e.g., 1.0): 1.10.1+cu102
  • OS (e.g., Linux): Ubuntu 18.04
  • How you installed fairseq (pip, source): https://github.com/facebookresearch/fairseq
  • Build command you used (if compiling from source):
  • Python version: 3.8.17
  • CUDA/cuDNN version: 12.2
  • GPU models and configuration:
  • Any other relevant information:

gslm.txt

KaushalNaresh · Oct 02 '23 06:10