Textless NLP / GSLM: Speech resynthesis produces incorrect results
What is your question?
When I resynthesize audio using the HuBERT, TTS, and WaveGlow checkpoints, I am not getting correct results. It seems like my generated audio is truncated.
Code
import librosa
import soundfile as sf
import torch

# gslm is the GSLM wrapper (HuBERT speech2unit + unit2speech TTS + WaveGlow), loaded elsewhere.
speech1_path = 'original.flac'
speech1, sr = librosa.load(speech1_path, sr=None, mono=True)  # load at the file's native sample rate
speech1_tensor = torch.tensor(speech1)
audio = speech1_tensor.unsqueeze(0).cuda()  # add a batch dimension and move to GPU

a, b, c = gslm.encode(audio)
m, audi = gslm.decode(c, True)

audio = audi[0]
new_audio = audio.squeeze().cpu().numpy()
new_audio = new_audio.astype('float32')
sf.write('output.wav', new_audio, 22050)
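For debugging, here is a sanity check worth running (a sketch; it assumes the GSLM HuBERT encoder expects 16 kHz input, its training rate, while the Tacotron2 + WaveGlow decoder outputs 22.05 kHz, so loading the flac with sr=None could feed the encoder a mismatched rate):

import librosa
import soundfile as sf

# Force the encoder input to 16 kHz instead of the file's native rate.
speech1, sr = librosa.load('original.flac', sr=16000, mono=True)

# Compare input and output durations to quantify the truncation.
new_audio, out_sr = sf.read('output.wav')
print(f'input:  {len(speech1) / sr:.2f} s at {sr} Hz')
print(f'output: {len(new_audio) / out_sr:.2f} s at {out_sr} Hz')

If the durations differ substantially, the sample-rate handling is a reasonable place to look first.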
Link to the generated and original audio: Link
What's your environment?
- fairseq Version (e.g., 1.0 or main): v0.12.2
- PyTorch Version (e.g., 1.0): 1.10.1+cu102
- OS (e.g., Linux): Ubuntu 18.04
- How you installed fairseq (pip, source): https://github.com/facebookresearch/fairseq
- Build command you used (if compiling from source):
- Python version: 3.8.17
- CUDA/cuDNN version: 12.2
- GPU models and configuration:
- Any other relevant information: