incorrect audio shape

Open · sergedahdouh opened this issue 2 years ago · 1 comment

data_svc/waves-16k/ data_svc/whisper

speaker0<<<<<<<<<<
(639, 1024)
speaker1<<<<<<<<<<
Traceback (most recent call last):
  File "/content/lora-svc/prepare/preprocess_ppg.py", line 54, in
    pred_ppg(whisper, f"{wavPath}/{spks}/{file}.wav", f"{ppgPath}/{spks}/{file}.ppg")
  File "/content/lora-svc/prepare/preprocess_ppg.py", line 26, in pred_ppg
    ppg = whisper.encoder(mel.unsqueeze(0)).squeeze().data.cpu().float().numpy()
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/lora-svc/whisper/model.py", line 154, in forward
    assert len_x <= len_e, "incorrect audio shape"
AssertionError: incorrect audio shape

Any idea what the issue is? speaker0 is a recording of my voice (around 11 s) and speaker1 is a song (around 57 s).

Jun 28 '23 12:06 sergedahdouh
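The assertion presumably compares the number of frames the encoder produces (`len_x`) against the number of rows in its positional embedding (`len_e`). The sketch below reproduces that arithmetic, assuming lora-svc keeps the stock Whisper front end: 16 kHz audio, a mel hop of 160 samples, a second convolution with stride 2, and a 1,500-row positional embedding (30 s). The constant and function names are illustrative and are not taken from lora-svc.

```python
SAMPLE_RATE = 16_000  # Whisper works on 16 kHz mono audio
HOP_LENGTH = 160      # mel hop: 100 mel frames per second
CONV_STRIDE = 2       # the encoder's second conv halves the frame count
N_AUDIO_CTX = 1500    # assumed rows in the positional embedding (= 30 s)


def encoder_frames(seconds: float) -> int:
    """Approximate number of frames the encoder sees for a clip of `seconds`."""
    n_samples = int(seconds * SAMPLE_RATE)
    mel_frames = n_samples // HOP_LENGTH
    # kernel 3, stride 2, padding 1 gives ceil(mel_frames / 2) output frames
    return (mel_frames + CONV_STRIDE - 1) // CONV_STRIDE


def fits_encoder(seconds: float) -> bool:
    """Mirror of the `len_x <= len_e` check under the assumptions above."""
    return encoder_frames(seconds) <= N_AUDIO_CTX


for label, secs in [("speaker0", 11), ("speaker1", 57)]:
    print(label, encoder_frames(secs), fits_encoder(secs))
# speaker0 550 True
# speaker1 2850 False
```

Under these assumptions an 11 s clip yields roughly 550 frames and passes, while a 57 s song yields roughly 2,850 frames, well above the 1,500-frame limit, which matches the crash happening on speaker1.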

Cut the audio into clips shorter than 30 seconds; Whisper's encoder only accepts up to 30 seconds of audio at a time.

Jun 28 '23 13:06 MaxMax2016
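A minimal sketch of the suggested fix, assuming the song has already been loaded as a 1-D NumPy array of 16 kHz samples. `split_audio` is a hypothetical helper, not part of lora-svc, and the 25 s default is an arbitrary safety margin below the 30 s limit. It is a naive fixed-length cut; splitting at silences would avoid cutting through words but is not shown here.

```python
import numpy as np

SAMPLE_RATE = 16_000      # Whisper works on 16 kHz mono audio
WHISPER_MAX_SECONDS = 30  # length of Whisper's encoder window


def split_audio(audio: np.ndarray, sample_rate: int = SAMPLE_RATE,
                max_seconds: float = 25.0) -> list[np.ndarray]:
    """Hypothetical helper: cut a 1-D waveform into consecutive chunks of
    at most `max_seconds` each (the last chunk may be shorter)."""
    if audio.ndim != 1:
        raise ValueError("expected a 1-D (mono) waveform")
    if not 0 < max_seconds <= WHISPER_MAX_SECONDS:
        raise ValueError("max_seconds must be in (0, 30]")
    chunk_len = int(max_seconds * sample_rate)
    return [audio[start:start + chunk_len]
            for start in range(0, len(audio), chunk_len)]


# Example: 57 s of silence standing in for the song.
song = np.zeros(57 * SAMPLE_RATE, dtype=np.float32)
chunks = split_audio(song)
print([len(c) / SAMPLE_RATE for c in chunks])  # [25.0, 25.0, 7.0]
```

Each chunk would then be saved as its own .wav file under data_svc/waves-16k/speaker1/ (for instance with `soundfile.write(path, chunk, 16000)`) before rerunning prepare/preprocess_ppg.py.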