Is the code runnable without changing parameters?
Hi, first of all thank you for sharing the code!
I have been working with this code for a while and am wondering how to run it: is it runnable without any modification?
In my case, I first put the trimmed training files (mono, 10 seconds, 44100 Hz, wav) into a folder and changed the path in the script, but I hit some dimension errors and training could not continue. I looked into the code and added permute calls in a few places to make the errors disappear (e.g. permute(0, 2, 1) right before the quantizer). After these modifications the code finally ran, but the quantizer produced constants, so the results sound horrible.
So I am wondering whether I did something wrong. In particular, is editing the code necessary, or did I run into these problems because of a fault in my parameters?
Thank you!
Hi, when I pushed the branch as is, the training ran for me. It is not optimized nor good enough code for me yet, but it ran. From what you say, it seems like you want to use a different sampling rate. That might imply deep changes to the code and the architecture of the pipeline.
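(For anyone hitting a sampling-rate mismatch: resampling the audio up front avoids touching the architecture at all. A minimal sketch using torchaudio; the 24000 Hz target and the file names are placeholders for illustration, not the repo's confirmed settings.)

```python
import torchaudio
import torchaudio.functional as F

# TARGET_SR is a placeholder: set it to whatever rate the model was built for.
TARGET_SR = 24000

waveform, sr = torchaudio.load("clip.wav")  # waveform: [channels, samples]
if waveform.size(0) > 1:                    # downmix to mono if needed
    waveform = waveform.mean(dim=0, keepdim=True)
if sr != TARGET_SR:                         # resample to the model's rate
    waveform = F.resample(waveform, orig_freq=sr, new_freq=TARGET_SR)
torchaudio.save("clip_resampled.wav", waveform, TARGET_SR)
```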
I used the same dataset (the NSynth dataset), but I still got errors. As a beginner, I can't solve them :(
@wesbz
> Hi, when I pushed the branch as is, the training ran for me. It is not optimized nor good enough code for me yet, but it ran. From what you say, it seems like you want to use a different sampling rate. That might imply deep changes to the code and the architecture of the pipeline.
Thanks for the valuable insight! It really answered some questions I had had for a long time.
Since you mentioned changes in the sampling rate, could you please provide some details on the training process? What kind of audio files did you use (i.e. mono/stereo, sampling rate, format), and if I clone the code from the repository from scratch, what is the right way to train the first epoch?
Thank you very much!
I also tried the NSynth dataset, and unfortunately the forward pass stopped before the quantizer. The reason I think a permute is necessary: at line 156 of net.py (latest commit on the soundstream main branch):
```python
def forward(self, x):
    # x: [batch, 1, sample_length]
    # e: [batch, D, encoded_length]
    e = self.encoder(x)
    quantized, _, _ = self.quantizer(e)
    # From the README of your pinned commit of the VQ module
    # (https://github.com/lucidrains/vector-quantize-pytorch/tree/d4f06653eabdd4e528f75a55b10804e60b38dc30),
    # residual_vq accepts [batch, seq, dim] --
    # so is it necessary to perform e.permute((0, 2, 1)) here?
    o = self.decoder(quantized)
    return o
```
Is there anything I overlooked? Thank you!
I guess permute(0, 2, 1) is needed both before and after the quantizer, because the output of the encoder and the input of the decoder are both (batch, dim, seq).
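(A minimal, self-contained sketch of that fix, with toy encoder/decoder layers standing in for the real ones; TinySoundStream and its hyperparameters are made up for illustration and are not the repo's code.)

```python
import torch
import torch.nn as nn
from vector_quantize_pytorch import ResidualVQ

class TinySoundStream(nn.Module):
    """Toy stand-in for the repo's model; only the permute logic matters here."""
    def __init__(self, D=64):
        super().__init__()
        self.encoder = nn.Conv1d(1, D, kernel_size=4, stride=2, padding=1)
        self.quantizer = ResidualVQ(dim=D, num_quantizers=4, codebook_size=256)
        self.decoder = nn.ConvTranspose1d(D, 1, kernel_size=4, stride=2, padding=1)

    def forward(self, x):
        # x: [batch, 1, sample_length]
        e = self.encoder(x)                     # [batch, D, encoded_length]
        e = e.permute(0, 2, 1)                  # -> [batch, encoded_length, D] for ResidualVQ
        quantized, _, _ = self.quantizer(e)
        quantized = quantized.permute(0, 2, 1)  # -> [batch, D, encoded_length] for the decoder
        return self.decoder(quantized)

model = TinySoundStream()
out = model(torch.randn(2, 1, 1024))
print(out.shape)  # torch.Size([2, 1, 1024])
```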
How do I run this code? I am getting dimension errors. Can you tell me where to add these permute statements?
Can confirm @wl3b10s is correct: you need the permute both before and after. I ran the code (make sure to have vector-quantize-pytorch<=0.10.4 installed, e.g. pip install "vector-quantize-pytorch<=0.10.4"; the recent changes are still buggy).
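(If you want to guard against an accidental upgrade, a quick sanity check; the <=0.10.4 bound comes from the comment above, not from an official requirement of the repo, and this assumes Python 3.8+ and the packaging package.)

```python
# Sanity check that the installed vector-quantize-pytorch respects the pin above.
from importlib.metadata import version  # Python 3.8+
from packaging.version import Version

installed = Version(version("vector-quantize-pytorch"))
assert installed <= Version("0.10.4"), f"found {installed}, expected <= 0.10.4"
```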
Hi, I tried the code on the NSynth dataset available on Kaggle and I get this scary error. Any help please?
```
  0%|          | 0/6120 [00:01<?, ?it/s]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_28/292411318.py in <module>
    451         lengths_x = lengths_x.to(device)
    452
--> 453         G_x = soundstream(x)
    454
    455         s_x = torch.stft(x.squeeze(), n_fft=1024, hop_length=256, window=torch.hann_window(window_length=1024, device=device), return_complex=False).permute(0, 3, 1, 2)

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1188         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1189                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1190             return forward_call(*input, **kwargs)
   1191         # Do not call functions when jit is used
   1192         full_backward_hooks, non_full_backward_hooks = [], []

/tmp/ipykernel_28/292411318.py in forward(self, x)
    163     def forward(self, x):
    164         enc = self.encoder(x)
--> 165         quantized, _, _ = self.quantizer(enc)
    166         dec = self.decoder(quantized)
    167         return dec

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1188         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1189                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1190             return forward_call(*input, **kwargs)
   1191         # Do not call functions when jit is used
   1192         full_backward_hooks, non_full_backward_hooks = [], []

/opt/conda/lib/python3.7/site-packages/vector_quantize_pytorch/residual_vq.py in forward(self, x, return_all_codes)
    133                 continue
    134
--> 135             quantized, indices, loss = layer(residual)
    136             residual = residual - quantized.detach()
    137             quantized_out = quantized_out + quantized

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1188         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1189                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1190             return forward_call(*input, **kwargs)
   1191         # Do not call functions when jit is used
   1192         full_backward_hooks, non_full_backward_hooks = [], []

/opt/conda/lib/python3.7/site-packages/vector_quantize_pytorch/vector_quantize_pytorch.py in forward(self, x, mask)
    566             x = rearrange(x, f'b n (h d) -> {ein_rhs_eq}', h = heads)
    567
--> 568         quantize, embed_ind = self._codebook(x)
    569
    570         if self.training:

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1188         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1189                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1190             return forward_call(*input, **kwargs)
   1191         # Do not call functions when jit is used
   1192         full_backward_hooks, non_full_backward_hooks = [], []

/opt/conda/lib/python3.7/site-packages/torch/amp/autocast_mode.py in decorate_autocast(*args, **kwargs)
     12     def decorate_autocast(*args, **kwargs):
     13         with autocast_instance:
---> 14             return func(*args, **kwargs)
     15     decorate_autocast.__script_unsupported = '@autocast() decorator is not supported in script mode'  # type: ignore[attr-defined]
     16     return decorate_autocast

/opt/conda/lib/python3.7/site-packages/vector_quantize_pytorch/vector_quantize_pytorch.py in forward(self, x)
    271         flatten = rearrange(x, 'h ... d -> h (...) d')
    272
--> 273         self.init_embed_(flatten)
    274
    275         embed = self.embed if not self.learnable_codebook else self.embed.detach()

/opt/conda/lib/python3.7/site-packages/vector_quantize_pytorch/vector_quantize_pytorch.py in init_embed_(self, data)
    234         )
    235
--> 236         self.embed.data.copy_(embed)
    237         self.embed_avg.data.copy_(embed.clone())
    238         self.cluster_size.data.copy_(cluster_size)

RuntimeError: output with shape [1, 1, 1] doesn't match the broadcast shape [1, 1, 198]
```
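(This looks like the same layout issue discussed above: the quantizer receives (batch, D, encoded_length) while ResidualVQ expects (batch, seq, dim), so the k-means codebook init sees the wrong axis as the feature dimension. A minimal sketch of the failure mode and the fix; the dim/num_quantizers/codebook_size values are illustrative, not the repo's settings, and the exact error message can vary with the library version.)

```python
import torch
from vector_quantize_pytorch import ResidualVQ

# Illustrative hyperparameters -- not the repo's actual configuration.
rvq = ResidualVQ(dim=256, num_quantizers=8, codebook_size=1024, kmeans_init=True)

e = torch.randn(1, 256, 198)  # encoder output laid out as (batch, D, encoded_length)

# Passing e directly hands the codebook 198-dim "features" instead of 256-dim
# ones, which blows up during codebook initialization (as a projection or
# broadcast shape error, depending on the library version):
# rvq(e)  # RuntimeError

# Permuting to (batch, seq, dim) first gives the quantizer the layout it expects:
quantized, indices, loss = rvq(e.permute(0, 2, 1))
print(quantized.shape)  # torch.Size([1, 198, 256])
```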
> Hi, I tried the code on the NSynth dataset available on Kaggle and I get this scary error. Any help please?
>
> RuntimeError: output with shape [1, 1, 1] doesn't match the broadcast shape [1, 1, 198]
I got the same error, no solution yet?
> Can confirm @wl3b10s is correct: you need the permute both before and after. I ran the code (make sure to have vector-quantize-pytorch<=0.10.4 installed; the recent changes are still buggy).

Still the same dimension problem.