zxj329
First of all, thanks for your code. I have a question: when you extract the mel spectrogram in meldataset.py, why do you use sqrt(energy) rather than energy?
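For reference, the distinction the question is about can be sketched as follows (a minimal numpy sketch, not the repository's actual meldataset.py code; `mel_input_spectrum` is a hypothetical helper): the "energy" (power) spectrum is |STFT|², and its square root is the magnitude spectrum |STFT|, which some mel pipelines feed into the mel filterbank instead of the power spectrum.

```python
import numpy as np

def mel_input_spectrum(stft_complex, use_magnitude=True):
    # "energy" (power) spectrum is |STFT|^2; sqrt(energy) is the magnitude |STFT|
    energy = np.abs(stft_complex) ** 2
    return np.sqrt(energy) if use_magnitude else energy

# A single complex STFT bin 3+4j has magnitude 5 and energy 25.
mag = mel_input_spectrum(np.array([3 + 4j]))                       # -> [5.0]
pwr = mel_input_spectrum(np.array([3 + 4j]), use_magnitude=False)  # -> [25.0]
```

The two choices only differ by a squaring before the log, so they mainly change the dynamic range of the resulting log-mel features.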
First of all, thanks for sharing your open-source code. I have two questions. First: your code targets 8 kHz audio, and I want to change it to 16 kHz. Should I change the chunk size in each block to K = sqrt(2 × 16000 × 4) ≈ 360, where the original 8 kHz setup corresponds to 250? Second: when I use this code for denoising, the loss stays around 17-18. Roughly what loss did you see in the first few epochs when using DPRNN for speech separation? Looking forward to your reply.
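For context, both numbers in the question follow from the DPRNN paper's heuristic of choosing the chunk size K ≈ sqrt(2L) for an input of L samples (a small sketch; `dprnn_chunk_size` is a hypothetical helper, not a function from this repository):

```python
import math

def dprnn_chunk_size(sample_rate, segment_seconds=4):
    # DPRNN picks chunk size K ~ sqrt(2 * L) for an input of L samples,
    # so that the intra-chunk and inter-chunk sequence lengths are balanced.
    L = sample_rate * segment_seconds
    return round(math.sqrt(2 * L))

print(dprnn_chunk_size(8000))   # 8 kHz, 4 s segments -> 253 (~250)
print(dprnn_chunk_size(16000))  # 16 kHz, 4 s segments -> 358 (~360)
```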
RVQ loss
First of all, thanks for your code. I have a question: when I use RVQ, it returns 8 losses. How do you handle this — do you add them all into one loss?
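One common way to handle this (an illustrative sketch only, not necessarily what this repository does; `combine_rvq_losses` is a hypothetical helper) is to reduce the per-quantizer commitment losses to a single scalar with a plain or weighted sum, and add that to the reconstruction loss before backprop:

```python
def combine_rvq_losses(losses, weights=None):
    # An RVQ with 8 stages typically returns one commitment loss per stage.
    # A common reduction is a (optionally weighted) sum into one scalar.
    if weights is None:
        weights = [1.0] * len(losses)
    return sum(w * l for w, l in zip(weights, losses))

# Works the same whether the entries are floats or autograd tensors.
stage_losses = [0.5, 0.4, 0.3, 0.25, 0.2, 0.18, 0.15, 0.12]
total = combine_rvq_losses(stage_losses)  # single scalar for backprop
```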
Hello, author: first of all, thanks for your code. I want to ask you a question about the function `filter_dc_notch16` — I don't know what it is used for. Looking forward to your reply.
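If it helps, a "DC notch" filter removes the 0 Hz (DC offset) component of the signal. The following is only a floating-point sketch of the standard first-order DC-blocking recurrence y[n] = x[n] − x[n−1] + r·y[n−1], not the actual fixed-point `filter_dc_notch16` implementation:

```python
def dc_block(x, r=0.982):
    # First-order DC-blocking filter (notch at 0 Hz):
    #   y[n] = x[n] - x[n-1] + r * y[n-1]
    # A constant (DC) input decays toward zero; higher frequencies pass through.
    y = []
    prev_x = 0.0
    prev_y = 0.0
    for s in x:
        out = s - prev_x + r * prev_y
        y.append(out)
        prev_x, prev_y = s, out
    return y

# Feeding a constant offset produces an output that decays toward zero.
decayed = dc_block([1.0] * 500)
```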
If you need to generate the denoise_training executable for training, isn't it just a matter of running ./compile.sh?...
#### What is your question?

#### Code

```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
from argparse import ArgumentParser
import os
import soundfile as sf
import time
import numpy as...
```
Hello, first of all thank you very much for your open-source script. I used your iOS script to build the static library onnxruntime.a, but when I link this library in another project I get the following error. What is causing this? My onnxruntime version is 1.19.2, and I have been reading up on it for a long time without making any headway. Part of the error is below. Looking forward to your reply. ld64.lld: error: undefined symbol: std::__1::basic_ostream::operator
In convert.py:

```python
with torch.no_grad():
    for line in tqdm(zip(titles, srcs, tgts)):
        title, src, tgt = line
        # tgt
        wav_tgt, _ = librosa.load(tgt, sr=hps.data.sampling_rate)
        wav_tgt, _ = librosa.effects.trim(wav_tgt, top_db=20)
```

Silence is trimmed from the audio here before the mel spectrogram is generated, but training apparently does not perform this step. Could that cause a mismatch?
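For reference, `librosa.effects.trim(top_db=20)` keeps the span of frames whose RMS level is within 20 dB of the loudest frame. A crude self-contained sketch of that behavior (illustrative only; `trim_silence` is a hypothetical stand-in, not librosa's implementation):

```python
import numpy as np

def trim_silence(wav, top_db=20.0, frame_length=2048, hop_length=512):
    # Frame-wise RMS in dB relative to the loudest frame; keep the span of
    # frames above (max_db - top_db), roughly like librosa.effects.trim.
    n = 1 + max(0, len(wav) - frame_length) // hop_length
    rms = np.array([
        np.sqrt(np.mean(wav[i * hop_length:i * hop_length + frame_length] ** 2))
        for i in range(n)
    ])
    db = 20 * np.log10(np.maximum(rms, 1e-10))
    keep = np.nonzero(db > db.max() - top_db)[0]
    if len(keep) == 0:
        return wav
    start = keep[0] * hop_length
    end = min(len(wav), keep[-1] * hop_length + frame_length)
    return wav[start:end]

# Leading and trailing silence around a burst of signal gets removed.
wav = np.concatenate([np.zeros(4096), np.ones(4096), np.zeros(4096)])
trimmed = trim_silence(wav)
```

Whether this causes a train/inference mismatch depends on whether the training features were computed on trimmed audio; the question is asking exactly that.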
max value is tensor(1.4666, device='cuda:3', grad_fn=)
max value is tensor(1.0702, device='cuda:6', grad_fn=)
min value is tensor(-1.0317, device='cuda:0', grad_fn=)
max value is tensor(1.2846, device='cuda:0', grad_fn=)
min value is tensor(-1.0303, device='cuda:4', grad_fn=)
...