LPCNet post-processing techniques for the generated waves

LPCNet performs very well on features extracted from ground-truth waves. However, when the features are predicted by an end2end model or an SPSS model, there are audible artifacts in the generated waves.

Does anyone know of post-processing techniques that can be applied to the synthesized waves to reduce these artifacts?

Some synthesized samples: e2e_lpcnet_samples_share.zip

Feb 26 '19 09:02 candlewill

For example, in the Baidu TTS online demo, the spectrogram shows that the audio has apparently been enhanced by some post-processing technique to reduce the noise.

Baidu-TTS-sample.zip

Feb 26 '19 09:02 candlewill
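As a rough illustration of the kind of post-processing discussed above, here is a minimal spectral-subtraction sketch in Python with NumPy. Spectral subtraction is a generic, well-known denoising technique swapped in here for illustration; it is not what LPCNet or the Baidu demo is known to use, and the function names (`stft`, `istft`, `spectral_subtraction`) and default parameters are hypothetical. It assumes the synthesized speech contains pauses, so that a low percentile of each frequency bin's magnitude over time approximates the stationary noise floor.

```python
import numpy as np

# Hypothetical helpers: a generic spectral-subtraction sketch,
# not the method used by LPCNet or by the Baidu demo.

def stft(x, n_fft=512, hop=128):
    """Complex STFT (frames x bins) with a periodic Hann window and centered frames."""
    window = np.hanning(n_fft + 1)[:-1]
    pad = n_fft // 2
    x = np.pad(x, (pad, pad), mode="reflect")
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, length, n_fft=512, hop=128):
    """Weighted overlap-add inverse of stft(); exact when spec is unmodified."""
    window = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    total = n_fft + hop * (len(frames) - 1)
    out = np.zeros(total)
    norm = np.zeros(total)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += window ** 2
    out /= np.maximum(norm, 1e-8)  # undo the analysis * synthesis window weighting
    pad = n_fft // 2
    return out[pad:pad + length]


def spectral_subtraction(x, n_fft=512, hop=128, noise_percentile=10.0,
                         over_subtract=1.5, floor=0.05):
    """Reduce stationary background noise in a synthesized waveform.

    Assumes every frequency bin has some low-energy (pause) frames, so a low
    percentile of its magnitude over time approximates the noise floor.
    """
    spec = stft(x, n_fft, hop)
    mag, phase = np.abs(spec), np.angle(spec)
    noise = np.percentile(mag, noise_percentile, axis=0, keepdims=True)
    # Over-subtract the noise estimate, but keep a small spectral floor, which
    # masks the isolated residual peaks ("musical noise") hard zeroing leaves.
    clean = np.maximum(mag - over_subtract * noise, floor * mag)
    # Reuse the noisy phase; only the magnitudes are modified.
    return istft(clean * np.exp(1j * phase), len(x), n_fft, hop)
```

LPCNet's demo writes raw 16-bit PCM, so a generated file could be loaded with `np.fromfile(path, dtype=np.int16)`, scaled to floats in [-1, 1), and passed to `spectral_subtraction`. Raising `over_subtract` removes more noise at the cost of more speech distortion, while `floor` trades residual hiss against musical noise; both would need tuning by ear on real samples.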

@candlewill What is the dataset size behind "e2e_lpcnet_samples_share.zip"? It sounds good. My e2e_demo output has some noise, and the loss is about 3.35 with the default parameters.

Feb 27 '19 02:02 hyzhan
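For context on the loss value of 3.35 mentioned above: assuming the reported loss is the sparse categorical cross-entropy (in nats) over LPCNet's 256 μ-law excitation levels, it can be compared with the uniform baseline ln 256 and converted to a perplexity. The snippet below is a small sketch of that arithmetic; the function names are hypothetical.

```python
import math

# Assumption: the training loss is the mean cross-entropy (in nats) of a
# softmax over 256 mu-law levels.
NUM_LEVELS = 256


def uniform_loss(levels: int = NUM_LEVELS) -> float:
    """Cross-entropy of a model that spreads probability evenly over all levels."""
    return math.log(levels)


def perplexity(loss_nats: float) -> float:
    """Effective number of equally likely levels implied by a cross-entropy loss."""
    return math.exp(loss_nats)


if __name__ == "__main__":
    print(f"uniform baseline: {uniform_loss():.3f} nats")
    print(f"loss 3.35 -> perplexity {perplexity(3.35):.1f} levels")
```

Under that assumption, 3.35 nats corresponds to a perplexity of about 28.5, i.e. the model is roughly as uncertain as a uniform choice among 28 or 29 levels, against a baseline of about 5.545 nats (256 levels). This is only a rough reference point, not a threshold for audible quality.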

@candlewill Do you use TTS with LPCNet?

Mar 02 '19 07:03 OswaldoBornemann

Yes.

Mar 03 '19 08:03 candlewill

@candlewill Does LPCNet perform better than Griffin-Lim in both audio quality and generation speed?

Mar 03 '19 10:03 OswaldoBornemann

@tsungruihon I think LPCNet (trained on your own dataset) is better than Griffin-Lim in both quality and speed, especially if a post-processing algorithm is applied.

Mar 03 '19 15:03 candlewill

@candlewill I think the synthesized samples you shared are actually very good; LPCNet is probably just overfitting a little. By the way, I am having trouble training a stable LPCNet for TTS: I found it hard to converge. Any suggestions? Thanks.