DeepSpeech

Create cocktail party data set

Open kdavis-mozilla opened this issue 7 years ago • 10 comments

Create a cocktail party data set in which the background noise for training comes only from the training data sets, the background noise for validation comes only from the validation data sets, and the background noise for testing comes only from the test data sets.
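As a rough illustration of the split isolation described here, the following is a minimal sketch, not DeepSpeech code. The split names, the file names, and the helpers `check_disjoint` and `pick_background` are all hypothetical; the only point is that each split draws background noise from its own pool and that no clip is shared between pools.

```python
import random

# Hypothetical noise pools, one per split. The file names are placeholders.
NOISE_POOLS = {
    "train": ["train_noise_a.wav", "train_noise_b.wav"],
    "dev": ["dev_noise_a.wav"],
    "test": ["test_noise_a.wav"],
}


def check_disjoint(noise_pools):
    """Raise ValueError if any clip appears in more than one split."""
    owner = {}
    for split, clips in noise_pools.items():
        for clip in clips:
            if clip in owner and owner[clip] != split:
                raise ValueError(
                    f"{clip} is in both {owner[clip]!r} and {split!r}"
                )
            owner[clip] = split


def pick_background(split, noise_pools, rng):
    """Return one noise clip drawn only from the pool of the given split."""
    if split not in noise_pools:
        raise KeyError(f"unknown split: {split!r}")
    return rng.choice(noise_pools[split])


check_disjoint(NOISE_POOLS)
clip = pick_background("train", NOISE_POOLS, random.Random(0))
```

Checking for overlap up front is a design choice of this sketch: it fails early instead of silently leaking validation or test noise into training.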

kdavis-mozilla avatar Sep 12 '18 12:09 kdavis-mozilla

Howdy, I was just linked to this: https://voices18.github.io/ Could it be used to help in this regard?

SalvorinFex avatar Sep 19 '18 22:09 SalvorinFex

@SalvorinFex Thanks!

Currently we're planning on using the CC0 audio from freesound, along with many random samples of the training set at a lower volume, to create a cocktail party data set. But the VOiCES Corpus also might be a nice addition.
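The mixing idea above could be sketched roughly as follows. This is an assumption-laden illustration, not the project's implementation: `mix_cocktail` is a hypothetical helper, audio is represented as plain lists of float samples in [-1, 1], and the default attenuation of -12 dB is an arbitrary stand-in for "a lower volume".

```python
import random


def mix_cocktail(speech, interferers, gain_db=-12.0, rng=None):
    """Overlay attenuated interferer clips onto a speech clip.

    speech      -- list of float samples in [-1, 1]
    interferers -- list of clips (each a list of float samples)
    gain_db     -- attenuation applied to every interferer (assumed value)
    """
    rng = rng or random.Random()
    gain = 10.0 ** (gain_db / 20.0)  # convert decibels to a linear factor
    mixed = list(speech)
    for clip in interferers:
        if not clip:
            continue  # skip empty clips
        # Start at a random position and loop the clip to cover the speech.
        offset = rng.randrange(len(clip))
        for i in range(len(mixed)):
            mixed[i] += gain * clip[(offset + i) % len(clip)]
    # Hard-limit the sum so it stays in the valid sample range.
    return [max(-1.0, min(1.0, s)) for s in mixed]


mixed = mix_cocktail([0.5, 0.5], [[1.0]], gain_db=-20.0, rng=random.Random(0))
```

In this sketch the interferers would be drawn from the same split as the target utterance, so that training mixtures never contain validation or test audio.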

kdavis-mozilla avatar Sep 20 '18 07:09 kdavis-mozilla

Another Mozilla-associated noise reduction project created a large noise dataset that might be useful to you. It’s downloadable at the bottom of this page: https://people.xiph.org/~jm/demo/rnnoise/

zaptrem avatar May 02 '19 07:05 zaptrem

Are there plans to run reverb filters on the datasets as well? STT might struggle in this area.

zaptrem avatar Jul 20 '20 20:07 zaptrem

https://deepspeech.readthedocs.io/en/v0.7.4/TRAINING.html#augmentation

tilmankamp avatar Jul 21 '20 11:07 tilmankamp

@tilmankamp Thanks! Are the pretrained models trained with reverb and these other augmentations enabled? Also, is the reverb added before or after the audio is mixed with the noise samples?

zaptrem avatar Jul 25 '20 06:07 zaptrem

Models that use such augmentations are currently being trained.

kdavis-mozilla avatar Jul 27 '20 08:07 kdavis-mozilla

@kdavis-mozilla Are the just-released 0.8 models trained with these augmentations? Or are those coming in 0.9/1.0?

zaptrem avatar Jul 30 '20 19:07 zaptrem

@zaptrem They are coming later.

kdavis-mozilla avatar Jul 31 '20 08:07 kdavis-mozilla

@kdavis-mozilla Would this noise dataset be useful for this? https://iqtlabs.github.io/voices/

zaptrem avatar Aug 26 '20 03:08 zaptrem