Regarding Google Speech Command dataset

Open miautoml opened this issue 5 years ago • 3 comments

Hi, thanks for the great code. I've tried to reproduce the results, but two things confuse me:

  1. The original dataset does not seem to include a _silence_ folder; I couldn't find _silence_/721f767c_nohash_2.wav, which is listed in test.txt.

  2. The dataset has 64k samples, while the code uses only about 29k. Why is that? Were the results in your paper produced from the data in the files listed below?

$ cat test.txt  | wc -l
3081
$ cat train.txt | wc -l
22246
$ cat valid.txt | wc -l
3093

Looking forward to your reply,

Sincerely, Bo

miautoml avatar Mar 09 '20 11:03 miautoml

Hi Bo,

  1. Since _silence_ is literally empty audio, no actual wav file is needed. If you check the data loader, you will see that it generates the empty audio on the fly. https://github.com/hyperconnect/TC-ResNet/blob/8ccbff3a45590247d8c54cc82129acb90eecf5c8/datasets/audio_data_wrapper.py#L146-L174

  2. Would you double-check our instructions at https://github.com/hyperconnect/TC-ResNet/tree/master/speech_commands_dataset? You can find Google's original preprocessing code here, but as we mentioned above, we slightly modified the split function. Although the original dataset contains 30 keywords, we select 10 keywords as previous studies did, which is mentioned in the paper. That is the reason why the number of selected wav samples is different.
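
For readers following along, the two points above can be illustrated with a minimal sketch. This is not the repository's actual loader: the function names `load_clip` and `keep_entry`, the 16 kHz one-second clip length, and the exact ten-keyword set are assumptions made for illustration, and the handling of any "unknown" class is left out. A plain Python list stands in for the array or tensor a real loader would return.

```python
SAMPLE_RATE = 16000          # assumed: 1-second clips sampled at 16 kHz
SILENCE_LABEL = "_silence_"  # directory name used for silence entries in the lists

# Assumed ten-keyword subset; check the paper and README for the exact set.
WANTED_WORDS = {"yes", "no", "up", "down", "left", "right", "on", "off", "stop", "go"}


def label_of(relative_path):
    """Return the label, i.e. the directory part of 'label/file.wav'."""
    return relative_path.split("/")[0]


def load_clip(relative_path, read_wav):
    """Return samples for a list entry (hypothetical helper).

    Silence entries are synthesized as zeros, so no wav file has to exist
    on disk for them. All other entries are delegated to `read_wav`.
    """
    if label_of(relative_path) == SILENCE_LABEL:
        return [0.0] * SAMPLE_RATE
    return read_wav(relative_path)


def keep_entry(relative_path):
    """Keep only silence entries and the wanted keywords (hypothetical helper)."""
    label = label_of(relative_path)
    return label == SILENCE_LABEL or label in WANTED_WORDS
```

Filtering the full file list with something like `keep_entry` is what shrinks the sample count relative to the full 30-keyword dataset.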

justin-hpcnt avatar Mar 13 '20 02:03 justin-hpcnt

@miautoml Have you solved your problem? I've run into the same question and would like to learn how to solve it.

Lebhoryi avatar May 23 '20 09:05 Lebhoryi

@justin-hpcnt Could I ask how the three '*.txt' files were created? Is there any description of how to obtain the files in https://github.com/hyperconnect/TC-ResNet/tree/master/speech_commands_dataset, or in the paper?

  1. There is no '_silence' file.
  2. Using '__hash__' to split the data into three files (with the same wanted words), I got the following counts. The lists are sorted:
(base) ..T-Thread/WakeUp-Xiaorui/data❯ cat training_list.txt | wc -l 
30769
(base) ..T-Thread/WakeUp-Xiaorui/data❯ cat validation_list.txt | wc -l 
3703
(base) ..T-Thread/WakeUp-Xiaorui/data❯ cat testing_list.txt | wc -l   
4074
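
The hash-based split referred to above can be sketched as follows. This is modeled on the partitioning scheme in Google's Speech Commands reference code, rewritten with the standard library only. The 10%/10% default percentages are an assumption, and since the TC-ResNet maintainers say they slightly modified the split function, this sketch should not be expected to reproduce their train.txt, valid.txt, and test.txt exactly.

```python
import hashlib
import os
import re

MAX_NUM_WAVS_PER_CLASS = 2**27 - 1  # upper bound used to map the hash to a percentage


def which_set(filename, validation_percentage=10.0, testing_percentage=10.0):
    """Assign a wav file to 'training', 'validation', or 'testing'.

    The decision depends only on the file name, so it stays stable when
    new files are added to the dataset.
    """
    base_name = os.path.basename(filename)
    # Ignore everything from '_nohash_' onward so that all clips recorded
    # by the same speaker fall into the same partition.
    hash_name = re.sub(r"_nohash_.*$", "", base_name)
    hashed = hashlib.sha1(hash_name.encode("utf-8")).hexdigest()
    percentage_hash = (int(hashed, 16) % (MAX_NUM_WAVS_PER_CLASS + 1)) * (
        100.0 / MAX_NUM_WAVS_PER_CLASS
    )
    if percentage_hash < validation_percentage:
        return "validation"
    if percentage_hash < validation_percentage + testing_percentage:
        return "testing"
    return "training"
```

Because the label directory is not part of the hash, a speaker's recordings of different words end up in the same partition. Differences in the keyword subset or in the percentages will change the resulting list sizes.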

Thanks a lot. I will read 'speech_commands_dataset/readme' again.

Lebhoryi avatar May 23 '20 09:05 Lebhoryi