Generate training data code

Open GabrielHaoHao opened this issue 1 year ago • 0 comments

Hello! Thank you very much for your work! I am currently facing an issue. The paper “Learning Audio-Text Agreement for Open-vocabulary Keyword Spotting” mentions that the training set contains approximately 8000k phrases. However, when I use your test-set splitting strategy to divide the data into the 100h and 360h sets, I only obtain a dataset close in size to the test set. This is obviously incorrect, so I would really like to know what the training-set splitting strategy is. Looking forward to your reply.

Apr 07 '24 03:04 GabrielHaoHao