hotword detection for a new language
Hi
Is it possible to train this model for a language with a different alphabet than English, such as Persian?
Thanks,
Yes, you can do that. Please go through the training file.
Thanks. I executed training.ipynb, but I ran into an error:
No file or directory found at /content/drive/MyDrive/Siamese/modelCheckpoints_old/model-8-01-0.96.h5
I think I need some pre-trained models, but I could not find them in your GitHub repo. Could you upload them to the repo so everyone can access them?
My other question is:
I found that there are lots of English single-word audio files in the "dataset_format_fixed" directory. Do I need a new single-word audio dataset to train for a new language, or can I use the model trained on your English dataset and customize it for my hotwords, which use a completely different alphabet and letters, such as these in Arabic:
آ ب ث د ر ز م س ش ح ض
Thanks in advance
For your first question: training again with Arabic words will give better performance than going with the English pre-trained model, since the audio window frame will be different (guessing this, since Arabic words tend to be longer than 1 sec).
Do I need a new single-word audio dataset to train for a new language? Yes, if you want high accuracy. Our model gives its best accuracy on words shorter than 1.5 sec.
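As a rough illustration (not the actual preprocessing in training.ipynb), trimming or padding every clip to a fixed 1.5 s window is a quick way to sanity-check a new-language dataset before training. This sketch assumes 16 kHz mono WAV files, and the folder name is just a placeholder:

```python
import glob
import numpy as np
import soundfile as sf

SR = 16000                  # assumed sample rate
TARGET_LEN = int(1.5 * SR)  # 1.5 s window, as suggested above

def fix_length(path):
    audio, sr = sf.read(path)
    if audio.ndim > 1:                # down-mix stereo to mono
        audio = audio.mean(axis=1)
    if sr != SR:
        raise ValueError(f"{path}: expected {SR} Hz, got {sr} Hz")
    if len(audio) >= TARGET_LEN:      # trim longer clips
        return audio[:TARGET_LEN]
    pad = TARGET_LEN - len(audio)     # zero-pad shorter clips
    return np.pad(audio, (0, pad))

for wav in glob.glob("arabic_words/*.wav"):   # hypothetical dataset folder
    sf.write(wav, fix_length(wav), SR)
```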
Like @aman-17 pointed out, it can be better to train the model from scratch, as there is little to no similarity in pronunciation between Arabic and English.
Secondly, a more polished version of the code with PyTorch and a ResNet is currently in the works. Will share it soon, so stay tuned!
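Until that version lands, here is a rough sketch of the general idea only, not the actual upcoming code: a Siamese embedding network with a ResNet-18 backbone over log-mel spectrograms, where both branches share weights and similarity is measured with cosine distance.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class HotwordEmbedder(nn.Module):
    """Maps a (1, n_mels, time) log-mel spectrogram to a unit-length embedding."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        backbone = models.resnet18(weights=None)   # no pretrained weights
        # Spectrograms have a single channel, so replace the RGB stem.
        backbone.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
        backbone.fc = nn.Linear(backbone.fc.in_features, embed_dim)
        self.backbone = backbone

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        emb = self.backbone(x)
        return nn.functional.normalize(emb, dim=-1)  # cosine-friendly embeddings

# Siamese use: the same network embeds both clips; train with a contrastive/triplet loss.
model = HotwordEmbedder()
a = torch.randn(4, 1, 64, 151)   # batch of log-mel spectrograms (~1.5 s at 10 ms hops)
b = torch.randn(4, 1, 64, 151)
similarity = (model(a) * model(b)).sum(dim=-1)   # values in [-1, 1]
```

Because the two branches share weights, a wakeword never has to appear in the training set; at run time you only compare the embedding of live audio against embeddings of a few reference recordings.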
The new model is out, can you test it with Arabic and let us know? The newer model has only been trained on English words, but its performance is way better than the old one.
We will share the training code for the newer model soon as well.
I am trying to do this as well: I want to create a wakeword in an African language, and while reading through your paper I came across the use of Siamese networks. So, can I upload audio data for any language, as long as each clip is within roughly 1.5 seconds, and the model will map new input close to the embeddings I created from the few audio samples?
Also, do you mind sharing a link to the code for the new model?
The repo already has the new model as the default one. Ideally it should work with any wakeword out there too; like you mentioned, that's precisely what the model does.
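To make that concrete, here is a minimal enrollment-and-matching sketch of the few-shot idea. The `embed()` function is a stand-in for whatever the repo's model actually exposes, and the 0.7 threshold is only an illustrative placeholder you would have to tune:

```python
import numpy as np

def enroll(embed, reference_clips):
    """Average the embeddings of a handful of reference recordings of the wakeword."""
    refs = np.stack([embed(clip) for clip in reference_clips])
    centroid = refs.mean(axis=0)
    return centroid / np.linalg.norm(centroid)

def is_wakeword(embed, centroid, clip, threshold=0.7):
    """Score an incoming ~1.5 s window by cosine similarity to the enrolled centroid."""
    e = embed(clip)
    score = float(np.dot(e / np.linalg.norm(e), centroid))
    return score >= threshold, score
```

With something like this, enrollment needs only a few recordings of the wakeword in the target language, and every incoming window is scored against the stored centroid.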