Check size for embedding layer
When you convert letter and phoneme symbols to numerical ids, isn't it confusing for the model to train on raw integers for classes? Would it be better to use one-hot encoding, or maybe even letter embeddings, to make the distances between letters or phonemes more meaningful?
Hi Esther
Thanks for your comment. Actually, we use the standard TensorFlow seq2seq model described here:
https://www.tensorflow.org/versions/r0.10/tutorials/seq2seq/index.html
"They will be embedded into a dense representation (see the Vector Representations Tutorial for more details on embeddings), but to construct these embeddings we need to specify the maximum number of discrete symbols that will appear: num_encoder_symbols on the encoder side, and num_decoder_symbols on the decoder side."
As you see, it uses an embedding, encoding the input symbols into a dense space, so it essentially builds the mapping you describe automatically. The open question is the embedding size: I now see we use an embedding size of 512, the same as the layer size, and that is probably not a good value for phonemes.
So we need to experiment with it a bit more.
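To make this concrete, here is a rough sketch of the embedding step in the style of the TF 0.10-era API (the names and sizes below are illustrative, not the exact variables in g2p.py):

import tensorflow as tf

# One trainable row per discrete symbol id; ~40 symbols is illustrative.
num_symbols = 40       # cf. num_encoder_symbols / num_decoder_symbols
embedding_size = 64    # candidate value to tune, instead of 512

embedding = tf.Variable(
    tf.random_uniform([num_symbols, embedding_size], -1.0, 1.0))

# Integer ids (e.g. a batch of letter ids) are looked up in the table,
# so the network sees dense vectors, never the raw integers.
symbol_ids = tf.placeholder(tf.int32, shape=[None])
dense_inputs = tf.nn.embedding_lookup(embedding, symbol_ids)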
Thanks, I definitely need to delve into the code more to see what it is doing. I understand now that it embeds the integer ids into dense vectors before training.
I can't quite reproduce the results you give on the main page. When I use:
g2p.py --train cmudict_nostress.dict --size=64 --num_layers=2 --model data
where I downloaded the latest cmudict.dict file and removed the stress from the phoneme labels, I get these results:
WER : 0.448681872038 Accuracy : 0.551318127962
That is worse than the 0.3961 WER you report for the latest cmudict.
Do you have any idea why it deteriorated from an earlier version of the dictionary?
There may be some issue with the input file preparation. Could you share your cmudict_nostress.dict? And please provide your TensorFlow version.
I just downloaded cmudict.dict from the link provided and removed the 0/1/2 labels to get rid of the stress. See attached file.
I am using TensorFlow 0.10.0.
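Roughly, the stripping step looks like this (a sketch; not necessarily the exact script behind the attached file):

import re

# Drop the 0/1/2 stress digits from each phoneme, e.g. "AH0" -> "AH".
with open('cmudict.dict') as src, open('cmudict_nostress.dict', 'w') as dst:
    for line in src:
        word, phones = line.rstrip('\n').split(' ', 1)
        dst.write('%s %s\n' % (word, re.sub(r'([A-Z]+)[012]', r'\1', phones)))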
Even with 512 units, the WER only goes down a little, to 0.4235.
I've just trained with your file and a clean checkout:
Words: 13502
Errors: 5158
WER: 0.382
Accuracy: 0.618
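(That is, the reported WER is simply Errors / Words, with Accuracy = 1 - WER; a quick check with the numbers above:)

errors, words = 5158, 13502
print(errors / float(words))      # 0.38201..., shown as WER: 0.382
print(1 - errors / float(words))  # 0.61798..., shown as Accuracy: 0.618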
Seems ok. What is the size of your model (mine is about 500 kb)? And what number of steps and final perplexity do you see at the end of the log? Mine are:
global step 227400 learning rate 0.0481 step-time 0.09 perplexity 1.04
With additional cleanup (one has to remove the "(2)"-style alternate pronunciation entries; see the sketch after the results below) I get:
global step 89200 learning rate 0.1849 step-time 0.09 perplexity 1.13
Training done.
Creating 2 layers of 64 units.
Reading model parameters from data
Beginning calculation word error rate (WER) on test sample.
Words: 12594
Errors: 4197
WER: 0.333
Accuracy: 0.667
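A sketch of that cleanup (dropping the "(2)"-style alternate entries; illustrative, not the exact script):

# Keep only the first pronunciation of each word by skipping
# alternate entries such as "abandon(2) ...".
with open('cmudict_nostress.dict') as src, \
        open('cmudict_clean.dict', 'w') as dst:
    for line in src:
        if '(' in line.split(' ', 1)[0]:
            continue
        dst.write(line)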
It must have had something to do with the version of the code. I downloaded the latest clean version as you did yesterday and ran it again on my cmudict_nostress. Now it says:
Words: 13502
Errors: 5059
WER: 0.375
Accuracy: 0.625
Ok, that's about it. You need the additional cleanup I described to get to 0.33. Let me know if you are able to reproduce 44. Thanks
Also, results between runs are not perfectly reproducible, since TensorFlow uses random initialization.
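(A graph-level seed can make runs closer, though TensorFlow still has other sources of nondeterminism, e.g. GPU ops and threading:)

import tensorflow as tf

# Fix the graph-level seed so random initializers are repeatable;
# note this does not make training fully deterministic.
tf.set_random_seed(1234)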
Still a valid issue. We need to evaluate the best value for the embedding layer dimension.