g2p-seq2seq

Check size for embedding layer

Open · dreamk73 opened this issue on Sep 27, 2016 · 10 comments

When you convert letter and phoneme symbols to numerical ids, isn't it confusing for the model to train with integers for classes? Would it be better to have one-hot encoding or maybe even letter embeddings to make distances between letters or phonemes more meaningful?

dreamk73 avatar Sep 27 '16 13:09 dreamk73

Hi Esther

Thanks for your comment. Actually, we use the standard TensorFlow seq2seq model described here:

https://www.tensorflow.org/versions/r0.10/tutorials/seq2seq/index.html

They will be embedded into a dense representation (see the Vectors Representations Tutorial for more details on embeddings), but to construct these embeddings we need to specify the maximum number of discrete symbols that will appear: num_encoder_symbols on the encoder side, and num_decoder_symbols on the decoder side.

As you can see, it uses an embedding layer that encodes the input symbols into a dense space, so it essentially builds the mapping you describe automatically. The open question is the embedding size: I now see we use an embedding size of 512, the same as the layer size, which is probably not a good value for phonemes.

So we need to experiment with it a bit more.
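
For illustration, this is roughly how the embedding size enters the old tf.nn.seq2seq API (r0.10). The vocabulary sizes, cell size and sequence length below are made-up placeholders, not the project's real settings:

import tensorflow as tf

# Illustrative values only -- the real grapheme/phoneme vocabulary sizes and
# model configuration are defined elsewhere in g2p-seq2seq.
num_encoder_symbols = 40   # graphemes + special tokens (assumed)
num_decoder_symbols = 45   # phonemes + special tokens (assumed)
embedding_size = 64        # the value under discussion; 512 is what we use now
seq_len = 20               # made-up bucket length

cell = tf.nn.rnn_cell.GRUCell(64)

encoder_inputs = [tf.placeholder(tf.int32, shape=[None], name="enc%d" % i)
                  for i in range(seq_len)]
decoder_inputs = [tf.placeholder(tf.int32, shape=[None], name="dec%d" % i)
                  for i in range(seq_len)]

# The integer ids are looked up in a trainable embedding matrix inside this
# wrapper, so the model never treats the ids as ordinal values.
outputs, state = tf.nn.seq2seq.embedding_attention_seq2seq(
    encoder_inputs, decoder_inputs, cell,
    num_encoder_symbols=num_encoder_symbols,
    num_decoder_symbols=num_decoder_symbols,
    embedding_size=embedding_size)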

nshmyrev avatar Sep 27 '16 15:09 nshmyrev

Thanks, I definitely need to delve into the code more to see what it is doing. I understand now that it embeds the numbers into embedding vectors before training.

I can't quite reproduce the results you give on the main page. When I use:

g2p.py --train cmudict_nostress.dict --size=64 --numlayers=2 --model data

where I downloaded the latest cmudict.dict file and removed the stress from the phoneme labels, I get these results:

WER : 0.448681872038 Accuracy : 0.551318127962

This is worse than the 0.3961 WER you report for the latest CMU dict.

Do you have any idea why it deteriorated from an earlier version of the dictionary?

dreamk73 avatar Sep 28 '16 10:09 dreamk73

Maybe there is some issue with the input file preparation. Could you share your cmudict_nostress.dict and tell me your TensorFlow version?

nshmyrev avatar Sep 28 '16 10:09 nshmyrev

I just downloaded cmudict.dict from the link provided and removed the 0/1/2 labels to get rid of the stress. See attached file.
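
Roughly, the stress removal is just something like this (a sketch; filenames and details are illustrative):

import re

# Strip the 0/1/2 stress markers from the phoneme part of each entry,
# e.g. "AH0" -> "AH". Handling of comment lines etc. is omitted.
with open("cmudict.dict") as src, open("cmudict_nostress.dict", "w") as dst:
    for line in src:
        word, phones = line.strip().split(" ", 1)
        dst.write("%s %s\n" % (word, re.sub(r"\d", "", phones)))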

I am using tensorflow 0.10.0

Even when using 512 nodes, the WER only goes down a little, to 0.4235.

cmudict_nostress.txt

dreamk73 avatar Sep 28 '16 12:09 dreamk73

I've just trained with your file and a clean checkout:

Words: 13502
Errors: 5158
WER: 0.382
Accuracy: 0.618

Seems ok. What is the size of your model (mine is about 500 kb)? How many steps did you train for, and what is the final perplexity you see in the log? Mine are:

global step 227400 learning rate 0.0481 step-time 0.09 perplexity 1.04

nshmyrev avatar Sep 29 '16 08:09 nshmyrev

With additional cleanup (one has to remove the "(2)"-style markers from alternative pronunciations) and without words containing digits (there are a few, which I just removed from the cmusphinx git), I get the following:

global step 89200 learning rate 0.1849 step-time 0.09 perplexity 1.13
Training done.
Creating 2 layers of 64 units.
Reading model parameters from data
Beginning calculation word error rate (WER) on test sample.
Words: 12594
Errors: 4197
WER: 0.333
Accuracy: 0.667
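
The extra cleanup I mean above is roughly this (a sketch; filenames are illustrative):

import re

# Drop the "(2)", "(3)", ... suffixes that mark alternative pronunciations
# and skip entries whose word contains digits.
with open("cmudict_nostress.dict") as src, open("cmudict_clean.dict", "w") as dst:
    for line in src:
        word, phones = line.strip().split(" ", 1)
        word = re.sub(r"\(\d+\)$", "", word)
        if any(ch.isdigit() for ch in word):
            continue
        dst.write("%s %s\n" % (word, phones))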

nshmyrev avatar Sep 29 '16 12:09 nshmyrev

It must have had something to do with the version of the code. I downloaded the latest clean version, as you did yesterday, and ran it again on my cmudict_nostress. Now it says:

Words: 13502
Errors: 5059
WER: 0.375
Accuracy: 0.625

dreamk73 avatar Sep 30 '16 14:09 dreamk73

Ok, that's about it. You need the additional cleanup I described to get to 0.33. Let me know if you are able to reproduce 44. Thanks.

nshmyrev avatar Sep 30 '16 14:09 nshmyrev

Also, results between runs are not perfectly reproducible, since TF uses random initialization.
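
For reference, fixing the graph-level seed before building the model makes the initialization repeatable for a given graph and TF version, though it still won't make runs bit-identical across machines. A minimal sketch:

import tensorflow as tf

# Fix the graph-level seed before constructing the model so that variable
# initialization is repeatable for a given setup.
tf.set_random_seed(1234)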

nshmyrev avatar Sep 30 '16 14:09 nshmyrev

Still a valid issue. We need to evaluate the best value for the embedding layer dimension.

nshmyrev avatar Feb 01 '17 15:02 nshmyrev