DeepLearningForNLPInPytorch
NGram and CBOW implementation
I have a question about your implementation of the NGram model (and, by extension, the CBOW model, which I adapted from it). According to the code presented, you are building a two-layer perceptron whose second linear layer projects from the hidden dimension to vocab_size, so both its weight matrix and its output scale with the vocabulary size. That is fine for a toy example, but on a real corpus with a vocabulary of, say, 200 thousand tokens, this output layer clogs CUDA RAM for good. I also cannot see how it makes sense to train the embeddings separately on each batch of a few texts. Could you be so kind as to explain?
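
For reference, here is roughly the kind of model I am referring to, a minimal sketch reconstructed from the tutorial (the class name, the hidden size of 128, and the single-example forward pass are my recollection of the notebook, so treat the details as an approximation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NGramLanguageModeler(nn.Module):
    def __init__(self, vocab_size, embedding_dim, context_size):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.linear1 = nn.Linear(context_size * embedding_dim, 128)
        # This is the layer in question: it maps the hidden representation
        # to one logit per vocabulary entry, so its weight matrix has shape
        # (vocab_size, 128) and its output width equals vocab_size.
        self.linear2 = nn.Linear(128, vocab_size)

    def forward(self, inputs):
        # inputs: LongTensor of context word indices, shape (context_size,)
        embeds = self.embeddings(inputs).view((1, -1))
        out = F.relu(self.linear1(embeds))
        out = self.linear2(out)
        return F.log_softmax(out, dim=1)
```

With vocab_size around 200k and a hidden size of 128, linear2 alone holds roughly 25.6M parameters, and every forward pass produces a 200k-wide logit vector per example, which is where the memory pressure comes from.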