DeepLearningForNLPInPytorch
NGram and CBOW implementation
I have a question about your implementation of the NGram model (and, by extension, the CBOW model, which I adapted from it). According to the code presented, you are building a two-layer perceptron whose second linear layer projects from the hidden dimension to vocab_size, so both its weight matrix and its output scale with the vocabulary size. That is fine for a toy example, but on a real corpus with a vocabulary of, say, 200 thousand tokens, this output layer clogs CUDA RAM for good. I also cannot see how it makes sense to train the embeddings separately on each batch of a few texts. Could you be so kind as to explain?
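
For reference, here is roughly the kind of model I am referring to, a minimal sketch reconstructed from the tutorial (the class name, the hidden size of 128, and the single-example forward pass are my recollection of the notebook, so treat the details as an approximation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NGramLanguageModeler(nn.Module):
    def __init__(self, vocab_size, embedding_dim, context_size):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.linear1 = nn.Linear(context_size * embedding_dim, 128)
        # This is the layer in question: it maps the hidden representation
        # to one logit per vocabulary entry, so its weight matrix has shape
        # (vocab_size, 128) and its output width equals vocab_size.
        self.linear2 = nn.Linear(128, vocab_size)

    def forward(self, inputs):
        # inputs: LongTensor of context word indices, shape (context_size,)
        embeds = self.embeddings(inputs).view((1, -1))
        out = F.relu(self.linear1(embeds))
        out = self.linear2(out)
        return F.log_softmax(out, dim=1)
```

With vocab_size around 200k and a hidden size of 128, linear2 alone holds roughly 25.6M parameters, and every forward pass produces a 200k-wide logit vector per example, which is where the memory pressure comes from.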