xiaotingxuan
I am running the QQP experiments and I have changed the computation of the loss in the training code. When I create the dataset, I add a `loss_mask`:

```python
loss_mask = ([0]*(len(src)+1)...
```
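For context, here is a minimal sketch of how such a mask could be built and applied. The layout (`src` tokens, one separator, then `tgt` tokens) and the per-token `nll` tensor are my assumptions about the data pipeline, not the repository's actual code:

```python
import torch

def build_loss_mask(src_ids, tgt_ids):
    # Assumed sequence layout: [src tokens] [SEP] [tgt tokens].
    # Zero out the source side (and the separator) so that only
    # target tokens contribute to the loss.
    mask = [0] * (len(src_ids) + 1) + [1] * len(tgt_ids)
    return torch.tensor(mask, dtype=torch.float)

def masked_nll(nll, loss_mask):
    # nll: per-token negative log-likelihood, shape (batch, seq_len)
    # loss_mask: same shape; average only over unmasked (target) positions
    return (nll * loss_mask).sum() / loss_mask.sum().clamp(min=1)
```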
When I use the original loss (without the loss mask), I get the following result:

```
-----------------------------
| decoder_nll    | 1.27e-05 |
| decoder_nll_q0 | 1.68e-05 |
| decoder_nll_q1 | 1.55e-05 |
|...
```
> When I use the original loss (without the loss mask), I get the following result:
>
> ```
> -----------------------------
> | decoder_nll | 1.27e-05 |
> |...
> ```
Hi, I am also confused about that. When I visualize the trained embedding and the BERT embedding, I find they are very different. Here is the trained embedding; it looks like a Gaussian distribution (I...
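For reference, this is roughly how I produce the visualization. The checkpoint path and the `word_embedding.weight` key are placeholders for whatever your checkpoint actually contains:

```python
import torch
import matplotlib.pyplot as plt

# Hypothetical path and key; adjust to the actual checkpoint layout.
state = torch.load("model_checkpoint.pt", map_location="cpu")
trained_emb = state["word_embedding.weight"].flatten().numpy()

# Histogram of all embedding values; a roughly bell-shaped curve is
# what I mean by "looks like a Gaussian distribution".
plt.hist(trained_emb, bins=200, density=True)
plt.title("Distribution of trained embedding values")
plt.xlabel("value")
plt.ylabel("density")
plt.show()
```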
According to your code, the randomly initialized embeddings are passed in as `x_start`, which is then assigned to a local variable `x_start_fix`, but `x_start_fix` is never used in the later code....
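To make the report concrete, here is a paraphrase of the pattern I mean; `q_sample` and `training_losses` here are stand-ins for the real functions, not code copied from the repository:

```python
import torch

def q_sample(x_start, t, noise):
    # Stand-in for the diffusion forward process; details omitted here.
    return x_start + t * noise

def training_losses(x_start, t, noise):
    x_start_fix = x_start.detach()      # assigned here ...
    x_t = q_sample(x_start, t, noise)   # ... but x_start is used instead,
    return x_t                          # so x_start_fix has no effect
```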