xiaotingxuan
I am running the QQP experiments and I have changed the computation of the loss in the training code. When I create the dataset, I add a `loss_mask`:

```python
loss_mask = ([0]*(len(src)+1)...
```
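For context, here is a minimal sketch of how such a mask could be built and applied. The layout (`src` tokens, one separator, then `tgt` tokens) and the per-token `nll` tensor are my assumptions about the data pipeline, not the repository's actual code:

```python
import torch

def build_loss_mask(src_ids, tgt_ids):
    # Assumed sequence layout: [src tokens] [SEP] [tgt tokens].
    # Zero out the source side (and the separator) so that only
    # target tokens contribute to the loss.
    mask = [0] * (len(src_ids) + 1) + [1] * len(tgt_ids)
    return torch.tensor(mask, dtype=torch.float)

def masked_nll(nll, loss_mask):
    # nll: per-token negative log-likelihood, shape (batch, seq_len)
    # loss_mask: same shape; average only over unmasked (target) positions
    return (nll * loss_mask).sum() / loss_mask.sum().clamp(min=1)
```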
When I use the original loss (without the loss mask), I get the following result:

```
-----------------------------
| decoder_nll    | 1.27e-05 |
| decoder_nll_q0 | 1.68e-05 |
| decoder_nll_q1 | 1.55e-05 |
|...
```
> When I use the original loss (without the loss mask), I get the following result:
>
> ```
> -----------------------------
> | decoder_nll | 1.27e-05 |
> |...
> ```
Hi, I am also confused about that. When I visualize the trained embedding and the BERT embedding, I find they are very different. Here is the trained embedding; it looks like a Gaussian distribution (I...
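For reference, this is roughly how I produce the visualization. The checkpoint path and the `word_embedding.weight` key are placeholders for whatever your checkpoint actually contains:

```python
import torch
import matplotlib.pyplot as plt

# Hypothetical path and key; adjust to the actual checkpoint layout.
state = torch.load("model_checkpoint.pt", map_location="cpu")
trained_emb = state["word_embedding.weight"].flatten().numpy()

# Histogram of all embedding values; a roughly bell-shaped curve is
# what I mean by "looks like a Gaussian distribution".
plt.hist(trained_emb, bins=200, density=True)
plt.title("Distribution of trained embedding values")
plt.xlabel("value")
plt.ylabel("density")
plt.show()
```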
According to your code, the randomly initialized embeddings are passed in as `x_start`, which is then assigned to a local variable `x_start_fix`, but `x_start_fix` is never used in the later code....
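To make the report concrete, here is a paraphrase of the pattern I mean; `q_sample` and `training_losses` here are stand-ins for the real functions, not code copied from the repository:

```python
import torch

def q_sample(x_start, t, noise):
    # Stand-in for the diffusion forward process; details omitted here.
    return x_start + t * noise

def training_losses(x_start, t, noise):
    x_start_fix = x_start.detach()      # assigned here ...
    x_t = q_sample(x_start, t, noise)   # ... but x_start is used instead,
    return x_t                          # so x_start_fix has no effect
```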