Congcong Wang

Results: 2 issues by Congcong Wang

Hi, in the DRMM model, should it be `x = torch.einsum('bl,bl->b', torch.flip(dense_output, (-1,)), attention_probs)` instead of `x = torch.einsum('bl,bl->b', dense_output, attention_probs)`? After I revised this, I got the training loss reduction much...
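For readers comparing the two expressions, here is a minimal pure-Python sketch of what each one computes. `einsum('bl,bl->b', a, b)` is a row-wise dot product, and `torch.flip(a, (-1,))` reverses each row, so the flip changes which `dense_output` entry is paired with which attention weight. The helpers `rowwise_dot` and `flip_last` and the sample values are hypothetical stand-ins for illustration, not code from the repository, and the sketch does not settle which ordering is correct for the model.

```python
def rowwise_dot(a, b):
    """Mirror of einsum('bl,bl->b', a, b) for 2-D nested lists."""
    return [sum(x * y for x, y in zip(row_a, row_b)) for row_a, row_b in zip(a, b)]


def flip_last(a):
    """Mirror of torch.flip(a, (-1,)) for 2-D nested lists: reverse each row."""
    return [row[::-1] for row in a]


# Illustrative values only (batch size 1, three query terms).
dense_output = [[1.0, 2.0, 3.0]]
attention_probs = [[0.5, 0.25, 0.25]]

# Original expression: term i is weighted by attention weight i.
plain = rowwise_dot(dense_output, attention_probs)

# Proposed expression: term i is weighted by attention weight (L - 1 - i).
flipped = rowwise_dot(flip_last(dense_output), attention_probs)

print(plain)    # [1.75]
print(flipped)  # [2.25]
```

The two results differ whenever the attention weights are not symmetric along the last dimension, so the choice matters only if `dense_output` and `attention_probs` are indexed in opposite orders upstream.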

bug

The latest commit solved the following bug:

```bash
Instructions for updating: renamed to `run`
0%| | 0/16 [00:37
```