Jize Cao issues

5 issues by Jize Cao

RuntimeError: Function 'MmBackward' returned nan values in its 0th output.

I use `model, optimizer = amp.initialize(model, optimizer, opt_level='O1')` and my backward pass is handled by `with amp.scale_loss(loss, optimizer) as scaled_loss: scaled_loss.backward()`. Things work well without `amp.initialize`, so...

WC task BERT accuracy is much lower than the paper claimed

I tried using the SentEval WC dataset to evaluate BERT's performance, but the result is much lower than the paper claimed (0.4 compared to 24.9 in the paper for layer...

The baseline implementation may not be the 'hierarchical network' described in the paper

The paper mentions that the baseline is built from the "hierarchical network" with a linear regression after taking the input. However, I only found that the baseline implementation is just...

Would you like to share all the checkpoints of the model during the training phase?

I am interested in how the model learns during its training phase. Could you share not only the final checkpoint but also the checkpoints among the training...

The single-stream model pre-trained checkpoint

For comparison, could you release the pretrained single-stream BERT?