Jize Cao

Results 5 issues of Jize Cao

I use `model, optimizer = amp.initialize(model, optimizer, opt_level='O1')`, and my backward pass is handled by `with amp.scale_loss(loss, optimizer) as scaled_loss: scaled_loss.backward()`. Things work well without `amp.initialize`, so...

I tried using the SentEval WC dataset to evaluate BERT's performance, but the result is much lower than the paper claims (0.4 compared to 24.9 in the paper for layer...

The paper mentions that the baseline is built with the "hierarchical network", with a linear regression applied after taking the input. However, I found that the baseline implementation is just...

I am interested in how the model learns during its training phase. Would you be willing to share not only the final checkpoint, but also the checkpoints among the training...

For comparison, would you be willing to release the pretrained single-stream BERT?