bestbzw

Results 7 issues of bestbzw

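The same code trains fine with BERT and RoBERTa, but with albert_xxlarge the loss does not decrease. Do I need to set particular hyperparameters during training? I load the model with AutoModel.from_pretrained and the tokenizer with BertTokenizer.from_pretrained.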

Hi, I need a binarized parse tree. How do I use the CoreNLP server to get it?

Why are the FLOPs of the BERT baseline in the paper 21785M? According to Table 1, shouldn't BERT's FLOPs be 1809.9 * 12 + 46.1 = 21765M?
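The sum in the question above can be checked directly. This is a minimal sketch that only reuses the figures quoted in the question (1809.9M per layer, 12 layers, a 46.1M remaining term, and the reported 21785M); the variable and function names are hypothetical, and it does not explain where the paper's number comes from.

```python
# Figures quoted in the question, in millions of FLOPs.
# All names here are illustrative, not taken from the paper or its code.
PER_LAYER_MFLOPS = 1809.9   # per-layer cost as read from Table 1
NUM_LAYERS = 12             # number of layers used in the question's sum
EXTRA_MFLOPS = 46.1         # remaining term added once in the question's sum
REPORTED_MFLOPS = 21785     # value the question says the paper reports


def total_mflops(per_layer: float, num_layers: int, extra: float) -> float:
    """Sum the per-layer cost over all layers and add the one-off term."""
    return per_layer * num_layers + extra


total = total_mflops(PER_LAYER_MFLOPS, NUM_LAYERS, EXTRA_MFLOPS)
gap = REPORTED_MFLOPS - total

print(f"computed total: {total:.1f}M")
print(f"gap to reported value: {gap:.1f}M")
```

The computed total rounds to the 21765M given in the question, so the discrepancy being asked about is roughly 20M FLOPs.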

Hi, I rebuilt an AR model with TensorFlow, but the loss does not decrease. I can't find the difference between your model and mine. Could you publish the training log? Thank...

### 🐛 Describe the bug I am experimenting with Gemini. The code runs fine when using 16 GPUs or fewer on a single machine. But if I use 64 GPUs,...

bug

Hi, why is the input of the decoder a one-hot vector? We usually use dense vectors (through an embedding lookup function) as the input in seq2seq models.
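To make the two input representations in the question concrete, here is a minimal sketch in plain Python. It assumes a toy 3-token vocabulary and a hypothetical 3 x 2 embedding table; none of these names or values come from the repository being asked about. It shows that multiplying a one-hot vector by an embedding matrix selects the same row that an embedding lookup returns.

```python
def one_hot(index: int, size: int) -> list[float]:
    """Return a vector of length `size` with 1.0 at `index` and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def row_times_matrix(vec: list[float], matrix: list[list[float]]) -> list[float]:
    """Multiply a row vector (length V) by a V x D matrix, giving a length-D vector."""
    dim = len(matrix[0])
    return [sum(vec[i] * matrix[i][j] for i in range(len(vec))) for j in range(dim)]


# Hypothetical embedding table: 3 tokens, 2 dimensions each.
embedding = [
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
]

token_id = 2
via_one_hot = row_times_matrix(one_hot(token_id, len(embedding)), embedding)
via_lookup = embedding[token_id]

print(via_one_hot, via_lookup)
```

In this toy setting the two paths give the same vector, so a one-hot input followed by a linear layer acts as an embedding lookup; whether the model in question does this is not stated in the snippet.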

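Hello, when training with xx_large on ReCo, I found that the model's loss never decreases. How did you set your hyperparameters? Did you use strategies such as warmup or dropout?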