bestbzw issues

7 issues



xxlarge model does not train during fine-tuning

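With the same code, both BERT and RoBERTa train fine, but with albert_xxlarge the loss does not decrease. Do I need to set particular hyperparameters during training? I load the model with AutoModel.from_pretrained and the tokenizer with BertTokenizer.from_pretrained.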

How to get a binarized tree

Hi, I need a binarized parse tree. How do I use the CoreNLP server to get it?

About the FLOPs calculation in the paper

Why is the FLOPs figure for the BERT baseline in the paper 21785M? Based on the values listed in Table 1, shouldn't BERT's FLOPs be 1809.9 * 12 + 46.1 = 21765M?
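The arithmetic in the question can be checked directly. Twelve Transformer layers at 1809.9M each, plus one classifier at 46.1M, give 21764.9M. That rounds to 21765M and is about 20M short of the reported 21785M. The snippet below only reproduces the figures quoted in the question. Like the question, it assumes the baseline counts the classifier once.

```python
# Figures as quoted in the question (Table 1), in millions of FLOPs.
TRANSFORMER_LAYER_MFLOPS = 1809.9
CLASSIFIER_MFLOPS = 46.1
NUM_LAYERS = 12
REPORTED_BASELINE_MFLOPS = 21785

# Assumption (as in the question): 12 layers plus a single classifier.
expected = NUM_LAYERS * TRANSFORMER_LAYER_MFLOPS + CLASSIFIER_MFLOPS
gap = REPORTED_BASELINE_MFLOPS - expected
print(f"expected: {expected:.1f}M, reported: {REPORTED_BASELINE_MFLOPS}M, gap: {gap:.1f}M")
# expected: 21764.9M, reported: 21785M, gap: 20.1M
```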

About the AR model

Hi, I rebuilt an AR model with TensorFlow, but the loss does not decrease. I can't find the difference between your model and mine. Could you publish the training log? Thank...

[BUG]: timed out when using 64 GPUs.

### 🐛 Describe the bug
I am experimenting with Gemini. The code runs fine with 16 or fewer GPUs on a single machine, but if I use 64 GPUs,...

Label: bug

The input of the decoder is a one-hot vector

Hi, why is the input of the decoder a one-hot vector? In seq2seq models we usually feed dense vectors (obtained through an embedding lookup) as the input.
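The two input styles are closely related. Multiplying a one-hot row vector by an embedding matrix selects exactly one row of that matrix, which is what an embedding lookup returns. A one-hot input followed by a bias-free linear layer is therefore mathematically an embedding lookup. The lookup is simply cheaper to compute. The question does not say whether the repository's one-hot input is followed by such a projection. The toy sizes and random matrix below are illustrative assumptions.

```python
import random

def one_hot(index, size):
    """Length-`size` vector with 1.0 at `index` and 0.0 elsewhere."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def row_times_matrix(x, matrix):
    """Row vector x (length = number of rows) multiplied by `matrix`."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    return [sum(x[i] * matrix[i][j] for i in range(n_rows)) for j in range(n_cols)]

def embedding_lookup(index, matrix):
    """Dense lookup: copy row `index` of the embedding matrix."""
    return list(matrix[index])

random.seed(0)
VOCAB_SIZE, EMB_DIM = 5, 3  # toy sizes for illustration
E = [[random.uniform(-1.0, 1.0) for _ in range(EMB_DIM)] for _ in range(VOCAB_SIZE)]

token_id = 2
via_one_hot = row_times_matrix(one_hot(token_id, VOCAB_SIZE), E)
via_lookup = embedding_lookup(token_id, E)
print(via_one_hot == via_lookup)  # True
```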

How should the hyperparameters be set when training albert_xxlarge?

Hello, when training xx_large on ReCo, I found that the model's loss never decreased. How did you set your hyperparameters? Did you use strategies such as warmup or dropout?
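Warmup is one of the settings the question asks about. As an illustration only, here is a pure-Python sketch of a learning-rate multiplier that warms up linearly and then decays linearly. This follows the same shape as `get_linear_schedule_with_warmup` in Hugging Face transformers. The peak learning rate and step counts are hypothetical and are not the authors' settings. The question also does not establish that warmup alone would fix a stalled loss.

```python
def linear_warmup_then_decay(step, warmup_steps, total_steps):
    """Learning-rate multiplier in [0, 1].

    Rises linearly from 0 to 1 over `warmup_steps`, then falls linearly
    to 0 at `total_steps` (and stays at 0 afterwards).
    """
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Hypothetical values for illustration only.
PEAK_LR = 1e-5
WARMUP_STEPS = 1000
TOTAL_STEPS = 10000

for step in (0, 500, 1000, 5500, 10000):
    print(step, PEAK_LR * linear_warmup_then_decay(step, WARMUP_STEPS, TOTAL_STEPS))
```

In PyTorch, a multiplier like this can be passed to `torch.optim.lr_scheduler.LambdaLR` as `lambda step: linear_warmup_then_decay(step, WARMUP_STEPS, TOTAL_STEPS)`. The scheduler then scales the optimizer's base learning rate by the returned value at each scheduler step.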