leekum2018

5 issue results for leekum2018

In the model code, model.py line 381, the loss function uses KL divergence, but in the paper the loss function is cross-entropy. Am I misunderstanding something?
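One likely resolution (an assumption, not confirmed by the repository): for a one-hot target distribution, cross-entropy and KL divergence differ only by the target's entropy, which is zero, so the two losses coincide and yield the same gradients. A minimal sketch with hypothetical distributions:

```python
import math

def cross_entropy(p, q):
    # H(p, q) = -sum_i p_i * log(q_i)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_div(p, q):
    # KL(p || q) = sum_i p_i * log(p_i / q_i) = H(p, q) - H(p)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.0, 1.0, 0.0]  # one-hot target: H(p) = 0
q = [0.1, 0.7, 0.2]  # model's predicted distribution

# With a one-hot target, the entropy term vanishes, so CE == KL.
assert abs(cross_entropy(p, q) - kl_div(p, q)) < 1e-12
```

If the targets are soft (not one-hot), the two losses differ by a constant (the target entropy), which still does not affect gradients with respect to the model's predictions.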

In generate.py, `bos_token_id=1` and `eos_token_id=2` are set via `model.config.bos_token_id = 1` and `model.config.eos_token_id = 2`. However, in finetune.py, the tokenizer is loaded directly from the official LLaMA checkpoint, where bos_token_id=0 and...
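If the tokenizer and the model config disagree on the special-token ids, generation can terminate on the wrong token. A minimal consistency-check sketch, using hypothetical dict stand-ins for the two configs (the real objects would be `tokenizer` and `model.config` from `transformers`):

```python
# Hypothetical stand-ins for the two sources of special-token ids.
tokenizer_ids = {"bos_token_id": 0, "eos_token_id": 1}  # assumed: as in the checkpoint's tokenizer
model_ids = {"bos_token_id": 1, "eos_token_id": 2}      # as hard-coded in generate.py

def align_special_tokens(model_cfg, tokenizer_cfg):
    """Return a copy of the model config whose special-token ids
    match the tokenizer's, so decoding stops on the right token."""
    fixed = dict(model_cfg)
    for key in ("bos_token_id", "eos_token_id"):
        if fixed.get(key) != tokenizer_cfg.get(key):
            fixed[key] = tokenizer_cfg[key]
    return fixed

fixed = align_special_tokens(model_ids, tokenizer_ids)
assert fixed == tokenizer_ids
```

The same idea applies to the real objects: copy the ids from the tokenizer into `model.config` rather than hard-coding them, so one source of truth drives both training and generation.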

The code is as follows:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-2b", trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained("THUDM/glm-2b", trust_remote_code=True)
model = model.half().cuda()
model.eval()
inputs = tokenizer("Ng is an adjunct professor at [MASK] (formerly associate professor...
```

### Reminder

- [X] I have read the README and searched the existing issues.

### Reproduction

Thank you for your brilliant work! I would like to know how to select...

enhancement
pending

As stated in the second paragraph of Section 4.3, "We attribute the superior performance of DiffusionBERT to its onetime sampling of all tokens". I wonder what is meant by "onetime sampling...