leekum2018
In model.py, line 381, the loss function uses KL divergence, but in the paper the loss function is cross-entropy. Am I misunderstanding something?
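For context on the question above (not specific to that repository): when the target distribution is one-hot, its entropy is zero, so KL divergence and cross-entropy give the same value and the same gradients. A minimal sketch in plain Python:

```python
import math

def cross_entropy(p, q):
    # H(p, q) = -sum_i p_i * log(q_i)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    # KL(p || q) = sum_i p_i * log(p_i / q_i)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    # H(p) = -sum_i p_i * log(p_i); zero for a one-hot distribution
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

p = [1.0, 0.0, 0.0]   # one-hot target
q = [0.7, 0.2, 0.1]   # model's predicted distribution

# In general, H(p, q) = KL(p || q) + H(p).
# With a one-hot target, H(p) = 0, so the two losses coincide exactly.
print(cross_entropy(p, q), kl_divergence(p, q))
```

So if the paper trains against hard labels, implementing the loss as KL divergence can be equivalent to cross-entropy; they only differ (by the constant target entropy) when the targets are soft distributions.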
In generate.py, bos_token_id=1 and eos_token_id=2: `model.config.bos_token_id = 1` `model.config.eos_token_id = 2`. However, in finetune.py, the tokenizer is loaded directly from the official LLaMA checkpoint, where bos_token_id=0 and...
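One way to surface this kind of mismatch before generation is to compare the two sets of special-token ids. A minimal sketch (the helper name is hypothetical, not part of the repository; the tokenizer's eos id below is an assumption, since the issue text is truncated):

```python
# Hypothetical helper: generate.py hard-codes bos=1/eos=2 on the model config,
# while the tokenizer shipped with the checkpoint may carry different ids.
def check_special_tokens(model_cfg, tokenizer_cfg):
    """Return {key: (model_id, tokenizer_id)} for every id that disagrees."""
    mismatches = {}
    for key in ("bos_token_id", "eos_token_id", "pad_token_id"):
        m, t = model_cfg.get(key), tokenizer_cfg.get(key)
        if m is not None and t is not None and m != t:
            mismatches[key] = (m, t)
    return mismatches

# bos ids taken from the issue; the tokenizer's eos id here is hypothetical
print(check_special_tokens(
    {"bos_token_id": 1, "eos_token_id": 2},
    {"bos_token_id": 0, "eos_token_id": 2},
))  # -> {'bos_token_id': (1, 0)}
```

If the ids disagree, generation can emit or stop on the wrong token, so the model config and tokenizer should be aligned to one set of ids before decoding.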
The code is as follows:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-2b", trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained("THUDM/glm-2b", trust_remote_code=True)
model = model.half().cuda()
model.eval()
inputs = tokenizer("Ng is an adjunct professor at [MASK] (formerly associate professor...
```
### Reminder - [X] I have read the README and searched the existing issues. ### Reproduction Thank you for your brilliant work! I would like to know how to select...
As stated in the second paragraph of Section 4.3, "We attribute the superior performance of DiffusionBERT to its onetime sampling of all tokens". I am wondering about the meaning of "onetime sampling...