leekum2018
In model.py, line 381, the loss function uses KL divergence, but in the paper the loss function is cross-entropy. Am I misunderstanding something?
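For context on the question above (not specific to that repository): when the target distribution is one-hot, its entropy is zero, so KL divergence and cross-entropy give the same value and the same gradients. A minimal sketch in plain Python:

```python
import math

def cross_entropy(p, q):
    # H(p, q) = -sum_i p_i * log(q_i)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    # KL(p || q) = sum_i p_i * log(p_i / q_i)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    # H(p) = -sum_i p_i * log(p_i); zero for a one-hot distribution
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

p = [1.0, 0.0, 0.0]   # one-hot target
q = [0.7, 0.2, 0.1]   # model's predicted distribution

# In general, H(p, q) = KL(p || q) + H(p).
# With a one-hot target, H(p) = 0, so the two losses coincide exactly.
print(cross_entropy(p, q), kl_divergence(p, q))
```

So if the paper trains against hard labels, implementing the loss as KL divergence can be equivalent to cross-entropy; they only differ (by the constant target entropy) when the targets are soft distributions.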
In generate.py, bos_token_id=1 and eos_token_id=2: `model.config.bos_token_id = 1` `model.config.eos_token_id = 2`. However, in finetune.py, the tokenizer is loaded directly from the official LLaMA checkpoint, where bos_token_id=0 and...
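One way to surface this kind of mismatch before generation is to compare the two sets of special-token ids. A minimal sketch (the helper name is hypothetical, not part of the repository; the tokenizer's eos id below is an assumption, since the issue text is truncated):

```python
# Hypothetical helper: generate.py hard-codes bos=1/eos=2 on the model config,
# while the tokenizer shipped with the checkpoint may carry different ids.
def check_special_tokens(model_cfg, tokenizer_cfg):
    """Return {key: (model_id, tokenizer_id)} for every id that disagrees."""
    mismatches = {}
    for key in ("bos_token_id", "eos_token_id", "pad_token_id"):
        m, t = model_cfg.get(key), tokenizer_cfg.get(key)
        if m is not None and t is not None and m != t:
            mismatches[key] = (m, t)
    return mismatches

# bos ids taken from the issue; the tokenizer's eos id here is hypothetical
print(check_special_tokens(
    {"bos_token_id": 1, "eos_token_id": 2},
    {"bos_token_id": 0, "eos_token_id": 2},
))  # -> {'bos_token_id': (1, 0)}
```

If the ids disagree, generation can emit or stop on the wrong token, so the model config and tokenizer should be aligned to one set of ids before decoding.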
The code is as follows:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-2b", trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained("THUDM/glm-2b", trust_remote_code=True)
model = model.half().cuda()
model.eval()
inputs = tokenizer("Ng is an adjunct professor at [MASK] (formerly associate professor...
```
### Reminder - [X] I have read the README and searched the existing issues. ### Reproduction Thank you for your brilliant work! I would like to know how to select...
As stated in the second paragraph of Section 4.3, "We attribute the superior performance of DiffusionBERT to its onetime sampling of all tokens". I am wondering about the meaning of "onetime sampling...