高琪 comments

Results: 5 comments of 高琪

TypeError: load() missing 1 required positional argument: 'Loader'

replace yaml.load with yaml.safe_load
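
A minimal sketch of the swap, assuming PyYAML 6 or newer (where `Loader` became a required argument of `yaml.load`). The config text and its keys are made up for illustration.

```python
import yaml

# Hypothetical config content, used only to demonstrate the call.
config_text = """
model: llama
learning_rate: 0.0001
"""

# yaml.load(config_text) without a Loader raises the TypeError above on PyYAML 6+.
# safe_load needs no Loader argument and only builds plain Python objects.
config = yaml.safe_load(config_text)

# Equivalent explicit form, if the call site must stay yaml.load:
same_config = yaml.load(config_text, Loader=yaml.SafeLoader)

print(config)
```

If the file relies on custom Python tags, `safe_load` will refuse them; for ordinary config files it should be a drop-in fix.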

RuntimeError: CUDA error: device-side assert triggered when running Llama on multiple gpus

In my case, the problem occurs at around epoch 0.97 of training. I guessed that a data problem was the cause, but it still occurs when I use only half of the data. Interestingly, when I...
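
One way to test the "data problem" guess: a frequent cause of device-side asserts is an index outside the embedding table, for example a token id that is negative or not smaller than the vocabulary size. Whether that is the cause here is only an assumption. The sketch below scans tokenized batches on the CPU; `find_out_of_range_ids` and the sample data are hypothetical names for illustration.

```python
def find_out_of_range_ids(batches, vocab_size):
    """Return (batch_index, token_id) pairs whose id falls outside [0, vocab_size)."""
    bad = []
    for batch_index, batch in enumerate(batches):
        for sequence in batch:
            for token_id in sequence:
                if not 0 <= token_id < vocab_size:
                    bad.append((batch_index, token_id))
    return bad


# Hypothetical data: two batches of token id sequences, vocabulary of 32000 tokens.
batches = [
    [[1, 15043, 29871], [1, 3186, 2]],
    [[1, 32000, 2]],  # 32000 is out of range for vocab_size=32000
]
print(find_out_of_range_ids(batches, vocab_size=32000))
```

Running the failing job with the environment variable `CUDA_LAUNCH_BLOCKING=1` can also help, since it makes the stack trace point at the kernel that actually triggered the assert.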

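[Help] The generate method and the chat method return inconsistent results

> I looked at the source code of the chat method. It calls the generate method, with the following changes:
>
> 1. It modifies input_ids. For example, for the input "你好", tokenizing directly gives input_ids of ['gMASK', 'sop', '▁你', '好'], whereas chat calls build_chat_input, which turns input_ids into ['[gMASK]', 'sop', '', '▁', '', '▁你', '好', ''].
> 2. It changes the logits_processor and eos_token_id parameters.
>
> I don't quite understand why changing only these parameters leads to such a large difference. Does anyone know?

The chat version of the model was fine-tuned for dialogue on this format, so the chat method has to add the template. If you want to use generate, you need to apply the chat template yourself.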
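
A rough sketch of what "applying the template yourself" means. The role markers below are hypothetical placeholders, not the model's real special tokens; the actual format should be taken from the model's own build_chat_input.

```python
from typing import List, Optional, Tuple

# Hypothetical role markers; the real special tokens depend on the model's tokenizer.
USER_TAG = "<user>"
ASSISTANT_TAG = "<assistant>"


def build_chat_prompt(query: str,
                      history: Optional[List[Tuple[str, str]]] = None) -> str:
    """Wrap a query (and earlier turns) in the dialogue format used for fine-tuning."""
    parts = []
    for user_turn, assistant_turn in history or []:
        parts.append(f"{USER_TAG}\n{user_turn}")
        parts.append(f"{ASSISTANT_TAG}\n{assistant_turn}")
    parts.append(f"{USER_TAG}\n{query}")
    parts.append(ASSISTANT_TAG)  # leave the assistant turn open for generation
    return "\n".join(parts)


raw_prompt = "你好"                       # what a bare generate call would see
chat_prompt = build_chat_prompt("你好")   # what a chat-style call would see
print(repr(raw_prompt))
print(repr(chat_prompt))
```

The two prompts tokenize to different sequences, so a model tuned on the templated form can respond quite differently to the bare one.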

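About prompt generation

> During training we merge instruction + input into the model input, so the input in the crawling script corresponds to our model input.

Thanks for the reply. I understand that. When the raw data is generated with ChatGPT, there is only a single input field, but the cleaned JSON data you provide contains two parts, instruction and input. By what criterion were they split?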
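
For reference, the merge described in the quoted reply could look roughly like this. The newline separator and the helper name `build_model_input` are assumptions; the thread does not say how the two fields are joined.

```python
def build_model_input(example: dict, separator: str = "\n") -> str:
    """Join the instruction and input fields into a single model input string."""
    instruction = example.get("instruction", "").strip()
    extra_input = example.get("input", "").strip()
    # Skip empty fields so records with only an instruction get no stray separator.
    return separator.join(part for part in (instruction, extra_input) if part)


# Hypothetical cleaned record with both fields.
record = {"instruction": "Translate to English.", "input": "你好"}
print(build_model_input(record))
```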

decoding_mode top_k_top_p does not take effect for llama2 not same with huggingface

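This may be because, among the top_k = 100 candidate tokens, the first token has a very high probability, so it is very likely to be selected during sampling. You could try the beam_width parameter, which should be the number of beams for beam search.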
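
A small pure-Python illustration of that effect, not the actual decoding code of either library. The logits are made up: one token is far ahead of the rest, so even with top_k = 100 the sampler returns it almost every time.

```python
import math
import random


def top_k_probs(logits, k):
    """Keep the k highest logits and renormalize them with a softmax."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    max_logit = max(logits[i] for i in top)
    weights = {i: math.exp(logits[i] - max_logit) for i in top}
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}


# Hypothetical logits: token 0 dominates, the other 199 tokens are equally unlikely.
logits = [10.0] + [0.0] * 199
probs = top_k_probs(logits, k=100)

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = rng.choices(list(probs), weights=list(probs.values()), k=1000)
share = samples.count(0) / len(samples)

print(f"probability of token 0 after top-k: {probs[0]:.4f}")
print(f"share of samples equal to token 0: {share:.3f}")
```

With a distribution this peaked, sampling looks nearly deterministic, which can be mistaken for top_k/top_p having no effect.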