BoringDoggie
The HDFS dataset is labeled: it marks which blocks are outliers.
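For reference, a minimal sketch of reading those labels, assuming the loghub-style layout where an `anomaly_label.csv` maps each BlockId to Normal/Anomaly (the file and column names are assumptions, adjust to your copy of the dataset):

```python
import pandas as pd

# Sketch: load the per-block labels shipped with the HDFS dataset.
# Assumes columns "BlockId" and "Label" with values "Normal"/"Anomaly".
labels = pd.read_csv("anomaly_label.csv")
outlier_blocks = set(labels.loc[labels["Label"] == "Anomaly", "BlockId"])
print(f"{len(outlier_blocks)} blocks are labeled as outliers")
```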
Just rename the file from .crx to .zip and drag it onto chrome://extensions/; there is no need to repackage it with crx3.
> Hi author: Using the same data and the finetune_lora_ds.sh script from the qwen1 repo, I LoRA-finetuned qwen1-chat-14B and qwen1.5-chat-14B separately. 1. The training parameters are aligned (both at 11%, with lm_head and wte also trained). 2. The data includes some modelscope-agent-7b agent training data (API-triggered).
>
> ```
> I ran three experiments here:
> 1. Directly test the open-source qwen1.5-chat-14B and qwen-chat-14B; qwen1.5 performs better.
> 2. With model_max_length set to 2048 and the training above, qwen1.5 and qwen1 are about the same, with qwen1 slightly better.
> 3. With model_max_length set to 4096, only qwen1.5 was trained; it is much worse than the 2048 version of qwen1.5.
> ```
>
> Why does qwen1.5 degrade so badly in the third experiment? The token lengths in my training data range from 50 to 4096 and the distribution is quite uneven. Is that the cause, or can qwen1.5 not be trained with the qwen1 script? The qwen1.5 model was obtained from https://huggingface.co/Qwen/Qwen1.5-14B-Chat

We are encountering the...
You need to install flash_attn; otherwise, comment out the monkey-patch script used in train_mem.py.
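If you skip flash_attn, the part to comment out looks roughly like this (a sketch of the relevant part of train_mem.py; the exact contents may differ by FastChat version):

```python
# Sketch of fastchat/train/train_mem.py: if flash_attn is not installed,
# comment out the monkey-patch import and call (otherwise they must run
# before transformers is imported).
# from fastchat.train.llama_flash_attn_monkey_patch import (
#     replace_llama_attn_with_flash_attn,
# )
# replace_llama_attn_with_flash_attn()

from fastchat.train.train import train

if __name__ == "__main__":
    train()
```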
Also, in apply_lora we should add the option `trust_remote_code=True` to every `from_pretrained` call.
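A minimal sketch of what that change looks like, assuming an apply_lora-style merge script with placeholder model and adapter paths (not the exact FastChat code):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_path = "tiiuae/falcon-7b"   # assumption: a model that ships custom modeling code
lora_path = "path/to/lora-adapter"     # hypothetical adapter directory

# trust_remote_code=True lets transformers execute the model's custom code
# (e.g. Falcon's remote modeling files) when loading.
base = AutoModelForCausalLM.from_pretrained(
    base_model_path, trust_remote_code=True, torch_dtype="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model_path, trust_remote_code=True)

# Merge the LoRA weights into the base model.
model = PeftModel.from_pretrained(base, lora_path)
model = model.merge_and_unload()
```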
> Sorry my bad. I didn't correctly apply the change for `falcon_generate_stream`. I've fixed it and now it's working. Great work again @ericzhou571 ! > > One thing may need...
> Besides @infwinston's comments, can you separate training and inference into two separate PRs? We want to merge the inference code as soon as possible and review the training in...
You should use `import fastchat`.
You can use the Falcon special token, denoted as `>>SUFFIX<<`.
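A quick sketch of checking that token, assuming the tiiuae/falcon-7b tokenizer (verify against your own tokenizer's config):

```python
from transformers import AutoTokenizer

# The Falcon tokenizer registers >>SUFFIX<< (along with >>PREFIX<<, >>MIDDLE<<, ...)
# as additional special tokens, so it will not collide with ordinary text and can
# be used, for example, as a separator or stop string.
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")
print(tokenizer.additional_special_tokens)  # expected to include '>>SUFFIX<<'
print(tokenizer.encode(">>SUFFIX<<"))       # should encode to a single token id
```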