fengyue comments

Results: 13 comments of fengyue

indexSelectLargeIndex: block: [378,0,0], thread: [87,0,0] Assertion `srcIndex < srcSelectDimSize` failed.

How do I fix this? I ran into it too.
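The assertion above says that an index passed to an index-select operation (for example an embedding lookup) was not smaller than the size of the selected dimension. A common cause, not confirmed for this particular issue, is a token id greater than or equal to the embedding table size. Below is a minimal sketch of a pre-check in plain Python; the helper name `find_out_of_range` and the sample values are hypothetical.

```python
def find_out_of_range(indices, dim_size):
    """Return (position, value) pairs for indices outside [0, dim_size)."""
    return [
        (pos, idx)
        for pos, idx in enumerate(indices)
        if not 0 <= idx < dim_size
    ]


# Hypothetical values: an id equal to the table size is already out of range.
token_ids = [1, 15, 32000, 7]
vocab_size = 32000
print(find_out_of_range(token_ids, vocab_size))  # [(2, 32000)]
```

Running a check like this on the CPU before the forward pass usually points at the offending input faster than the asynchronous CUDA assertion does.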

Traceback (most recent call last): from fastchat.conversation import (compute_skip_echo_len, ImportError: cannot import name 'compute_skip_echo_len' from 'fastchat.conversation' (/home/a/miniconda3/envs/rag/lib/python3.9/site-packages/fastchat/conversation.py)

This function no longer exists. The fastchat library's conversation.py no longer has compute_skip_echo_len or get_default_conv_template. What should I do?
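One way to confirm which names the installed version actually exports is to probe the module before importing from it. The sketch below uses only the standard library; `missing_names` is a hypothetical helper, and the module and function names are the ones from the traceback above.

```python
import importlib


def missing_names(module_name, names):
    """Return the entries of `names` that the module does not export."""
    module = importlib.import_module(module_name)
    return [name for name in names if not hasattr(module, name)]


if __name__ == "__main__":
    wanted = ["compute_skip_echo_len", "get_default_conv_template"]
    try:
        print(missing_names("fastchat.conversation", wanted))
    except ImportError as exc:
        # fastchat may not be installed in the current environment.
        print(f"fastchat is not importable here: {exc}")
```

If both names are reported missing, the options are roughly to pin fastchat to an older release that still ships them, or to update the calling code to whatever the installed conversation.py provides. Newer releases appear to offer `get_conv_template` instead, but that is an assumption to verify against the installed file.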

Some questions about reproduction?

What is the dataset format? That hasn't been provided either.

[Bug]: After quantizing a LoRA-merged model, the quantized model's output keeps producing "human:"

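> I used a 1.5B model with a little over 2k samples and did LoRA fine-tuning with llama factory for a matching task, and the results were worse than without fine-tuning. 🤔 How many training samples did you use? What did the loss reach?

The loss is 0.05. My problem is that quantizing after fine-tuning produces absurd output; the fine-tuned model itself works very well.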

```
output_dir='lora/oneke-bio-8-add'
mkdir -p ${output_dir}
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nproc_per_node=8 --master_port=1287 src/finetune.py \
    --do_train --do_eval \
    --overwrite_output_dir \
    --model_name_or_path '/data/shensw/model/OneKE' \
    --stage 'sft' \
    --model_name 'llama' \
    --template 'llama2_zh' \
    --train_file '/data/shensw/DeepKE/example/llm/InstructKGC/data/NER/bio8-data/bio_train.json'...
```

Is multi-GPU inference possible? Single-GPU memory is limited.

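After changing --bits 8 to --bits 4 it runs. I read the code: under the small-bit path (less than 8), the different bit widths (4/8) select different quantization methods, both through bitsandbytes. Could this be related to the version? Also, why is --bits 4 more than 2x slower than --bits 16 at runtime? I'm puzzled and would appreciate an explanation.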

Is multi-GPU inference possible? Single-GPU memory is limited.

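Why is --bits 4 more than twice as slow as --bits 16 at runtime? With bits 16 one item takes 2 s, but with bits 4 it takes 5 s. Shouldn't quantization normally make things faster? I'm puzzled and would appreciate an explanation.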
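To make a comparison like this reproducible, it helps to average several runs and report the ratio. The sketch below is a generic timing harness using only the standard library; `seconds_per_call` and `slowdown` are hypothetical helpers, and the 2 s and 5 s figures are the ones reported above.

```python
import time


def seconds_per_call(fn, repeats=3):
    """Average wall-clock seconds per call of fn() over `repeats` runs."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


def slowdown(slow_seconds, fast_seconds):
    """How many times slower the first measurement is than the second."""
    return slow_seconds / fast_seconds


# Figures from the comment: 5 s per item at 4 bits, 2 s per item at 16 bits.
print(slowdown(5.0, 2.0))  # 2.5
```

One plausible explanation, not confirmed for this repository, is that 4-bit weights have to be dequantized on the fly during each forward pass, so this kind of quantization mainly saves memory rather than compute time.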