listwebit
After downloading the source and running sh training_scripts/single_node/run_LoRA.sh, I get the following error: len(train_dataloader) = 334 len(train_dataset) = 1000 args.per_device_train_batch_size = 1 len(eval_dataloader) = 334 len(eval_dataset) = 1000 args.per_device_eval_batch_size = 1 [2023-04-23 11:34:49,179] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info:...
I checked, and bloom-7B stores its parameters in FP16, with a model size of a dozen or so GB. Why is the belle-7B model twenty-some GB? Was it converted from FP16 to FP32?
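The arithmetic behind this question can be sketched as follows. This is a rough estimate only: it assumes a nominal 7e9 parameters (the "7B" in the model name, not an exact count) and ignores any non-weight data in the checkpoint. The helper `checkpoint_size_gb` is hypothetical and written for illustration; it does not confirm how either model was actually saved.

```python
# Rough checkpoint-size estimate: parameter count times bytes per parameter.
# NOMINAL_PARAMS is an assumption taken from the "7B" model name.
BYTES_PER_PARAM = {"fp16": 2, "bf16": 2, "fp32": 4}


def checkpoint_size_gb(num_params: float, dtype: str) -> float:
    """Return the approximate weight size in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


NOMINAL_PARAMS = 7e9

for dtype in ("fp16", "fp32"):
    size = checkpoint_size_gb(NOMINAL_PARAMS, dtype)
    print(f"{dtype}: ~{size:.0f} GB")
```

Under these assumptions the FP16 estimate lands in the "dozen or so GB" range and the FP32 estimate in the "twenty-some GB" range, which is consistent with the sizes described, though only the checkpoint's own dtype metadata would settle it.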
Our model is BLoom-2M, running in the docker environment with bash training_scripts/single_node/run_LoRA.sh output-lora 2; we also tried 3, and it still does not run. With the earlier finetune version, however, LoRA ran fine. Why is that? Is the current LoRA support not yet complete? The following error appears: [2023-04-25 10:52:32,890] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 47.61 GB, percent = 18.9% Traceback (most recent call last): File "main.py", line 402, in...
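### Reminder - [X] I have searched the Github Discussion and issues and have not found anything similar to this. ### Motivation 1. The official code does not currently seem to support incremental pre-training. How can I do incremental pre-training? 2. Could I change the input data in the SFT script and feed in domain-specific unsupervised data directly? Or modify it slightly to apply a data shift? 3. Could someone advise on how to do this concretely? ### Solution Thanks ###...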
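### Reminder - [X] I have searched the Github Discussion and issues and have not found anything similar to this. ### Motivation Does it need to match the paper? The paper does not seem to have been released, though. ### Solution Could you share some of the hyperparameters from the paper? ### Alternatives Could you share some of the hyperparameters from the paper? ###...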
### Reminder - [X] I have read the README and searched the existing issues. ### Reproduction I first ran pre-training, then fine-tuning, and finally inference. All training used full-parameter updates. The scripts are as follows: pretrain: ``` deepspeed --hostfile=./hostfile --master_port=9901 src/train_bash.py \ --deepspeed ./ds_config.json \ --stage pt \...
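### Reminder - [X] I have read the README and searched the existing issues. ### Reproduction I have a few questions: 1. Should \n be added between paragraphs? 2. If the model can handle at most 4096 tokens during preprocessing, should each sample's length be kept within 4096, and ideally slightly below 4096? 3. After a book is split into multiple samples, do the samples need to be shuffled? 4. Should special characters such as \t be removed? 5. Is there any related material that covers this? ### Expected behavior _No response_ ###...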
I used the following installation method, but received an error that has not been resolved for several days: git clone https://github.com/NVIDIA/apex cd apex pip install --global-option="--cpp_ext" --global-option="--cuda_ext" --no-cache -v --disable-pip-version-check...
### Reminder - [X] I have read the README and searched the existing issues. ### Reproduction ``` CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \ --stage rm \ --do_train \ --model_name_or_path path_to_llama_model \ --adapter_name_or_path...