listwebit issues

Results: 18 issues of listwebit

Can anyone help solve this? I am using the official Docker image throughout; File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/zero/stage3.py", line 133, in __init__ self.dtype = self.optimizer.param_groups[0]['params'][0].dtype

Could some expert please help? This problem has gone unsolved for several days

After downloading the source and running sh training_scripts/single_node/run_LoRA.sh, I get the following error: len(train_dataloader) = 334 len(train_dataset) = 1000 args.per_device_train_batch_size = 1 len(eval_dataloader) = 334 len(eval_dataset) = 1000 args.per_device_eval_batch_size = 1 [2023-04-23 11:34:49,179] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info:...

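I checked that bloom-7B stores its parameters in FP16, with a model size of a dozen or so GB. Why is the belle-7B model more than 20 GB? Was it converted from FP16 to FP32?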
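The size question above can be checked with rough arithmetic. The sketch below assumes a round figure of 7 billion parameters (an assumption for illustration, not a number taken from either model card) and counts only the raw weights, ignoring any file-format overhead. The helper name `checkpoint_size_gb` is hypothetical.

```python
def checkpoint_size_gb(num_params: int, bytes_per_param: int) -> float:
    """Approximate size of the raw weights in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumption: a "7B" model has roughly 7 billion parameters.
NUM_PARAMS = 7_000_000_000

fp16_gb = checkpoint_size_gb(NUM_PARAMS, 2)  # FP16 uses 2 bytes per parameter
fp32_gb = checkpoint_size_gb(NUM_PARAMS, 4)  # FP32 uses 4 bytes per parameter

print(f"FP16: about {fp16_gb:.0f} GB, FP32: about {fp32_gb:.0f} GB")
```

Under these assumptions, a checkpoint of a dozen or so GB is consistent with FP16 storage and one of more than 20 GB is consistent with FP32, though this estimate alone cannot confirm how either checkpoint was actually saved.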

In the Docker environment, run_LoRA fails: it will not run even on three 32 GB V100s, while the earlier finetune runs fine

Our model is BLoom-2M, running in the Docker environment with bash training_scripts/single_node/run_LoRA.sh output-lora 2; we also tried 3, and it still fails. However, the previous version of finetune runs fine with LoRA. Why is that? Is LoRA support not yet complete? The following error appears: [2023-04-25 10:52:32,890] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 47.61 GB, percent = 18.9% Traceback (most recent call last): File "main.py", line 402, in...

Hello, a question: how can I do incremental pre-training?

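### Reminder - [X] I have searched the Github Discussion and issues and have not found anything similar to this. ### Motivation 1. It seems the official code currently does not support incremental pre-training; how can I do it? 2. Could the input data in the SFT script be changed so that unsupervised domain data is fed in directly? Or could the data be shifted with a slight modification? 3. Could you advise on how exactly to do this? ### Solution Thanks ###...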

How should hyperparameters be set for domain-specific incremental pre-training to get better results?

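### Reminder - [X] I have searched the Github Discussion and issues and have not found anything similar to this. ### Motivation Do they need to match the paper? The paper does not seem to have been released, though. ### Solution Could you share the hyperparameters from the paper? ### Alternatives Could you share the hyperparameters from the paper? ###...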

doc-not-needed

To the author, please take note: in repeated experiments I found that when the answer is fairly long, it gets truncated.

### Reminder - [X] I have read the README and searched the existing issues. ### Reproduction I ran pre-training first, then fine-tuning, and finally inference; all training used full-parameter updates. The scripts are as follows: pretrain: ``` deepspeed --hostfile=./hostfile --master_port=9901 src/train_bash.py \ --deepspeed ./ds_config.json \ --stage pt \...

solved

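In the pre-training stage, are there any requirements for processing books? For example, should \n be added between paragraphs, must each sample keep its paragraphs intact, or is truncation acceptable?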

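### Reminder - [X] I have read the README and searched the existing issues. ### Reproduction A few questions: 1. Should \n be added between paragraphs? 2. If the model can process at most 4096 tokens, should each sample's length be kept within 4096, ideally slightly below it? 3. After a book is split into multiple samples, should they be shuffled? 4. Should special characters such as \t be removed? 5. Are there any relevant materials on this? ### Expected behavior _No response_ ###...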

Help me, I'm dying soon, error: command '/opt/rh/devtoolset-7/root/usr/bin/gcc' failed with exit code 1 error: subprocess-exited-with-error

I used the following installation method, but got an error that I have been unable to resolve for several days: git clone https://github.com/NVIDIA/apex cd apex pip install --global-option="--cpp_ext" --global-option="--cuda_ext" --no-cache -v --disable-pip-version-check...

When training the reward model, which models do these two parameters refer to?

### Reminder - [X] I have read the README and searched the existing issues. ### Reproduction ``` CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \ --stage rm \ --do_train \ --model_name_or_path path_to_llama_model \ --adapter_name_or_path...

pending