Liu Dongxiao issues




Question about sentence splitting in build_instances during pretraining

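Take the simplest case, character-level granularity in MlmDataset. With the full-sentence switch off, a sample longer than max_length gets split, yet only one [CLS] and one [SEP] token remain. Both come from the original document that was passed in, and splitting the sample does not add any head or tail tokens. Is this the expected behavior? In principle, every individual sample should start with a [CLS] and end with a [SEP].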
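For comparison, here is the behavior the question argues for: every chunk of a split document gets its own [CLS] and [SEP]. This is a minimal sketch, not MlmDataset's actual code. It assumes character-level tokens and a document token list that does not yet contain any special tokens, and `split_with_special_tokens` is a hypothetical name.

```python
from typing import List

CLS, SEP = "[CLS]", "[SEP]"


def split_with_special_tokens(tokens: List[str], max_length: int) -> List[List[str]]:
    """Hypothetical helper: split a document's tokens into samples of at most
    max_length, wrapping each sample in its own [CLS] ... [SEP]."""
    if max_length < 3:
        raise ValueError("max_length must leave room for [CLS], [SEP] and one token")
    body = max_length - 2  # reserve two positions per sample for [CLS] and [SEP]
    return [
        [CLS] + tokens[i:i + body] + [SEP]
        for i in range(0, len(tokens), body)
    ]


if __name__ == "__main__":
    # Character-level tokens for a 7-character document, max_length = 5.
    for sample in split_with_special_tokens(list("abcdefg"), 5):
        print(sample)
```

Reserving two positions before slicing keeps every sample within max_length after the special tokens are added. The trade-off is that each split costs two token positions.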

I couldn't find an official pretraining script. How well does continued pretraining with MLM alone work?

I suspect that leaving out the other pretraining tasks would degrade the model.

Pretraining LLaMA-7B on a single node with 8× A100-80G, with or without DeepSpeed ZeRO-3, does not fully utilize the GPUs

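Without DeepSpeed, training runs out of GPU memory. Are there recommended pretraining parameter settings that keep the GPUs running efficiently throughout training?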
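One way to make "not fully utilizing the GPUs" concrete is to estimate model FLOPs utilization (MFU) from measured training throughput. The sketch below assumes the common approximation of about 6 FLOPs per parameter per training token (forward plus backward, ignoring attention FLOPs) and an A100 dense BF16 peak of 312 TFLOPS. The function name and the throughput figure in the example are hypothetical.

```python
def model_flops_utilization(n_params: float, tokens_per_sec: float,
                            n_gpus: int, peak_flops_per_gpu: float = 312e12) -> float:
    """Hypothetical helper: approximate MFU for dense transformer training.

    n_params: model parameter count (e.g. ~7e9 for a 7B model)
    tokens_per_sec: training throughput summed over all GPUs
    peak_flops_per_gpu: assumed A100 dense BF16 peak (312 TFLOPS)
    """
    # ~6 FLOPs per parameter per token: 2 for the forward pass, 4 for backward.
    achieved_flops = 6 * n_params * tokens_per_sec
    return achieved_flops / (n_gpus * peak_flops_per_gpu)


if __name__ == "__main__":
    # Hypothetical measurement: 20,000 tokens/s across 8 GPUs for a 7B model.
    print(round(model_flops_utilization(7e9, 2e4, 8), 3))
```

With that hypothetical throughput the estimate is roughly 0.34. Comparing MFU across runs with and without ZeRO-3 (or across batch sizes) gives a single number for how much of the hardware each configuration actually uses.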

[BUG]: Could not find consolidated.00.pth or consolidated.safetensors in Mistral model path, but mistralai/Mistral-Large-Instruct-2407 does not contain either file

### Python -VV
```shell
Python 3.10.12 (main, Mar 22 2024, 16:50:05) [GCC 11.4.0]
```
### Pip Freeze
```shell
accelerate==0.32.1
aiohttp==3.9.5
aiosignal==1.3.1
annotated-types==0.7.0
anyio==4.4.0
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asttokens==2.4.1
async-lru==2.0.4
async-timeout==4.0.3
attrs==23.2.0
...
```

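The following sketch checks which of the two files named in the error message exist in a local model directory, and lists any other `.pth` or `.safetensors` files, so a naming mismatch is easy to spot. It is a minimal diagnostic and assumes a flat directory layout. `inspect_model_dir` is a hypothetical helper, not part of any Mistral library.

```python
from pathlib import Path
from typing import Dict, List, Tuple

# File names taken from the error message in the issue title.
EXPECTED: Tuple[str, ...] = ("consolidated.00.pth", "consolidated.safetensors")


def inspect_model_dir(model_dir: str) -> Tuple[Dict[str, bool], List[str]]:
    """Hypothetical helper: report which expected files exist in model_dir,
    plus every weight-looking file that is actually present."""
    root = Path(model_dir)
    found = {name: (root / name).is_file() for name in EXPECTED}
    # Any other weight files (for example, differently named shards) show up here.
    weight_files = sorted(
        p.name for p in root.iterdir()
        if p.is_file() and p.suffix in {".pth", ".safetensors"}
    )
    return found, weight_files


if __name__ == "__main__":
    import sys

    expected, present = inspect_model_dir(sys.argv[1] if len(sys.argv) > 1 else ".")
    print("expected:", expected)
    print("weight files present:", present)
```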

What does the `return_output_tensors` parameter do in inferflow_service.ini?

I set this argument to true, but found no PPL or logits in the response. Also, what is this argument supposed to return? Should it return only the logits of...
