Liu Dongxiao comments

Results: 12 comments by Liu Dongxiao

Exception raised during data preprocessing

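> You are using the MLM target, so you should not use the BERT corpus format.
> https://github.com/dbiir/UER-py/wiki/Quickstart
> ![image](https://user-images.githubusercontent.com/31317254/115357686-7e1cbb80-a1ef-11eb-9f79-70eaf4586f25.png)
>
> Is `corpora/part-2021012611.txt` a corpus that has already been split by spaces? If so, you should use `--tokenizer space`.
> If preprocessing (preprocess) used `--dynamic_masking`, you should specify `--span_masking` in the pretraining (pretrain) stage.

The example in the model usage section of the official documentation does not mention that "if preprocessing (preprocess) used `--dynamic_masking`, you should specify `--span_masking` in the pretraining (pretrain) stage". In addition, the pretrain code has no separate `--dynamic_masking` argument. Could the documentation be updated?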

netflix 4k

> > When will 4k movies be supported? At this point when I try to play a 4k movie it is only played as 1080p. > > Try this (...

[BUG]RuntimeError: output tensor must have the same type as input tensor

I resolved the issue by modifying the Trainer arguments from --bf16 to --fp16. I'm currently utilizing the combination of PyTorch 2.0 and Deepspeed. However, I've only come across this problem...

The issues section is full of people acting like beggars; it is an embarrassment to Chinese people

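> How high and mighty of 中科大. I'm following you now; from here on you must write all your code from the ground up with no references allowed, otherwise the first thing out of your mouth is "begging" and the next is "an embarrassment to Chinese people".

国科大 (UCAS) and 中科大 are not the same university; 中科大 is USTC.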

[BUG: Could not find consolidated.00.pth or consolidated.safetensors in Mistral model path but mistralai/Mistral-Large-Instruct-2407 surely not contains it

> I have encountered the same problem. You can use vllm directly for inference; I find it compatible with Mistral-Large-2

[BUG: Could not find consolidated.00.pth or consolidated.safetensors in Mistral model path but mistralai/Mistral-Large-Instruct-2407 surely not contains it

@liuanping @shangh1 @endNone all my package versions are listed above; as for vllm, it is vllm==0.5.2. The inference code is quite simple, and I'm using 4*H100 for mistral-large-2 ``` from vllm...

[BUG]torch._C._LinAlgError: linalg.cholesky: The factorization could not be completed because the input is not positive-definite (the leading minor of order 18163 is not positive-definite).

> @SuperBruceJia This is auto-fixed, as much as possible, in [GPTQModel](https://github.com/ModelCloud/GPTQModel). We will retry the quantization steps to self-heal this issue.

Could you please give some instruction on how to...

OOM error with 128 A800 80G GPUs, qwen2 7b, cut_off 8192

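> model_name_or_path: /mnt/nas/shanzhi/eval_models/Qwen2-72B
>
> Why does it say 72B here?

With a short 2k context, 3 nodes of 8*80G are actually enough for training. With 128 GPUs across 16 nodes, something in his setup must be wrong.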

OOM error with 128 A800 80G GPUs, qwen2 7b, cut_off 8192

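> > > model_name_or_path: /mnt/nas/shanzhi/eval_models/Qwen2-72B
> > >
> > > Why does it say 72B here?
> >
> > With a short 2k context, 3 nodes of 8*80G are actually enough for training. With 128 GPUs across 16 nodes, something in his setup must be wrong.
>
> I suspect he made a mistake: he said 7b, but his config says 72b, so maybe he ended up training 72b. Besides, I would expect 128 A800s to be far more than enough for a model of this size; even 72b should not fail to run.

Right, 3 nodes with 8 A800s each are enough to run qwen 72B.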
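
The claim that 3 nodes of 8*80G suffice can be sanity-checked with a rough back-of-the-envelope estimate. The sketch below is not taken from the thread: it assumes mixed-precision Adam at roughly 16 bytes per parameter, model state sharded evenly across all GPUs (ZeRO stage 3 style), nominal parameter counts of 72e9 and 7e9, and it ignores activations, buffers, and fragmentation, which grow with context length.

```python
# Rough model-state memory estimate for full-parameter training.
# Assumption (not from the thread): mixed-precision Adam, about 16 bytes/param:
#   2 (fp16 weights) + 2 (fp16 grads) + 12 (fp32 master weights, momentum, variance)
BYTES_PER_PARAM = 2 + 2 + 12


def model_state_gb(num_params: int) -> float:
    """Total memory for weights, gradients, and optimizer states, in GB (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM / 1e9


def per_gpu_state_gb(num_params: int, num_gpus: int) -> float:
    """Per-GPU share, assuming model state is sharded evenly across all GPUs."""
    return model_state_gb(num_params) / num_gpus


if __name__ == "__main__":
    # Nominal sizes; real checkpoints differ slightly.
    for name, params, gpus in [("72B on 3x8 GPUs", 72_000_000_000, 24),
                               ("72B on 128 GPUs", 72_000_000_000, 128),
                               ("7B on 128 GPUs", 7_000_000_000, 128)]:
        share = per_gpu_state_gb(params, gpus)
        print(f"{name}: {share:.2f} GB of model state per GPU (80 GB available)")
```

Under these assumptions the per-GPU model state for 72B on 24 GPUs stays below 80 GB, and the rest of each card has to cover activations. An OOM on 128 GPUs would then more likely point to configuration (for example sharding not being active) than to raw capacity.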

OOM error with 128 A800 80G GPUs, qwen2 7b, cut_off 8192

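> @ShadowTeamCN I'd like to ask how the launch script for two nodes with 8 GPUs should be written. Is this enough?
>
> FORCE_TORCHRUN=1 NNODES=2 RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/llama3_full_sft_ds3.yaml
> FORCE_TORCHRUN=1 NNODES=2 RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/llama3_full_sft_ds3.yaml

That looks fine. Make sure the IP address is reachable and the firewall port is open. Note, however, that 192.168.0.1 is usually the router's gateway address, so run ifconfig on your own machine to confirm that it really is that machine's LAN address.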
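
The MASTER_ADDR check can also be scripted on the rank-0 node. This is a minimal sketch using only the Python standard library; the helper names `is_local_address`, `looks_like_gateway`, and `launch_commands` are hypothetical and not part of LLaMA-Factory. The gateway test is only a heuristic that assumes a /24 subnet whose first host is the router.

```python
import ipaddress
import socket


def is_local_address(addr: str) -> bool:
    """True if `addr` can be bound on this machine, i.e. it belongs to a local interface."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.bind((addr, 0))  # port 0 lets the OS pick any free port
        return True
    except OSError:
        return False


def looks_like_gateway(addr: str, prefix_len: int = 24) -> bool:
    """Heuristic only: True if `addr` is the first host address of its subnet."""
    iface = ipaddress.ip_interface(f"{addr}/{prefix_len}")
    return iface.ip == iface.network.network_address + 1


def launch_commands(master_addr: str, nnodes: int = 2, port: int = 29500,
                    config: str = "examples/train_full/llama3_full_sft_ds3.yaml") -> list[str]:
    """Build one launch line per node, mirroring the commands quoted above."""
    return [
        f"FORCE_TORCHRUN=1 NNODES={nnodes} RANK={rank} MASTER_ADDR={master_addr} "
        f"MASTER_PORT={port} llamafactory-cli train {config}"
        for rank in range(nnodes)
    ]


if __name__ == "__main__":
    master = "192.168.0.1"  # value from the quoted script; replace with the rank-0 node's LAN IP
    if not is_local_address(master):
        print(f"{master} is not bound to any interface on this machine")
    if looks_like_gateway(master):
        print(f"{master} looks like a router/gateway address; double-check it")
    for line in launch_commands(master):
        print(line)
```

Run it on the machine intended to be rank 0: if `is_local_address` returns False there, the other node will not be able to rendezvous at that address.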