toufunao

Results 8 issues of toufunao

**Describe the issue**: I want to run the DARTS examples on multiple GPUs, so I wrapped the model with DDP and sharded the data with DistributedSampler. However, I found the final 2...

**Describe the issue**: When I tried to use NAS, it reported 'ImportError: Cannot use a path to identify something from __main__.' and 'ValueError: Pickle too large when trying to dump...

In hetero-LR settings, an arbiter role is needed. I would like to know which party I should assign this role to. In the tutorial, some choose the guest to be the...

**Describe the issue**: I used 2 GPUs to train DARTS, but the output shows 2 different results. I also used 'export_onnx', but I didn't get...

NAS 2.0

**Problem Description** Installing the pycocotools dependency fails in a fully offline (intranet-only) environment. **Environment Information** OS: Red Hat (commercial edition) 7.7 python: 3.10.9 **Additional Information** Add any other information related to the issue.

bug

When I tried codellama-7b and codellama-34b for code completion, all results were garbled. Environment: OS: Red Hat 4.8.5-36 GCC: 4.8.5 32G V100 cuda: 11.7 torch: 2.0.0 fairscale: 0.4.13 sentencepiece: 0.1.99...

I used tensor_parallel to fine-tune a Qwen model with LoRA in a tensor-parallel setup. However, the model cannot be saved at the end. Could you provide any help? Thanks.

I trained with the following script. The dataset has about 33,000 samples, with per_device_batch_size=16, gradient_accumulation_steps=32, epochs=3, and 4 GPUs. nproc_per_node=4 NPROC_PER_NODE=$nproc_per_node \ CUDA_VISIBLE_DEVICES=0,1,2,3 \ swift pt \ --model Qwen/Qwen2.5-7B \ --train_type full \ --dataset $CUSTOM_DATASET \ --torch_dtype bfloat16 \ --num_train_epochs 3 \ --per_device_train_batch_size 16 \ --per_device_eval_batch_size 1 \...
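For scale, the settings quoted in this issue imply a fairly small number of optimizer steps. The sketch below only does the arithmetic. It assumes each dataset row counts as one training sample (no packing or filtering) and that a final partial batch still triggers an optimizer step; `optimizer_steps` is a hypothetical helper written for illustration, not part of swift or any trainer API, and the trainer's own step count may differ slightly.

```python
import math


def optimizer_steps(num_samples, per_device_batch, grad_accum, num_gpus, epochs):
    """Estimate global batch size and optimizer steps (illustrative helper)."""
    # Samples consumed per optimizer step across all devices.
    global_batch = per_device_batch * grad_accum * num_gpus
    # Assumption: a trailing partial batch still counts as one step.
    steps_per_epoch = math.ceil(num_samples / global_batch)
    return global_batch, steps_per_epoch, steps_per_epoch * epochs


# Values taken from the issue text: ~33,000 samples, batch 16, accumulation 32, 4 GPUs, 3 epochs.
global_batch, per_epoch, total = optimizer_steps(33000, 16, 32, 4, 3)
print(global_batch, per_epoch, total)
```

Under these assumptions one optimizer step consumes 16 × 32 × 4 = 2048 samples, so three epochs amount to only a few dozen parameter updates.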