Results 91 comments of Kai Lv

Hi, here is what I tried: ![image](https://github.com/sail-sg/Adan/assets/39761308/59b9f480-56bc-4ace-a0d2-226b54c30ad8) The results did improve after following your suggestion. Do you have any further recommendations?

Our experiments focus on fine-tuning large language models on consumer GPUs such as the RTX 3090. The results under these conditions are indeed as stated (using the 7b model on an RTX 3090...

Based on the information provided, we consider this issue resolved. If you have any further questions or concerns, please reopen this issue and provide additional details.

Hi, the model weights should be saved in files like `pytorch_model.bin` with the `CheckpointCallback` below. ```python callbacks = [CheckpointCallback(your_path, every_n_batches=1600, model_only=False, peft_only=False)] ``` BTW, are you using the `main` branch or `dev`...

This way, each rank may end up with a different master_port depending on the order in which the ranks run. Like torchrun, we could instead raise an error and terminate the program immediately, prompting the user to change the environment variable.

> Checking with bind can misjudge an available port as unavailable. For example, after exporting a new port as prompted, the next run still reports the port as in use; switching the check to connect avoids this problem.
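The bind-vs-connect distinction discussed above can be sketched in plain Python. This is a standalone illustration; the function names are hypothetical and not the CoLLiE API:

```python
import socket

def port_in_use_bind(port: int) -> bool:
    # bind-based check: try to bind the port locally.
    # This can misreport availability (e.g. due to TIME_WAIT or
    # address-reuse settings), which is the problem described above.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind(("127.0.0.1", port))
            return False  # bind succeeded -> port looks free
        except OSError:
            return True   # bind failed -> port looks busy

def port_in_use_connect(port: int) -> bool:
    # connect-based check: the port counts as "in use" only if
    # something is actually listening and accepts the connection.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        return s.connect_ex(("127.0.0.1", port)) == 0
```

The connect-based check matches the intent here: it only flags ports that a live listener already occupies, so a freshly exported port is not spuriously rejected on the next run.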

> Did you find a solution for this issue for prompt-tuning? Unfortunately no...

> Do you have any updates on this issue? @KaiLv69 Nope. Turn to LoRA :)

@sasaadi @nanyyyyyy @fateme-hshm96 @hepengfe I found that the error is caused by ZeRO-3 not being taken into account. Here is the fix. https://github.com/OpenLMLab/collie/pull/54/commits/7fcf9317d70f48429bee3936c080b88e2de4f99a In short, when looking up word_embeddings under ZeRO-3,...
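To see why ZeRO-3 needs special handling here, a minimal stdlib sketch: under ZeRO Stage 3 each rank stores only a shard of a parameter, so code that reads the local tensor sees a fragment and must gather all shards first. The names below are illustrative, not the collie/deepspeed API:

```python
def shard(param, world_size):
    """Split a flat parameter evenly across ranks (last rank may be short)."""
    n = (len(param) + world_size - 1) // world_size
    return [param[i * n:(i + 1) * n] for i in range(world_size)]

def gather(shards):
    """Reassemble the full parameter from every rank's shard."""
    full = []
    for s in shards:
        full.extend(s)
    return full

word_embeddings = list(range(10))   # pretend flat embedding weights
shards = shard(word_embeddings, 4)  # ZeRO-3-style partitioning

local_view = shards[0]                  # what one rank sees naively
assert local_view != word_embeddings    # incomplete -> wrong results
assert gather(shards) == word_embeddings  # gather restores the full weight
```

The linked commit applies the same idea: gather the partitioned word_embeddings before using them, instead of reading the local placeholder directly.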

Feel free to reopen this issue if any further problems arise.