Results 6 comments of LEON

Bro, add me on WeChat and let's dig into this together~

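The reason I implemented it this way is that in many cases I'm not training from scratch: I'm continuing training from a model someone else fine-tuned, or from a pretrained model. I can't modify the original model path, so I need to create a new path to store my checkpoints. Also, with resume_from_checkpoint=True the checkpoint would, as far as I can tell, be saved locally. But a very common scenario at companies is that the job runs in a temporary environment, so the checkpoint definitely can't be saved locally in that environment. It needs to point at an HDFS or NAS address, so that even if the job gets killed the checkpoint is still there.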
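A minimal sketch of the layout described above, not the actual implementation: the base model path stays read-only, checkpoints go under a separate durable root, and a restarted job picks up the newest checkpoint it finds there. The helper names (`find_latest_checkpoint`, `resolve_training_paths`) and the `checkpoint-<step>` directory naming are assumptions for illustration. The durable root is assumed to be a mounted path such as a NAS mount; an `hdfs://` URI would need an HDFS client rather than `os` calls.

```python
import os
import re
from typing import Dict, Optional

# Assumed naming scheme: one subdirectory per save, named checkpoint-<global step>.
_CKPT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_latest_checkpoint(output_dir: str) -> Optional[str]:
    """Return the checkpoint-<N> subdirectory with the largest N, or None."""
    if not os.path.isdir(output_dir):
        return None
    best_step, best_path = -1, None
    for name in os.listdir(output_dir):
        match = _CKPT_RE.match(name)
        path = os.path.join(output_dir, name)
        if match and os.path.isdir(path):
            step = int(match.group(1))  # compare numerically, not as strings
            if step > best_step:
                best_step, best_path = step, path
    return best_path


def resolve_training_paths(
    base_model_path: str, checkpoint_root: str, run_name: str
) -> Dict[str, Optional[str]]:
    """Keep the base model path untouched and write checkpoints elsewhere.

    checkpoint_root is assumed to live on durable storage (e.g. a NAS mount),
    so the checkpoints survive when the temporary job environment is killed.
    """
    output_dir = os.path.join(checkpoint_root, run_name)
    os.makedirs(output_dir, exist_ok=True)
    return {
        "model_path": base_model_path,  # only read from, never written to
        "output_dir": output_dir,       # where new checkpoints are saved
        "resume_from": find_latest_checkpoint(output_dir),  # None on first run
    }
```

If the training loop happens to be Hugging Face's `Trainer` (an assumption, the comment does not say), `output_dir` would go to `TrainingArguments(output_dir=...)` and `resume_from` to `trainer.train(resume_from_checkpoint=...)`.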

> When running run_multienv.py, even if the task is executed correctly, the reward will always be 0. I have tried with num_envs set to 4 and 8, but none of...

It's fine now. It's a misunderstanding rather than a bug. Thank you for your effort!

Same issue. I found this bug in a previously released version. The root cause is a bug in unsloth_zoo/compiler.py. It's easy to fix with a one-line change. You...

@danielhanchen Thanks for this bug fix!!! Could you also please check issue #1887? It's about using the vllm backend for inference. However, the inference results are not in the...