Onebula
The expire function should support updating the expiry time when ttl is specified, and should also support moving the key to the end of the cache stack when ttl is not...
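The requested behavior could look roughly like the sketch below. It is only an illustration: `TTLCache` and its methods are hypothetical names, not the project's actual API, and it assumes the "cache stack" is an ordered mapping whose last entry is the most recently used. Whether a call with ttl should also move the key is not stated above, so this sketch leaves the position unchanged in that case.

```python
import time
from collections import OrderedDict


class TTLCache:
    """Hypothetical cache used only to illustrate the requested expire() behavior."""

    def __init__(self):
        # key -> (value, expires_at or None); the last entry is the most recently used.
        self._data = OrderedDict()

    def set(self, key, value, ttl=None):
        expires_at = time.monotonic() + ttl if ttl is not None else None
        self._data[key] = (value, expires_at)
        self._data.move_to_end(key)

    def keys(self):
        return list(self._data.keys())

    def expires_at(self, key):
        return self._data[key][1]

    def expire(self, key, ttl=None):
        """Refresh a key; returns False if the key is not in the cache."""
        if key not in self._data:
            return False
        value, expires_at = self._data[key]
        if ttl is not None:
            # ttl given: update the expiry time (position is kept, by assumption).
            self._data[key] = (value, time.monotonic() + ttl)
        else:
            # No ttl: only move the key to the end of the cache stack.
            self._data.move_to_end(key)
        return True
```

With this shape, `cache.expire("a", ttl=30)` extends the lifetime of `"a"`, while `cache.expire("a")` only marks it as most recently used.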
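> wandb does record the launch arguments. Only the huggingface trainer arguments are recorded; the deepspeed/accelerate arguments and the arguments newly added by this project are not.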
I ran into this problem too. After all loglikelihood requests have finished, the process hangs with no further output while CPU and GPU stay fully utilized. Mistral-7B-v0.1 on MMLU with auto:4 hits this problem,...
Or are there any experiments demonstrating that SPIN is superior to DPO under self-iteration? The most relevant experiment runs DPO only once, while SPIN is run for multiple iterations.
> why not combine dpo and spin? Put the previous generation into the rejected column and the new generation into the accepted one. Then train with DPO at each iteration....
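A minimal sketch of the pairing step described in the quote, assuming the prompt/chosen/rejected record layout commonly used for DPO preference data ("accepted" is mapped to `chosen` here). `build_dpo_pairs` is a hypothetical helper, not part of SPIN or any library, and the training step itself is left out.

```python
def build_dpo_pairs(prompts, previous_generations, new_generations):
    """Pair each prompt with its new generation (chosen) and previous one (rejected)."""
    if not (len(prompts) == len(previous_generations) == len(new_generations)):
        raise ValueError("all three lists must have the same length")
    return [
        {"prompt": prompt, "chosen": new, "rejected": old}
        for prompt, old, new in zip(prompts, previous_generations, new_generations)
    ]
```

At each iteration the current model's outputs would become `previous_generations` for the next round, so the preference data is rebuilt before every DPO run.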