Alex Havrilla
Thanks! I can run generate in accelerate with this setup after changing the wrapping policy:
```
compute_environment: LOCAL_MACHINE
deepspeed_config: {}
distributed_type: FSDP
fsdp_config:
  fsdp_auto_wrap_policy: SIZE_BASED_WRAP
  fsdp_backward_prefetch_policy: BACKWARD_PRE
  min_num_params: 2000
  offload_params:...
```
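For reference, a config like this would typically be passed to the launcher with something like `accelerate launch --config_file <path-to-config> <training-script>` (the paths here are placeholders, not the exact command used above).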
Any update on this?
@ethankim00 just a gentle push: when do you expect to finish this?
Closing this.
From the PPO perspective, each token receives a reward given by the KL divergence between the fine-tuned model and the reference model, plus the score returned by the user-provided `reward_fn`...
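As a rough illustration, here is a minimal sketch of how such a per-token reward could be assembled. This is not trlx's actual implementation: `per_token_rewards`, `kl_coef`, and the toy tensors are illustrative, and I'm assuming the KL term enters as a penalty (negative coefficient) with the `reward_fn` score added on the final generated token.

```python
import torch

def per_token_rewards(logprobs, ref_logprobs, score, kl_coef=0.1):
    # Per-token log-ratio of the fine-tuned policy to the frozen reference model
    kl = logprobs - ref_logprobs
    # KL enters as a penalty on every token
    rewards = -kl_coef * kl
    # The scalar score from reward_fn lands on the final generated token
    rewards[-1] += score
    return rewards

# Toy example: 4 generated tokens and a reward_fn score of 1.0
logprobs = torch.tensor([-1.2, -0.8, -2.0, -0.5])
ref_logprobs = torch.tensor([-1.0, -0.9, -1.8, -0.6])
print(per_token_rewards(logprobs, ref_logprobs, score=1.0))
```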
Let us know if you run into any issues here or on the trlx channel.
# Ideas for tasks
- web searching: wikipedia race
- chess
  - A chess DT is not trained on natural language, it's trained on a formal language encoding chess moves....
Depends on #529
@glerzing Do you have an example run using 8bit?
@PhungVanDuy If you have time, can you help debug this? I think having lower-precision inference and training options will be very useful.
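For context, here is a minimal sketch of what 8-bit inference might look like, assuming the standard transformers + bitsandbytes path; this is an assumption about the intended setup, not trlx's actual integration, and the model name and prompt are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,   # quantize weights to int8 via bitsandbytes
    device_map="auto",   # place layers on GPU(s) automatically
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```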