Alex Havrilla

17 comments by Alex Havrilla

Thanks! I can run generate in accelerate with this setup after changing the wrapping policy:

```
compute_environment: LOCAL_MACHINE
deepspeed_config: {}
distributed_type: FSDP
fsdp_config:
  fsdp_auto_wrap_policy: SIZE_BASED_WRAP
  fsdp_backward_prefetch_policy: BACKWARD_PRE
  min_num_params: 2000
  offload_params:...
```

@ethankim00 just a gentle push on when you expect to finish this?

From the PPO perspective, each token receives a reward given by the KL divergence between the fine-tuned model and the reference model, plus the score returned by the user-provided `reward_fn`...
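
A minimal sketch of that per-token reward, under stated assumptions: the KL term is estimated per token as the log-probability difference between the fine-tuned and reference models and applied as a penalty scaled by a hypothetical `kl_coef`, and the `reward_fn` score is added only at the final token. This is a common convention in RLHF-style PPO, not necessarily what the library does; the function name and parameters are illustrative.

```python
from typing import List


def per_token_rewards(
    logprobs: List[float],
    ref_logprobs: List[float],
    score: float,
    kl_coef: float = 0.1,  # hypothetical penalty weight, not a library default
) -> List[float]:
    """Combine a per-token KL penalty with a sequence-level score.

    logprobs / ref_logprobs hold the log-probability that the fine-tuned and
    reference models assign to each generated token.
    """
    if not logprobs or len(logprobs) != len(ref_logprobs):
        raise ValueError("log-prob lists must be non-empty and equally long")

    # Per-token KL estimate: log pi(token) - log pi_ref(token), used as a penalty.
    rewards = [-kl_coef * (lp - ref_lp) for lp, ref_lp in zip(logprobs, ref_logprobs)]

    # Assumption: the reward_fn score is credited to the last token only.
    rewards[-1] += score
    return rewards
```

For example, `per_token_rewards([-1.0, -1.0], [-2.0, -1.0], score=1.0, kl_coef=0.5)` penalizes the first token, where the fine-tuned model has drifted from the reference, and credits the score to the second.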

Let us know if you run into any issues here or on the trlx channel.

# Ideas for tasks

- web searching: wikipedia race
- chess
  - A chess DT is not trained on natural language, it’s trained on a formal language encoding chess moves....

Depends on #529

@glerzing Do you have an example run using 8bit?

@PhungVanDuy If you have time, can you help debug this? I think having lower-precision inference and training options will be very useful.