Alex Havrilla issues

Results: 11 issues of Alex Havrilla

caffe2 error in forward method when using fsdp

### System Info

```Shell
- `Accelerate` version: 0.11.0
- Platform: Linux-5.10.112-108.499.amzn2.x86_64-x86_64-with-glibc2.2.5
- Python version: 3.8.5
- Numpy version: 1.23.1
- PyTorch version (GPU?): 1.12.0+cu113 (True)
- `Accelerate` default config: -...
```

bug

Ppo z3

Work in progress integrating ZeRO-3 with hydra models for PPO. The current implementation works for models smaller than 6B parameters but runs out of memory (OOM) on 6B.

Implement A2C

Implement additional online RL algorithms

feature request

Add Jax support

### 🚀 The feature, motivation, and pitch

Add JAX support for RLHF on TPUs.

### Alternatives

_No response_

### Additional context

_No response_

feature request

Support direct loading into rollout storage for reward labeled datasets

Carp config requires device

The Carp config requires a device, which needs to be changed for multi-GPU training.

Implement BoN for training and eval
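For context, best-of-N (BoN) sampling is commonly understood as drawing N candidate completions per prompt and keeping the one that a reward function scores highest. The following is a minimal sketch under that assumption; `generate`, `reward_fn`, and the toy helpers are hypothetical stand-ins for a model sampler and a reward model, not trlX API.

```python
import random
from typing import Callable, List, Tuple


def best_of_n(
    prompt: str,
    generate: Callable[[str], str],          # hypothetical sampler: prompt -> completion
    reward_fn: Callable[[str, str], float],  # hypothetical scorer: (prompt, completion) -> reward
    n: int = 4,
) -> Tuple[str, float]:
    """Sample n completions for a prompt and return the highest-reward one with its score."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    scored = [(c, reward_fn(prompt, c)) for c in candidates]
    # max() returns the first candidate among ties
    return max(scored, key=lambda pair: pair[1])


# Toy stand-ins (assumptions for illustration only)
def toy_generate(prompt: str) -> str:
    """Append a random-length suffix to mimic stochastic sampling."""
    return prompt + " " + "x" * random.randint(1, 10)


def toy_reward(prompt: str, completion: str) -> float:
    """Score a completion by its length."""
    return float(len(completion))
```

For evaluation, the selected completion can be reported directly; for training, one possible design is to treat the selected completions as fine-tuning targets, though the issue does not say which variant is intended.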

Inference pipeline

Implementation of multi-generation RL in trlX. A suggested (but optional) external inference pipeline wrapper can be found [here](https://github.com/CarperAI/autocrit/pull/16).

Dist ref kl

Implementing `ref_model` as an additional reward component
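The issue does not spell out the formula, but one common reading in RLHF is that the reference model contributes a per-token KL penalty that is added to the task reward. The sketch below assumes that reading; the function name, the `kl_coef` parameter, and the choice to add the task score on the last token are illustrative assumptions, not the implementation from this pull request.

```python
from typing import List


def kl_shaped_rewards(
    policy_logprobs: List[float],  # log-probs of the sampled tokens under the policy
    ref_logprobs: List[float],     # log-probs of the same tokens under the reference model
    score: float,                  # scalar task reward for the whole sequence
    kl_coef: float = 0.1,          # assumed penalty weight
) -> List[float]:
    """Return per-token rewards: a KL penalty on every token plus the task score on the last one."""
    if not policy_logprobs or len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("log-prob lists must be non-empty and of equal length")
    # Per-token estimate of KL(policy || ref) using the sampled token's log-ratio
    rewards = [-kl_coef * (p - r) for p, r in zip(policy_logprobs, ref_logprobs)]
    rewards[-1] += score
    return rewards
```

Treating the penalty as a reward component, rather than as a separate loss term, means the reference model only needs to supply log-probabilities, which is one reason it could plausibly be hosted separately from the policy.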

Implement Asynchronous PPO

### 🚀 The feature, motivation, and pitch

An asynchronous PPO implementation would mitigate model rollout/exploration, which is the largest bottleneck in the training process.

### Alternatives

_No response_

### Additional context

_No...

feature request
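To illustrate the idea behind the asynchronous PPO request, here is a minimal producer/consumer sketch in which rollout generation and learning run concurrently instead of alternating. Everything here is an assumption for illustration: `rollout_worker`, `learner`, `run`, and the placeholder batch contents are hypothetical, and no actual PPO update is performed.

```python
import queue
import threading
from typing import List


def rollout_worker(out: "queue.Queue", num_batches: int) -> None:
    """Produce placeholder rollout batches; a real worker would run generation here."""
    for step in range(num_batches):
        out.put({"step": step, "samples": [step] * 4})  # blocks while the buffer is full
    out.put(None)  # sentinel: no more rollouts


def learner(inp: "queue.Queue") -> List[int]:
    """Consume batches as they arrive; a real learner would run a PPO update per batch."""
    seen: List[int] = []
    while True:
        batch = inp.get()
        if batch is None:
            break
        seen.append(batch["step"])
    return seen


def run(num_batches: int = 5, max_pending: int = 2) -> List[int]:
    """Run the worker in a background thread and the learner in the calling thread."""
    buf: "queue.Queue" = queue.Queue(maxsize=max_pending)
    producer = threading.Thread(target=rollout_worker, args=(buf, num_batches))
    producer.start()
    consumed = learner(buf)
    producer.join()
    return consumed
```

The bounded queue (`max_pending`) is a design choice in this sketch: it limits how far rollouts can run ahead of the learner, which in turn limits how stale the data-generating policy can become.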