cat-state
TODO: make the repo flake8-compliant
Currently: implemented a NeMo GPT + ILQL heads model and got it to run forward. The model runs but crashes with this error on the first backward step: ``` │ /mnt/nvme/home/uwu/micromamba/envs/nemo/lib/python3.8/site-packages/apex/tra...
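For reference, the "GPT + ILQL heads" idea can be sketched in a few lines: ILQL keeps the LM trunk and adds per-token Q- and V-value heads on top of its hidden states. This is a minimal numpy sketch of the shapes involved; the names (`hidden_dim`, `W_q`, `W_v`) are illustrative assumptions, not trlx's or NeMo's actual API.

```python
import numpy as np

# Hypothetical sketch of ILQL-style value heads on a decoder LM trunk.
# All names here are illustrative, not the real trlx/NeMo interfaces.
rng = np.random.default_rng(0)

hidden_dim, vocab_size = 16, 32

# Per-token hidden states produced by the GPT trunk: (batch, seq, hidden)
hidden = rng.normal(size=(2, 8, hidden_dim))

# ILQL adds two linear heads on top of the trunk:
#   Q-head: per-token action values over the vocabulary
#   V-head: per-token scalar state value
W_q = rng.normal(size=(hidden_dim, vocab_size))
W_v = rng.normal(size=(hidden_dim, 1))

q_values = hidden @ W_q  # (batch, seq, vocab)
v_values = hidden @ W_v  # (batch, seq, 1)
```

The backward crash above happens in the trunk (apex), so the heads themselves are unlikely to be the culprit; the sketch is only meant to pin down what the extra parameters are.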
Self play, and multi-LM-agent settings in general, are something we are very interested in exploring. What would it take to support them? Do they already work without big overheads?
### 🚀 The feature, motivation, and pitch We should try out [github.com/NVIDIA/NeMo](https://github.com/NVIDIA/NeMo), using NeMo-Megatron. This would involve: - [ ] Make a pytorch-lightning-based trainer - [ ] Make NeMoILQLModel -...
### 🚀 The feature, motivation, and pitch If we had links to benchmarks for the example (and/or test) models, it would be easier to add new models and keep track...
Do RLHF using Anthropic's HH data, using existing models. Depends on https://github.com/CarperAI/trlx/issues/25 (cc @haileyschoelkopf)
We should also add it as a pre-merge hook
This requires the repo to pass flake8, which it currently does not.
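One common way to wire this up as a pre-merge check, assuming the repo adopts pre-commit (this exact config is a suggestion, not something already in the repo):

```yaml
# .pre-commit-config.yaml (proposed)
repos:
  - repo: https://github.com/PyCQA/flake8
    rev: 5.0.4
    hooks:
      - id: flake8
```

Then `pre-commit install` sets up the local git hook, and the same config can run in CI so merges are gated on a clean flake8 pass.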
Measure gradient norms and gradient noise of RL training. This would address open questions around scaling gradients when mixing RL and non-RL tasks, as well as inform optimal batch...
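For the measurement itself, the simple gradient noise scale of McCandlish et al. (2018) can be estimated from squared gradient norms at two batch sizes. A pure-Python sketch (the function names and the two-batch-size estimator are our assumptions about how we'd instrument training, not existing trlx code):

```python
import math

def grad_norm(grad):
    """L2 norm of a flattened gradient (list of floats)."""
    return math.sqrt(sum(g * g for g in grad))

def gradient_noise_scale(g_small_sq, g_big_sq, b_small, b_big):
    """Estimate the simple gradient noise scale B_noise = tr(Sigma) / |G|^2
    from expected squared gradient norms measured at two batch sizes,
    using E|G_B|^2 = |G|^2 + tr(Sigma) / B (McCandlish et al., 2018).
    """
    # Unbiased estimates of the true squared gradient norm and the
    # trace of the per-example gradient covariance.
    g_sq = (b_big * g_big_sq - b_small * g_small_sq) / (b_big - b_small)
    trace_sigma = (g_small_sq - g_big_sq) / (1.0 / b_small - 1.0 / b_big)
    return trace_sigma / g_sq
```

Logging `grad_norm` per step for the RL and non-RL loss terms separately would directly answer the gradient-scaling question; the noise scale additionally suggests a critical batch size for each task mix.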
Create an example showing reward modeling. This could use an artificially limited synthetic reward source, or Anthropic's HHH data (already on the Stability cluster). More ideas for tasks: https://github.com/CarperAI/trlx/issues/13#issuecomment-1273632021...
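One concrete reading of "synthetic reward source, artificially limited" is a programmatic reward whose signal is deliberately clipped, so the policy cannot trivially max it out. A minimal sketch (the target word and cap are made-up example parameters):

```python
def synthetic_reward(text: str, target: str = "happy", cap: int = 3) -> float:
    """Toy synthetic reward: count occurrences of a target substring,
    clipped at `cap` so the reward signal is artificially limited.
    `target` and `cap` are illustrative choices for an example script.
    """
    count = text.lower().count(target)
    return float(min(count, cap))
```

A reward model could then be trained to regress this function (or a pairwise-preference version of it) before moving on to the real HHH preference data.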