cat-state

Results 18 issues of cat-state

TODO: make repo flake8 compatible

Currently: implemented the NeMo GPT + ILQL heads model and got the forward pass to run. The model runs but crashes with this error on the first backward step: ``` │ /mnt/nvme/home/uwu/micromamba/envs/nemo/lib/python3.8/site-packages/apex/tra...

Self-play, and multi-LM-agent settings more generally, are something we are very interested in exploring. What does it take to support this? Does it already work without big overheads?

feature request

### 🚀 The feature, motivation, and pitch
We should try out [github.com/nVIDIA/neMo](https://github.com/nVIDIA/neMo), using NeMo-Megatron. This would involve
- [ ] Make pytorch-lightning based trainer
- [ ] Make NeMoILQLModel
-...

### 🚀 The feature, motivation, and pitch
If we had links to benchmarks for the example (and/or test) models, it would be easier to add new models, and keep track...

Do RLHF using Anthropic's HH data, using existing models. Depends on https://github.com/CarperAI/trlx/issues/25 (cc @haileyschoelkopf)

We should also add it as a pre-merge hook

This requires the current repo to pass flake8, which it currently does not.

Measure gradient norms and gradient noise of RL training. This is to address open questions around scaling gradients when mixing RL and non-RL tasks, as well as informing optimal batch...
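As a rough illustration of the two quantities mentioned above, here is a minimal, framework-free sketch. It assumes gradients are already available as plain lists of floats. The function names `global_grad_norm` and `naive_noise_ratio` are hypothetical and not part of trlx. The noise measure shown is only a naive ratio of summed per-coordinate variance to the squared norm of the mean gradient, which may differ from what the issue has in mind.

```python
import math
from typing import Dict, Sequence


def global_grad_norm(grads: Dict[str, Sequence[float]]) -> float:
    """L2 norm over all gradient entries, treating all parameters as one flat vector."""
    return math.sqrt(sum(g * g for values in grads.values() for g in values))


def naive_noise_ratio(per_sample_grads: Sequence[Sequence[float]]) -> float:
    """Summed per-coordinate sample variance divided by the squared norm of the mean gradient.

    Hypothetical helper: a crude noise-to-signal estimate, not a tuned estimator.
    Raises ZeroDivisionError if the mean gradient is exactly zero.
    """
    n = len(per_sample_grads)
    if n < 2:
        raise ValueError("need at least two gradient samples")
    dim = len(per_sample_grads[0])
    # Mean gradient across samples, one entry per coordinate.
    mean = [sum(g[i] for g in per_sample_grads) / n for i in range(dim)]
    # Unbiased sample variance per coordinate, summed over coordinates.
    variance_trace = sum(
        sum((g[i] - mean[i]) ** 2 for g in per_sample_grads) / (n - 1)
        for i in range(dim)
    )
    mean_norm_sq = sum(m * m for m in mean)
    return variance_trace / mean_norm_sq
```

A larger ratio suggests that individual gradient samples disagree more relative to their mean, which is the kind of signal that could inform batch size choices.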

feature request

Create an example showing reward modeling. This could use a synthetic reward source artificially limited, or the HHH Anthropic data (already on the Stability cluster). More ideas for tasks: https://github.com/CarperAI/trlx/issues/13#issuecomment-1273632021...
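One common way to train a reward model is on pairwise comparisons, where a preferred response should score higher than a rejected one. The sketch below assumes that setup, which the issue does not spell out. The function name `pairwise_preference_loss` is hypothetical, and the rewards are plain floats standing in for model outputs.

```python
import math


def pairwise_preference_loss(chosen_reward: float, rejected_reward: float) -> float:
    """Logistic loss -log(sigmoid(chosen - rejected)) for one comparison pair."""
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on sign to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

The loss shrinks as the preferred response's reward exceeds the rejected one's, and equals log 2 when the two rewards are tied.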