cat-state
TODO: make the repo flake8-compliant
Currently: implemented a NeMo GPT + ILQL heads model and got it to run forward. The model runs but crashes with this error on the first backward step: ``` │ /mnt/nvme/home/uwu/micromamba/envs/nemo/lib/python3.8/site-packages/apex/tra...
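For reference, the "GPT + ILQL heads" idea can be sketched in a few lines: ILQL keeps the LM trunk and adds per-token Q- and V-value heads on top of its hidden states. This is a minimal numpy sketch of the shapes involved; the names (`hidden_dim`, `W_q`, `W_v`) are illustrative assumptions, not trlx's or NeMo's actual API.

```python
import numpy as np

# Hypothetical sketch of ILQL-style value heads on a decoder LM trunk.
# All names here are illustrative, not the real trlx/NeMo interfaces.
rng = np.random.default_rng(0)

hidden_dim, vocab_size = 16, 32

# Per-token hidden states produced by the GPT trunk: (batch, seq, hidden)
hidden = rng.normal(size=(2, 8, hidden_dim))

# ILQL adds two linear heads on top of the trunk:
#   Q-head: per-token action values over the vocabulary
#   V-head: per-token scalar state value
W_q = rng.normal(size=(hidden_dim, vocab_size))
W_v = rng.normal(size=(hidden_dim, 1))

q_values = hidden @ W_q  # (batch, seq, vocab)
v_values = hidden @ W_v  # (batch, seq, 1)
```

The backward crash above happens in the trunk (apex), so the heads themselves are unlikely to be the culprit; the sketch is only meant to pin down what the extra parameters are.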
Self play, and multi-LM-agent settings in general, are something we are very interested in exploring. What would it take to support them? Do they already work without big overheads?
### 🚀 The feature, motivation, and pitch We should try out [github.com/NVIDIA/NeMo](https://github.com/NVIDIA/NeMo), using NeMo-Megatron. This would involve: - [ ] Make a pytorch-lightning-based trainer - [ ] Make NeMoILQLModel -...
### 🚀 The feature, motivation, and pitch If we had links to benchmarks for the example (and/or test) models, it would be easier to add new models and keep track...
Do RLHF using Anthropic's HH data, using existing models. Depends on https://github.com/CarperAI/trlx/issues/25 (cc @haileyschoelkopf)
We should also add it as a pre-merge hook
This requires the repo to pass flake8, which it currently does not.
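One common way to wire this up as a pre-merge check, assuming the repo adopts pre-commit (this exact config is a suggestion, not something already in the repo):

```yaml
# .pre-commit-config.yaml (proposed)
repos:
  - repo: https://github.com/PyCQA/flake8
    rev: 5.0.4
    hooks:
      - id: flake8
```

Then `pre-commit install` sets up the local git hook, and the same config can run in CI so merges are gated on a clean flake8 pass.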
Measure gradient norms and gradient noise of RL training. This would address open questions around scaling gradients when mixing RL and non-RL tasks, as well as inform optimal batch...
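For the measurement itself, the simple gradient noise scale of McCandlish et al. (2018) can be estimated from squared gradient norms at two batch sizes. A pure-Python sketch (the function names and the two-batch-size estimator are our assumptions about how we'd instrument training, not existing trlx code):

```python
import math

def grad_norm(grad):
    """L2 norm of a flattened gradient (list of floats)."""
    return math.sqrt(sum(g * g for g in grad))

def gradient_noise_scale(g_small_sq, g_big_sq, b_small, b_big):
    """Estimate the simple gradient noise scale B_noise = tr(Sigma) / |G|^2
    from expected squared gradient norms measured at two batch sizes,
    using E|G_B|^2 = |G|^2 + tr(Sigma) / B (McCandlish et al., 2018).
    """
    # Unbiased estimates of the true squared gradient norm and the
    # trace of the per-example gradient covariance.
    g_sq = (b_big * g_big_sq - b_small * g_small_sq) / (b_big - b_small)
    trace_sigma = (g_small_sq - g_big_sq) / (1.0 / b_small - 1.0 / b_big)
    return trace_sigma / g_sq
```

Logging `grad_norm` per step for the RL and non-RL loss terms separately would directly answer the gradient-scaling question; the noise scale additionally suggests a critical batch size for each task mix.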
Create an example showing reward modeling. This could use an artificially limited synthetic reward source, or Anthropic's HHH data (already on the Stability cluster). More ideas for tasks: https://github.com/CarperAI/trlx/issues/13#issuecomment-1273632021...
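One concrete reading of "synthetic reward source, artificially limited" is a programmatic reward whose signal is deliberately clipped, so the policy cannot trivially max it out. A minimal sketch (the target word and cap are made-up example parameters):

```python
def synthetic_reward(text: str, target: str = "happy", cap: int = 3) -> float:
    """Toy synthetic reward: count occurrences of a target substring,
    clipped at `cap` so the reward signal is artificially limited.
    `target` and `cap` are illustrative choices for an example script.
    """
    count = text.lower().count(target)
    return float(min(count, cap))
```

A reward model could then be trained to regress this function (or a pairwise-preference version of it) before moving on to the real HHH preference data.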