Lev McKinney comments

Results: 16 comments

RewardNet refactor

> I don't understand this -- isn't the tensor shape and dtype the same in all cases? Why should the callee care what the type of the network is? Sorry...

RewardNet refactor

The only other concrete use case I can think of at the moment would be reward ensembles. During training we might want to detect if we are using an...

RewardNet refactor

It would also remove the need for most of the code in `serialize.py` since we would not need to treat different reward functions differently based on what wrappers they have....

RewardNet refactor

> How should the ensemble + shaping work conceptually? I'd think we'd want to shape each ensemble member separately, and then compute the loss of member[i]+shaping[i] separately.

Good point I...
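To make the per-member idea in the quoted question concrete, here is a minimal plain-Python sketch. It assumes potential-based shaping, r(s, s') + γΦ(s') − Φ(s), and a Bradley-Terry cross-entropy loss over trajectory returns. The helper names (`shaped_return`, `member_loss`, `ensemble_losses`) and the scalar toy states are hypothetical and are not the imitation API.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy types (not part of imitation): a transition is (state, next_state).
Transition = Tuple[float, float]
Trajectory = Sequence[Transition]
RewardFn = Callable[[float, float], float]
PotentialFn = Callable[[float], float]


def shaped_return(traj: Trajectory, reward: RewardFn,
                  potential: PotentialFn, gamma: float) -> float:
    """Return of one trajectory under r(s, s') + gamma * phi(s') - phi(s)."""
    return sum(reward(s, s_next) + gamma * potential(s_next) - potential(s)
               for s, s_next in traj)


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def member_loss(preferred: Trajectory, other: Trajectory, reward: RewardFn,
                potential: PotentialFn, gamma: float) -> float:
    """Bradley-Terry loss -log sigmoid(R_preferred - R_other) for one member."""
    diff = (shaped_return(preferred, reward, potential, gamma)
            - shaped_return(other, reward, potential, gamma))
    return softplus(-diff)


def ensemble_losses(preferred: Trajectory, other: Trajectory,
                    members: List[RewardFn], shapings: List[PotentialFn],
                    gamma: float) -> List[float]:
    """One loss per ensemble member; member[i] is paired only with shaping[i]."""
    if len(members) != len(shapings):
        raise ValueError("need exactly one shaping potential per ensemble member")
    return [member_loss(preferred, other, r, phi, gamma)
            for r, phi in zip(members, shapings)]


# Usage: two members, each with its own potential, scored on one preference pair.
losses = ensemble_losses(
    preferred=[(0.0, 1.0)],
    other=[(0.0, 0.0)],
    members=[lambda s, s2: s2, lambda s, s2: 2 * s2],
    shapings=[lambda s: 0.0, lambda s: s],
    gamma=0.99,
)
```

Keeping the losses as a list (rather than summing returns across members first) is what lets each member[i] + shaping[i] pair be optimized independently.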

[Feature request] Gradient accumulation in CrossEntropyRewardTrainer in preference_comparisons.py

Quick suggestion here. Rather than a binary choice between `accumulate_grad=True` and `accumulate_grad=False`, we could have a setting like PyTorch Lightning's `accumulate_grad_batches` (see [here](https://pytorch-lightning.readthedocs.io/en/latest/advanced/training_tricks.html#accumulate-gradients)). Say your GPU memory can...
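A minimal sketch of what an `accumulate_grad_batches`-style setting means, written in plain Python on a one-parameter least-squares model so it runs without a GPU. The function names `batch_grad` and `train` are hypothetical. In PyTorch the same pattern is calling `loss.backward()` on every batch (with the loss divided by N) and only calling `optimizer.step()` and `optimizer.zero_grad()` every N batches.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy setup: a batch is (inputs, targets) for the model y_hat = w * x.
Batch = Tuple[Sequence[float], Sequence[float]]


def batch_grad(w: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Gradient w.r.t. w of the mean squared error of y_hat = w * x on one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def train(w: float, batches: List[Batch], lr: float,
          accumulate_grad_batches: int) -> float:
    """SGD that takes one optimizer step per `accumulate_grad_batches` batches.

    The step uses the average of the accumulated per-batch gradients. A final
    partial group (if the batch count is not divisible) is flushed at the end.
    """
    grad_sum, n_accumulated = 0.0, 0
    for xs, ys in batches:
        grad_sum += batch_grad(w, xs, ys)  # w is frozen while accumulating
        n_accumulated += 1
        if n_accumulated == accumulate_grad_batches:
            w -= lr * grad_sum / n_accumulated
            grad_sum, n_accumulated = 0.0, 0
    if n_accumulated:
        w -= lr * grad_sum / n_accumulated
    return w


# Usage: two batches of 2 with accumulation match one step on a batch of 4.
batches = [([1.0, 2.0], [2.0, 4.0]), ([3.0, 4.0], [6.0, 8.0])]
w_accumulated = train(0.0, batches, lr=0.01, accumulate_grad_batches=2)
```

Averaging per-batch mean gradients reproduces the full-batch gradient only when the accumulated batches are the same size; with uneven batches each one would need to be weighted by its size.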

Optimize EMANorm by removing for loop over a batch

I've attached the PDF giving the proofs for the update rule: [Incremental_batch_EMA_and_EMV.pdf](https://github.com/HumanCompatibleAI/imitation/files/9364938/Incremental_batch_EMA_and_EMV.pdf). The LaTeX source code can be found [here](https://gist.github.com/levmckinney/fed0b4feb125ba201301dcf5298c1fe2).
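The PDF is not reproduced here, so the sketch below is not necessarily the rule it proves. It assumes the per-sample recurrence is the common incremental EMA/EMV update (`diff = x - mean; mean += alpha * diff; var = (1 - alpha) * (var + alpha * diff**2)`) and shows one way to apply n such steps at once with NumPy. After n steps the old statistics carry weight (1 − α)^n and sample i (0-indexed) carries α(1 − α)^(n−1−i); these weights sum to 1, so the result is the weighted mean and variance of that mixture. Both function names are hypothetical and not the `EMANorm` code in imitation.

```python
import numpy as np


def ema_emv_sequential(mean, var, batch, alpha):
    """Reference: per-sample incremental EMA/EMV update (the loop to remove)."""
    for x in batch:
        diff = x - mean
        incr = alpha * diff
        mean = mean + incr
        var = (1.0 - alpha) * (var + diff * incr)
    return mean, var


def ema_emv_batched(mean, var, batch, alpha):
    """Same result as the loop above, without iterating over the batch.

    `batch` has shape (n,) or (n, d); `mean` and `var` are scalars or shape (d,).
    """
    batch = np.asarray(batch, dtype=np.float64)
    n = batch.shape[0]
    decay = (1.0 - alpha) ** n  # total weight left on the old statistics
    # Weight of sample i after all n updates: alpha * (1 - alpha) ** (n - 1 - i).
    weights = alpha * (1.0 - alpha) ** np.arange(n - 1, -1, -1, dtype=np.float64)
    new_mean = decay * mean + weights @ batch
    # Variance of the mixture about its mean: the old component contributes its
    # own variance plus the squared shift of its mean; each sample its squared deviation.
    new_var = (decay * (var + (mean - new_mean) ** 2)
               + weights @ (batch - new_mean) ** 2)
    return new_mean, new_var


# Usage: the batched update should agree with the per-sample loop.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))
seq = ema_emv_sequential(np.zeros(3), np.ones(3), x, alpha=0.1)
vec = ema_emv_batched(np.zeros(3), np.ones(3), x, alpha=0.1)
```

The variance is written in the centered form rather than as "EMA of squares minus squared mean" because the latter can lose precision through cancellation when the mean is large relative to the spread.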

Implemented the ability to train rewards in preference comparison against multiple policies

> @levmckinney, let me know if there are any particular areas you'd like my feedback on; I'll aim to try and do a high-level review but may not have time to...

Implemented the ability to train rewards in preference comparison against multiple policies

> @levmckinney, what's outstanding here? I know you're busy now, but I can likely find someone else to finish the PR off.
>
> However, IIRC the policy ensemble didn't...

FR: Allow use of local whisper instance

Currently, OpenAI says that it doesn't train its models on API calls and has a 30-day data retention policy ([source](https://openai.com/blog/introducing-chatgpt-and-whisper-apis)). But that could change. My Obsidian notes are about...

Has anyone trained a tuned lens on Gemma-2b or other Gemma models?

Not that I'm aware of, but there should be support for the Gemma architecture in the next release; see https://github.com/AlignmentResearch/tuned-lens/pull/125. If anyone does train one, I'm accepting PRs to https://huggingface.co/spaces/AlignmentResearch/tuned-lens/discussions....