Lev McKinney
> I don't understand this -- isn't the tensor shape and dtype the same in all cases? Why should the callee care what the type of the network is? Sorry...
The only other concrete use case I can think of at the moment would be reward ensembles. When doing training we might want to detect if we are using an...
It would also remove the need for most of the code in `serialize.py` since we would not need to treat different reward functions differently based on what wrappers they have....
> How should the ensemble + shaping work conceptually? I'd think we'd want to shape each ensemble member separately, and then compute the loss of member[i]+shaping[i] separately. Good point I...
Quick suggestion here. Rather than having a binary choice of either `accumulate_grad=True` or `accumulate_grad=False`, we could have a setting like PyTorch Lightning's `accumulate_grad_batches` (see [here](https://pytorch-lightning.readthedocs.io/en/latest/advanced/training_tricks.html#accumulate-gradients)). Say your GPU memory can...
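To illustrate the idea of an integer setting in place of a boolean, here is a minimal pure-Python sketch of gradient accumulation on a toy one-parameter model. The helper names `mse_grad` and `accumulated_grad` are hypothetical and are not part of `imitation` or PyTorch Lightning. It assumes equal-sized micro-batches and a mean-reduced loss, in which case the accumulated gradient should match the full-batch gradient.

```python
from typing import Sequence


def mse_grad(w: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(
    w: float,
    xs: Sequence[float],
    ys: Sequence[float],
    accumulate_grad_batches: int,
) -> float:
    """Split one batch into equal micro-batches and sum their scaled gradients.

    Hypothetical helper: only one micro-batch needs to be held in memory
    at a time, which is the point of accumulating gradients.
    """
    n = len(xs)
    if accumulate_grad_batches < 1 or n % accumulate_grad_batches != 0:
        raise ValueError("batch size must be divisible by accumulate_grad_batches")
    size = n // accumulate_grad_batches
    total = 0.0
    for start in range(0, n, size):
        micro_x = xs[start:start + size]
        micro_y = ys[start:start + size]
        # Scale by the number of micro-batches so the sum equals
        # the mean gradient over the full batch.
        total += mse_grad(w, micro_x, micro_y) / accumulate_grad_batches
    return total
```

With `accumulate_grad_batches=1` this reduces to the ordinary full-batch gradient, so a single integer covers both of the boolean cases.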
I've attached the PDF giving the proofs for the update rule: [Incremental_batch_EMA_and_EMV.pdf](https://github.com/HumanCompatibleAI/imitation/files/9364938/Incremental_batch_EMA_and_EMV.pdf). The LaTeX source code can be found [here](https://gist.github.com/levmckinney/fed0b4feb125ba201301dcf5298c1fe2).
> @levmckinney let me know if there's any particular areas you'd like my feedback on, I'll aim to try and do a high-level review but may not have time to...
> @levmckinney, what's outstanding here? I know you're busy now, but I can likely find someone else to finish the PR off. > > However, IIRC the policy ensemble didn't...
Currently, OpenAI says that they don't train their models on API calls and have a 30-day data retention policy ([source](https://openai.com/blog/introducing-chatgpt-and-whisper-apis)). But that could change. My Obsidian notes are about...
Not that I'm aware of, but there should be support for the Gemma architecture in the next release; see https://github.com/AlignmentResearch/tuned-lens/pull/125. If anyone does train one, I'm accepting PRs to https://huggingface.co/spaces/AlignmentResearch/tuned-lens/discussions....