Is there a plan for a PyTorch wrapper?
I don't think I'll have time to do this myself, but someone else is welcome to. Nvidia is likely looking to formalize blocksparse primitives in their libraries as well.
Hello @scott-gray, I would like to try porting it to PyTorch. I'm familiar with the process of building C++ and CUDA extensions for PyTorch through setuptools.
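For reference, the kind of build setup I have in mind is roughly the sketch below; the file and package names here are just placeholders, not actual files from this repo:

```python
# Minimal sketch of a setup.py for building a PyTorch C++/CUDA extension with
# setuptools. The package name and source file names are hypothetical.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name='blocksparse_torch',
    ext_modules=[
        CUDAExtension(
            name='blocksparse_torch',
            sources=['blocksparse_op.cpp', 'blocksparse_kernel.cu'],
        ),
    ],
    cmdclass={'build_ext': BuildExtension},
)
```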
Any advice on which op to start with, and on how I should go about testing and validating that everything works as envisioned? I only have a Colab instance, which should be good enough, but something that trains quickly (a small dataset and model) would be appreciated.
Thanks!
Hey @karanchahal, your efforts to help port these ops to PyTorch would be more than welcome (both internally here at OpenAI and likely by the wider community). I think demand is highest right now for the sparse transformer primitives. I'd look here to find anything not yet supported in PyTorch:
https://github.com/openai/blocksparse/blob/master/examples/transformer/enwik8.py
You might also look at the layer_norm implementation. I'm pretty sure it's significantly faster than anything else I've seen out there. Also, my clip_by_global_norm and fused optimizer ops make training in fp16 rather easy.
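For the testing question: one rough approach (just a sketch, not something we ship) is to check each ported op against PyTorch's own reference implementation, plus a gradcheck. For example, for layer_norm, where `blocksparse_layer_norm` below is a placeholder standing in for whatever the ported op ends up being called:

```python
# Hedged validation sketch: compare a ported layer_norm op against PyTorch's
# reference implementation. `blocksparse_layer_norm` is a placeholder here and
# just calls the reference so this harness is runnable as-is.
import torch
import torch.nn.functional as F

def blocksparse_layer_norm(x, g, b, eps=1e-5):
    # Stand-in for the ported CUDA op.
    return F.layer_norm(x, x.shape[-1:], weight=g, bias=b, eps=eps)

torch.manual_seed(0)
# float64 inputs keep gradcheck's finite differences numerically stable.
x = torch.randn(8, 64, dtype=torch.float64, requires_grad=True)
g = torch.randn(64, dtype=torch.float64, requires_grad=True)
b = torch.randn(64, dtype=torch.float64, requires_grad=True)

# Forward check against the PyTorch reference.
ref = F.layer_norm(x, (64,), weight=g, bias=b)
out = blocksparse_layer_norm(x, g, b)
assert torch.allclose(out, ref, atol=1e-6)

# Gradient check through the custom op.
assert torch.autograd.gradcheck(blocksparse_layer_norm, (x, g, b))
print("layer_norm port matches the PyTorch reference")
```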
Thanks! I'll look into it.
Over the weekend, @soumith put together this PR that adds support for one of our ops: https://github.com/soumith/blocksparse/commit/4071232a4a73a441424434ca2e81b1e4fd4e836c
We should be able to follow this example to add the other ops as well. Thanks @soumith!
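For anyone picking up the remaining ops, the usual binding pattern is a torch.autograd.Function wrapping the compiled forward/backward kernels (the PR itself may structure things differently). A rough, runnable sketch, with a dense matmul standing in for the actual blocksparse CUDA call and hypothetical extension names in the comments:

```python
# Sketch of wrapping a compiled kernel in a torch.autograd.Function so that
# PyTorch's autograd can differentiate through it. The dense matmul below is a
# stand-in for the real blocksparse CUDA kernel.
import torch

class BlocksparseMatmul(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, w):
        ctx.save_for_backward(x, w)
        # A real port would call the compiled extension here, e.g. something
        # like blocksparse_torch.bsmm_forward(x, w, layout, block_size)
        # (hypothetical name).
        return x @ w  # dense stand-in so this sketch runs

    @staticmethod
    def backward(ctx, grad_out):
        x, w = ctx.saved_tensors
        # Likewise, the backward pass would call the matching CUDA kernels.
        grad_x = grad_out @ w.t()
        grad_w = x.t() @ grad_out
        return grad_x, grad_w

x = torch.randn(4, 8, requires_grad=True)
w = torch.randn(8, 16, requires_grad=True)
y = BlocksparseMatmul.apply(x, w)
y.sum().backward()  # gradients flow through the custom Function
```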
Is there an update on this issue?
We have some PyTorch coverage of the ops, and work is ongoing to make it more complete. We'll release this code sometime soon (along with new ops/modes).
Is there an update on this issue? Thanks!