Daniel Tiarks
Also very much interested!
The interesting part is probably the implementation of ring attention. As far as I understand it, ring attention is an efficient way to compute attention across many GPUs.
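For intuition, here's a minimal single-process sketch of the ring-attention idea: the K/V tensors are split into per-device shards that "rotate" around a ring, and each step folds one shard into the result with an online (streaming) softmax, so no device ever needs the full K/V. The function name `ring_attention` and the single-device simulation via `chunk` are my own illustration, not the PR's actual implementation (which overlaps the ring communication with compute):

```python
import torch

def ring_attention(q, k, v, num_devices=4):
    """Simulate ring attention on one device: K/V are sharded into
    `num_devices` blocks; each ring step accumulates a partial result
    using the log-sum-exp (online softmax) trick."""
    d = q.shape[-1]
    k_blocks = k.chunk(num_devices, dim=-2)  # each "device" holds one K shard
    v_blocks = v.chunk(num_devices, dim=-2)  # and the matching V shard

    # Running statistics for the online softmax.
    out = torch.zeros_like(q)
    row_max = torch.full(q.shape[:-1], float("-inf"))
    row_sum = torch.zeros(q.shape[:-1])

    for step in range(num_devices):  # one ring rotation per step
        kb, vb = k_blocks[step], v_blocks[step]
        scores = q @ kb.transpose(-2, -1) / d ** 0.5

        new_max = torch.maximum(row_max, scores.amax(dim=-1))
        scale = torch.exp(row_max - new_max)  # rescale old accumulators
        probs = torch.exp(scores - new_max.unsqueeze(-1))

        row_sum = row_sum * scale + probs.sum(dim=-1)
        out = out * scale.unsqueeze(-1) + probs @ vb
        row_max = new_max

    return out / row_sum.unsqueeze(-1)

# Sanity check against full (non-ring) attention.
q, k, v = (torch.randn(2, 16, 64) for _ in range(3))
ref = torch.softmax(q @ k.transpose(-2, -1) / 64 ** 0.5, dim=-1) @ v
assert torch.allclose(ring_attention(q, k, v), ref, atol=1e-5)
```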
Out of curiosity @NielsRogge: did you ever use your implementation to fine-tune it on a task like CORD?
@plamb-viso My impression was always that tracing Encoder-Decoder models (e.g. BART) works fine, but exporting to ONNX via jit.trace is challenging. There's a research example for BART on how to...
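For reference, a minimal sketch of the tracing side, assuming a vanilla `facebook/bart-base` checkpoint (passing `torchscript=True` is the standard recipe for making HF models traceable, since it switches the outputs from dicts to tuples; the ONNX export itself is the part that needs the research example):

```python
import torch
from transformers import BartModel, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
# torchscript=True makes the model return tuples instead of dict
# outputs, which torch.jit.trace requires.
model = BartModel.from_pretrained("facebook/bart-base", torchscript=True)
model.eval()

inputs = tokenizer("Hello world", return_tensors="pt")
decoder_input_ids = inputs["input_ids"]

# Tracing works: the forward pass is recorded with fixed shapes/control flow.
traced = torch.jit.trace(
    model, (inputs["input_ids"], inputs["attention_mask"], decoder_input_ids)
)
```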
@ArthurZucker does https://github.com/huggingface/transformers/pull/24565 fix the remaining issues of this PR?
Ok. The question is: how can we move this PR forward? @plamb-viso, @Jordy-VL, and I (and probably others) are still definitely interested in this. @NielsRogge are you aware of other issues...
No worries @ArthurZucker ☺️. My comment wasn't meant to push anyone; I was just wondering whether I could contribute to speed up the process.