Pramodith Ballapuram
@ChainYo I'd like to work on LongT5. Edit: Taking up Pegasus instead, since there already seems to be an implementation for LongT5 :-D
Awesome! Thank you. I'll close the issue once the PR gets merged. :-)
Hi, I'm seeing the same problem. It seems like the quantized ONNX version is faster than the PyTorch model when I run it with a batch size of 1. However,...
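For context, a minimal latency-comparison sketch along the lines of what's being described (the model path, input names, shapes, and the placeholder PyTorch module are assumptions for illustration, not from the original thread):

```python
import time

import numpy as np
import onnxruntime as ort
import torch

def time_fn(fn, warmup=5, iters=50):
    # Simple wall-clock timer: warm up first, then average over iterations.
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

# Hypothetical inputs, batch size 1 as in the observation above.
batch_size, seq_len, hidden = 1, 128, 768
input_ids = np.random.randint(0, 30000, size=(batch_size, seq_len), dtype=np.int64)

# Quantized ONNX model (file name and input name are assumptions).
sess = ort.InferenceSession("model_quantized.onnx", providers=["CPUExecutionProvider"])
onnx_latency = time_fn(lambda: sess.run(None, {"input_ids": input_ids}))

# Stand-in for the equivalent PyTorch model being compared against.
pt_model = torch.nn.Linear(hidden, hidden)  # placeholder, not the real model
pt_inputs = torch.randn(batch_size, seq_len, hidden)
with torch.inference_mode():
    torch_latency = time_fn(lambda: pt_model(pt_inputs))

print(f"ONNX (quantized): {onnx_latency * 1e3:.2f} ms | PyTorch: {torch_latency * 1e3:.2f} ms")
```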
@yundai424 @S1ro1 I'd like to help with this, but wanted to pin down some of the exact steps that can be taken to make the MoE layer more efficient. Per...
> @pramodith would you be interested in moving this forward? maybe create a new PR based on the current main. We just have to ensure the precision is correct

@ByronHsu done...
Hey folks, has anyone figured out the fix for this? I'm running into a similar issue.
Thanks, will try again.
@shivam15s @ByronHsu I think we should also consider including some of the loss functions commonly used for training embedding models, especially the popular ones supported in Sentence Transformers. It's quite...
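For reference, a minimal sketch of the kind of loss meant here: an in-batch-negatives objective in the style of Sentence Transformers' MultipleNegativesRankingLoss, written in plain PyTorch (the scale value and shapes are assumptions):

```python
import torch
import torch.nn.functional as F

def in_batch_negatives_loss(query_emb: torch.Tensor,
                            doc_emb: torch.Tensor,
                            scale: float = 20.0) -> torch.Tensor:
    """Contrastive loss where the i-th doc is the positive for the i-th query
    and every other doc in the batch acts as a negative."""
    query_emb = F.normalize(query_emb, dim=-1)
    doc_emb = F.normalize(doc_emb, dim=-1)
    # (batch, batch) cosine-similarity matrix, scaled.
    scores = scale * query_emb @ doc_emb.T
    # Positive pairs sit on the diagonal.
    labels = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, labels)

# Usage with random embeddings, just to show the expected shapes.
q = torch.randn(32, 768)
d = torch.randn(32, 768)
print(in_batch_negatives_loss(q, d))
```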
@ByronHsu most embedding models have a final Linear layer of shape (hidden_dim, hidden_dim), so vocab size doesn't really come into the picture for them, so you're right to point it...
> Then I think chunk loss is still helpful given the large batch size

Yes, I think so too. I can give this a try after we wrap up all...
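A rough illustration of what a chunked variant of that in-batch-negatives loss could look like (chunk size and shapes are assumptions; this only chunks the score-matrix/cross-entropy computation over the query dimension, so the full (batch, batch) matrix is never materialized at once):

```python
import torch
import torch.nn.functional as F

def chunked_in_batch_loss(query_emb, doc_emb, scale=20.0, chunk_size=1024):
    """In-batch-negatives loss computed chunk by chunk over the queries."""
    query_emb = F.normalize(query_emb, dim=-1)
    doc_emb = F.normalize(doc_emb, dim=-1)
    n = query_emb.size(0)
    total = query_emb.new_zeros(())
    for start in range(0, n, chunk_size):
        end = min(start + chunk_size, n)
        # Scores for this chunk of queries against all docs: (chunk, batch).
        scores = scale * query_emb[start:end] @ doc_emb.T
        labels = torch.arange(start, end, device=scores.device)
        # Sum per chunk, normalize once at the end.
        total = total + F.cross_entropy(scores, labels, reduction="sum")
    return total / n

q = torch.randn(4096, 768)
d = torch.randn(4096, 768)
print(chunked_in_batch_loss(q, d))
```

Note this sketch only limits the size of any single score matrix; getting real peak-memory wins during training would also require not holding every chunk's activations for backward (e.g. via recomputation in a custom autograd function or gradient checkpointing).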