Pramodith Ballapuram

Results: 20 comments of Pramodith Ballapuram

@ChainYo I'd like to work on LongT5. Edit: Taking up Pegasus instead since there already seems to be an implementation for LongT5 :-D

Awesome! Thank you. I'll close the issue once the PR gets merged. :-)

Hi, I'm seeing the same problem. It seems like the quantized ONNX version is faster than the PyTorch model when I run it with a batch size of 1. However,...
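
When comparing latencies like the quantized-ONNX-vs-PyTorch case above, single timings are noisy; a fair comparison needs warmup runs and a robust statistic. A minimal sketch of such a harness (the `benchmark` helper and its arguments are my own illustration, not from the thread):

```python
import time
import statistics

def benchmark(fn, *args, warmup=3, runs=20):
    """Time a callable: warm up first (caches, JIT, lazy init),
    then report the median latency in milliseconds over `runs` calls."""
    for _ in range(warmup):
        fn(*args)
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - t0) * 1e3)
    return statistics.median(samples)
```

Usage would be something like `benchmark(lambda: ort_session.run(None, inputs))` against `benchmark(lambda: torch_model(x))` (hypothetical names), at the same batch size for both.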

@yundai424 @S1ro1 I'd like to help with this, but wanted to pin down some of the exact steps that can be taken to make the MoE layer more efficient. Per...
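
One commonly cited direction for making a top-1 MoE layer more efficient is to group tokens by their routed expert, so each expert runs one batched matmul instead of per-token dispatch. A numpy sketch of that idea under assumed shapes (tokens × d_in inputs, per-expert d_in × d_out weights); this is illustrative, not the thread's actual plan:

```python
import numpy as np

def grouped_moe_forward(x, gate_logits, experts):
    """Top-1 MoE forward with tokens grouped per expert.
    x: (tokens, d_in), gate_logits: (tokens, n_experts),
    experts: (n_experts, d_in, d_out) stacked expert weights."""
    assign = gate_logits.argmax(axis=1)            # top-1 expert id per token
    out = np.zeros((x.shape[0], experts.shape[2]))
    for e in range(experts.shape[0]):
        idx = np.where(assign == e)[0]             # tokens routed to expert e
        if idx.size:
            out[idx] = x[idx] @ experts[e]         # one batched matmul per expert
    return out
```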

> @pramodith would you be interested in moving this forward? maybe create a new PR based on the current main. We just have to ensure the precision is correct

@ByronHsu done...

Hey folks, has anyone figured out the fix for this? I'm running into a similar issue.

@shivam15s @ByronHsu I think we should also consider including some of the loss functions commonly used for training embedding models, especially the popular ones supported in Sentence Transformers. It's quite...
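
One of the most popular Sentence Transformers losses is MultipleNegativesRankingLoss (in-batch negatives): score every query against every document in the batch and treat the diagonal as the correct pairing. A numpy sketch of that formula (function name and `scale` default are illustrative):

```python
import numpy as np

def multiple_negatives_ranking_loss(q, d, scale=20.0):
    """In-batch-negatives contrastive loss.
    q, d: (batch, dim) query/document embeddings; row i of q pairs with row i of d,
    and every other row of d serves as a negative."""
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    sims = scale * (q @ d.T)                     # (batch, batch); diagonal = positives
    m = sims.max(axis=1, keepdims=True)          # stabilize the log-sum-exp
    lse = np.log(np.exp(sims - m).sum(axis=1)) + m.ravel()
    return float(np.mean(lse - np.diag(sims)))   # cross-entropy toward the diagonal
```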

@ByronHsu most embedding models have a final Linear layer of shape (hidden_dim, hidden_dim), so vocab size doesn't really come into the picture for them; you're right to point it...
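
Some rough arithmetic shows why the final layer's output dimension dominates activation memory: an LM head projects to vocab size, an embedding head only to hidden size. The numbers below are illustrative (128,256 is Llama-3's vocab size; 4096 is a typical hidden dim), not from the thread:

```python
def output_bytes(batch, seq_len, out_dim, bytes_per_el=4):
    """fp32 bytes needed to materialize the final projection's output."""
    return batch * seq_len * out_dim * bytes_per_el

# LM head: logits of shape (batch, seq, vocab) -> ~16.8 GB
lm_logits = output_bytes(8, 4096, 128_256)
# Embedding head: output of shape (batch, seq, hidden) -> ~0.5 GB
emb_out = output_bytes(8, 4096, 4096)
```

The ratio is just vocab_size / hidden_dim (~31x here), which is why chunking the logits matters so much more for LMs than for embedding models.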

> Then I think chunk loss is still helpful given the large batch size

Yes, I think so too. I can give this a try after we wrap up all...
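
The chunked-loss idea referenced above can be sketched in numpy: compute cross-entropy chunk by chunk over the batch so the full (batch, vocab) logits matrix is never materialized at once, while the result matches the unchunked computation exactly. Function and argument names here are my own illustration:

```python
import numpy as np

def chunked_ce(hidden, weight, targets, chunk=32):
    """Mean cross-entropy over the batch, computed in chunks.
    hidden: (n, d) final hidden states, weight: (vocab, d) head weights,
    targets: (n,) int class ids. Only a (chunk, vocab) logits slice
    exists at any one time."""
    total, n = 0.0, hidden.shape[0]
    for s in range(0, n, chunk):
        h, t = hidden[s:s + chunk], targets[s:s + chunk]
        logits = h @ weight.T                    # (chunk, vocab) slice only
        m = logits.max(axis=1, keepdims=True)    # stable log-sum-exp
        lse = np.log(np.exp(logits - m).sum(axis=1)) + m.ravel()
        total += float(np.sum(lse - logits[np.arange(len(t)), t]))
    return total / n
```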