Pramodith Ballapuram
@ChainYo I'd like to work on LongT5. Edit: Taking up Pegasus instead, since there already seems to be an implementation for LongT5 :-D
Awesome! Thank you. I'll close the issue once the PR gets merged. :-)
Hi, I'm seeing the same problem. It seems like the quantized ONNX version is faster than the PyTorch model when I run it with a batch size of 1. However,...
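For context, a minimal latency-comparison sketch along the lines of what's being described (the model path, input names, shapes, and the placeholder PyTorch module are assumptions for illustration, not from the original thread):

```python
import time

import numpy as np
import onnxruntime as ort
import torch

def time_fn(fn, warmup=5, iters=50):
    # Simple wall-clock timer: warm up first, then average over iterations.
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

# Hypothetical inputs, batch size 1 as in the observation above.
batch_size, seq_len, hidden = 1, 128, 768
input_ids = np.random.randint(0, 30000, size=(batch_size, seq_len), dtype=np.int64)

# Quantized ONNX model (file name and input name are assumptions).
sess = ort.InferenceSession("model_quantized.onnx", providers=["CPUExecutionProvider"])
onnx_latency = time_fn(lambda: sess.run(None, {"input_ids": input_ids}))

# Stand-in for the equivalent PyTorch model being compared against.
pt_model = torch.nn.Linear(hidden, hidden)  # placeholder, not the real model
pt_inputs = torch.randn(batch_size, seq_len, hidden)
with torch.inference_mode():
    torch_latency = time_fn(lambda: pt_model(pt_inputs))

print(f"ONNX (quantized): {onnx_latency * 1e3:.2f} ms | PyTorch: {torch_latency * 1e3:.2f} ms")
```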
@yundai424 @S1ro1 I'd like to help with this, but wanted to pin down some of the exact steps that can be taken to make the MoE layer more efficient. Per...
> @pramodith would you be interested in moving this forward? maybe create a new PR based on the current main. We just have to ensure the precision is correct

@ByronHsu done...
Hey folks, has anyone figured out the fix for this? I'm running into a similar issue.
Thanks, will try again.
@shivam15s @ByronHsu I think we should also consider including some of the loss functions commonly used for training embedding models, especially the popular ones supported in Sentence Transformers. It's quite...
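For reference, a minimal sketch of the kind of loss meant here: an in-batch-negatives objective in the style of Sentence Transformers' MultipleNegativesRankingLoss, written in plain PyTorch (the scale value and shapes are assumptions):

```python
import torch
import torch.nn.functional as F

def in_batch_negatives_loss(query_emb: torch.Tensor,
                            doc_emb: torch.Tensor,
                            scale: float = 20.0) -> torch.Tensor:
    """Contrastive loss where the i-th doc is the positive for the i-th query
    and every other doc in the batch acts as a negative."""
    query_emb = F.normalize(query_emb, dim=-1)
    doc_emb = F.normalize(doc_emb, dim=-1)
    # (batch, batch) cosine-similarity matrix, scaled.
    scores = scale * query_emb @ doc_emb.T
    # Positive pairs sit on the diagonal.
    labels = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, labels)

# Usage with random embeddings, just to show the expected shapes.
q = torch.randn(32, 768)
d = torch.randn(32, 768)
print(in_batch_negatives_loss(q, d))
```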
@ByronHsu most embedding models have a final Linear layer of shape (hidden_dim, hidden_dim), so vocab size doesn't really come into the picture for them, so you're right to point it...
> Then I think chunk loss is still helpful given the large batch size

Yes, I think so too. I can give this a try after we wrap up all...
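A rough illustration of what a chunked variant of that in-batch-negatives loss could look like (chunk size and shapes are assumptions; this only chunks the score-matrix/cross-entropy computation over the query dimension, so the full (batch, batch) matrix is never materialized at once):

```python
import torch
import torch.nn.functional as F

def chunked_in_batch_loss(query_emb, doc_emb, scale=20.0, chunk_size=1024):
    """In-batch-negatives loss computed chunk by chunk over the queries."""
    query_emb = F.normalize(query_emb, dim=-1)
    doc_emb = F.normalize(doc_emb, dim=-1)
    n = query_emb.size(0)
    total = query_emb.new_zeros(())
    for start in range(0, n, chunk_size):
        end = min(start + chunk_size, n)
        # Scores for this chunk of queries against all docs: (chunk, batch).
        scores = scale * query_emb[start:end] @ doc_emb.T
        labels = torch.arange(start, end, device=scores.device)
        # Sum per chunk, normalize once at the end.
        total = total + F.cross_entropy(scores, labels, reduction="sum")
    return total / n

q = torch.randn(4096, 768)
d = torch.randn(4096, 768)
print(chunked_in_batch_loss(q, d))
```

Note this sketch only limits the size of any single score matrix; getting real peak-memory wins during training would also require not holding every chunk's activations for backward (e.g. via recomputation in a custom autograd function or gradient checkpointing).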