Ajinkya Tejankar
Hi, I'm interested in reproducing the results on the VisDA-17 benchmark, but I couldn't find any instructions for this in the repository.
1. Replace `.cuda()` calls with `.to()` calls using the correct device in the `eval_imagenet` function 2. Add `eval_winoground` to evaluate on the Winoground dataset
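A minimal sketch of the first change: replacing a hard-coded `.cuda()` with a device-agnostic `.to(device)`. The function name and shapes here are hypothetical, not taken from the actual `eval_imagenet` code.

```python
import torch

def normalize(x: torch.Tensor, device: torch.device) -> torch.Tensor:
    # Before: x = x.cuda()  -- always moves to the default CUDA device,
    # and crashes on CPU-only machines.
    # After: move to whichever device the caller selected.
    x = x.to(device)
    return x / x.norm(dim=-1, keepdim=True)

emb = normalize(torch.randn(4, 8), torch.device("cpu"))
```

This keeps evaluation runnable on CPU, a specific GPU index, or other backends without editing the function body.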
# (WIP) Fix for the LM_HEAD issue **Root Cause.** The error is caused by incorrect segments being passed to the `lora_b_sgmv` kernel during the prefill stage. This happens because we do...
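To illustrate why prefill segments differ from decode segments, here is a hypothetical sketch (the helper name and layout are assumptions, not the repository's actual code): during prefill every request contributes its whole prompt, so segment boundaries must be cumulative token offsets rather than one row per request.

```python
def prefill_segments(tokens_per_request: list[int]) -> list[int]:
    # Segment i covers rows seg[i]..seg[i+1] of the stacked prefill batch.
    # Passing per-request boundaries (0, 1, 2, ...) here instead of these
    # cumulative offsets would make the kernel read the wrong rows.
    seg = [0]
    for n in tokens_per_request:
        seg.append(seg[-1] + n)
    return seg

# Three prompts of 3, 5, and 2 tokens:
print(prefill_segments([3, 5, 2]))  # [0, 3, 8, 10]
```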
Using the `--compile` option on the main branch is currently broken. I've fixed the first issue, but that just surfaces the next one, which we need to debug, and changes...
This should prevent some nasty illegal memory access errors. 1. Consolidate individual list comprehensions into a single for loop 2. Dedicated code to create the lora weight pointers tensor 3....
# What does this PR do? 1. Re-organize the code in `BatchLoraWeights.load`. This function was hard to understand because there were multiple list comprehensions with almost the same looping...
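The consolidation described above can be sketched in miniature. This is not the real `BatchLoraWeights.load` body; the dict layout and names are hypothetical, but the pattern is the same: several comprehensions that each re-iterate the same mapping are folded into one pass, so the collected lists can never fall out of sync.

```python
def gather_weights(adapters: dict) -> tuple[list, list, list]:
    # Before: three comprehensions, each looping over `adapters` again:
    #   ranks  = [a["rank"] for a in adapters.values()]
    #   lora_a = [a["lora_a"] for a in adapters.values()]
    #   lora_b = [a["lora_b"] for a in adapters.values()]
    # After: a single loop builds all three lists in one traversal.
    ranks, lora_a, lora_b = [], [], []
    for a in adapters.values():
        ranks.append(a["rank"])
        lora_a.append(a["lora_a"])
        lora_b.append(a["lora_b"])
    return ranks, lora_a, lora_b
```

Besides readability, a single loop guarantees one consistent iteration order for every derived list, which matters when the lists are later zipped into pointer tensors.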
Prefill uses FlashAttention 2 kernels, so attention happens in FP16, but KV tensors are quantized to FP8 before they are stored in the KV cache. Uses static scales calculated using the...
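A toy sketch of static-scale FP8 quantization, assuming the float8 e4m3 format (max representable value 448); the function names are illustrative, and real kernels also round to the nearest representable FP8 value, which this sketch omits.

```python
FP8_E4M3_MAX = 448.0  # largest finite value in float8 e4m3

def quantize_static(values: list[float], scale: float) -> list[float]:
    # Divide by a precomputed (static) per-tensor scale, then clamp to the
    # FP8 dynamic range -- the step that would precede the cast to float8
    # before writing into the KV cache.
    return [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale)) for v in values]

def dequantize(values: list[float], scale: float) -> list[float]:
    # On read, multiply by the same static scale to recover FP16-range values.
    return [v * scale for v in values]
```

Because the scale is computed once (e.g. from calibration data) rather than per batch, quantization adds no reduction over the tensor at runtime; the trade-off is that out-of-range activations get clamped.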
THIS SHOULD BE CLOSED