Ajinkya Tejankar
Hi, I'm interested in reproducing the results on the VisDA-17 benchmark, but I couldn't find any instructions for this in the repository.
1. Replace `.cuda()` calls with `.to()` calls using the correct device in the `eval_imagenet` function 2. Add `eval_winoground` to evaluate on the Winoground dataset
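A minimal sketch of the first change: replacing a hard-coded `.cuda()` with a device-agnostic `.to(device)`. The function name and shapes here are hypothetical, not taken from the actual `eval_imagenet` code.

```python
import torch

def normalize(x: torch.Tensor, device: torch.device) -> torch.Tensor:
    # Before: x = x.cuda()  -- always moves to the default CUDA device,
    # and crashes on CPU-only machines.
    # After: move to whichever device the caller selected.
    x = x.to(device)
    return x / x.norm(dim=-1, keepdim=True)

emb = normalize(torch.randn(4, 8), torch.device("cpu"))
```

This keeps evaluation runnable on CPU, a specific GPU index, or other backends without editing the function body.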
# (WIP) Fix for the LM_HEAD issue **Root Cause.** The error is caused by incorrect segments being passed to the `lora_b_sgmv` kernel during the prefill stage. This happens because we do...
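To illustrate why prefill segments differ from decode segments, here is a hypothetical sketch (the helper name and layout are assumptions, not the repository's actual code): during prefill every request contributes its whole prompt, so segment boundaries must be cumulative token offsets rather than one row per request.

```python
def prefill_segments(tokens_per_request: list[int]) -> list[int]:
    # Segment i covers rows seg[i]..seg[i+1] of the stacked prefill batch.
    # Passing per-request boundaries (0, 1, 2, ...) here instead of these
    # cumulative offsets would make the kernel read the wrong rows.
    seg = [0]
    for n in tokens_per_request:
        seg.append(seg[-1] + n)
    return seg

# Three prompts of 3, 5, and 2 tokens:
print(prefill_segments([3, 5, 2]))  # [0, 3, 8, 10]
```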
Using the `--compile` option on the main branch is currently broken. I've fixed the first issue, but that just surfaces the next one, which we need to debug, and changes...
This should prevent some nasty illegal memory access errors. 1. Consolidate individual list comprehensions into a single for loop 2. Dedicated code to create the lora weight pointers tensor 3....
# What does this PR do? 1. Re-organize the code in `BatchLoraWeights.load`. This function was hard to understand because there were multiple list comprehensions with almost the same looping...
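The consolidation described above can be sketched in miniature. This is not the real `BatchLoraWeights.load` body; the dict layout and names are hypothetical, but the pattern is the same: several comprehensions that each re-iterate the same mapping are folded into one pass, so the collected lists can never fall out of sync.

```python
def gather_weights(adapters: dict) -> tuple[list, list, list]:
    # Before: three comprehensions, each looping over `adapters` again:
    #   ranks  = [a["rank"] for a in adapters.values()]
    #   lora_a = [a["lora_a"] for a in adapters.values()]
    #   lora_b = [a["lora_b"] for a in adapters.values()]
    # After: a single loop builds all three lists in one traversal.
    ranks, lora_a, lora_b = [], [], []
    for a in adapters.values():
        ranks.append(a["rank"])
        lora_a.append(a["lora_a"])
        lora_b.append(a["lora_b"])
    return ranks, lora_a, lora_b
```

Besides readability, a single loop guarantees one consistent iteration order for every derived list, which matters when the lists are later zipped into pointer tensors.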
Prefill uses FlashAttention 2 kernels, so attention happens in FP16, but KV tensors are quantized to FP8 before they are stored in the KV cache. Uses static scales calculated using the...
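A toy sketch of static-scale FP8 quantization, assuming the float8 e4m3 format (max representable value 448); the function names are illustrative, and real kernels also round to the nearest representable FP8 value, which this sketch omits.

```python
FP8_E4M3_MAX = 448.0  # largest finite value in float8 e4m3

def quantize_static(values: list[float], scale: float) -> list[float]:
    # Divide by a precomputed (static) per-tensor scale, then clamp to the
    # FP8 dynamic range -- the step that would precede the cast to float8
    # before writing into the KV cache.
    return [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale)) for v in values]

def dequantize(values: list[float], scale: float) -> list[float]:
    # On read, multiply by the same static scale to recover FP16-range values.
    return [v * scale for v in values]
```

Because the scale is computed once (e.g. from calibration data) rather than per batch, quantization adds no reduction over the tensor at runtime; the trade-off is that out-of-range activations get clamped.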
THIS SHOULD BE CLOSED