Nathan Price

Results: 28 comments by Nathan Price

@byshiue I believe that alpha scaling is expected to be performed on the weights before they are uploaded. Digging into the underlying code used by `examples/run.py`, I found that...

I saw similar results with llama3. Mine was resolved when I disabled `use_custom_all_reduce` during engine compilation.

Curious to get any feedback here. This update is also related to a performance issue I am seeing: https://github.com/NVIDIA/TensorRT-LLM/issues/1957. This PR gets results much closer to the expected outputs but...

Any updates? I see a new issue that looks the same as well, but in my case I have now tried with the 24.07 tag and the results are the...

> In the bug description, I did not see which LoRA was used. Could you please tell me? It's better to offer the huggingface link of the base model...

Any insights gained from knowing that `alpha*A != alpha*B` when scaling the weights?

> [@TheCodeWrangler](https://github.com/TheCodeWrangler) any updates on this? I was actually blocked on this for a deployment I needed. I ended up changing base frameworks to `vllm` in order to move forward...

I think to reproduce the issue: take any trained LoRA weights, apply the alpha value to the A matrix, and then retry applying it to the B...
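For context on why this repro is interesting: in exact arithmetic, folding the LoRA scale into the A matrix or into the B matrix should produce the same weight delta, since `B @ (s*A) == (s*B) @ A`. A minimal NumPy sketch (the shapes, rank, and alpha value below are placeholders, not values from the issue) shows the two variants agree to floating-point rounding, so a large mismatch at inference time would point at how the runtime applies the scaling rather than at the math:

```python
import numpy as np

rng = np.random.default_rng(0)
r, d_in, d_out = 8, 64, 64     # hypothetical LoRA rank and layer shapes
alpha = 16.0
scale = alpha / r              # standard LoRA scaling factor

# LoRA update: delta_W = scale * (B @ A)
A = rng.standard_normal((r, d_in)).astype(np.float32)
B = rng.standard_normal((d_out, r)).astype(np.float32)

delta_scale_A = B @ (scale * A)   # alpha folded into A before upload
delta_scale_B = (scale * B) @ A   # alpha folded into B before upload

# These differ only by float32 rounding error.
print(np.max(np.abs(delta_scale_A - delta_scale_B)))
```

If the two uploads nevertheless produce visibly different generations, that discrepancy is the signal the comments above are probing for.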

Have you tried `nvidia-smi topo -p2p r` to inspect whether the drivers for your GPUs are installed and support peer-to-peer access? Also, I have encountered similar issues...