fwtan
The DataLoader in PyTorch prefetches a lot of videos: `prefetch_factor` x `num_workers` x `batch_size` samples (`prefetch_factor = 2` by default). These videos will be stored on the GPU, which consumes a lot...
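If it helps, here is a minimal sketch of the usual workaround (the `VideoDataset`, tensor shapes, and loader settings below are made up for illustration): return CPU tensors from the dataset and move each batch to the GPU only when it is consumed, so a single batch lives in GPU memory instead of all `prefetch_factor x num_workers x batch_size` prefetched samples.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class VideoDataset(Dataset):
    """Hypothetical dataset that decodes one video clip per index."""
    def __len__(self):
        return 1000

    def __getitem__(self, idx):
        # Return a CPU tensor; returning CUDA tensors here would keep
        # every prefetched clip resident in GPU memory.
        return torch.randn(16, 3, 224, 224)  # frames x C x H x W

loader = DataLoader(
    VideoDataset(),
    batch_size=8,
    num_workers=4,
    prefetch_factor=2,  # default: up to 2 x 4 = 8 batches (64 clips) queued, all on CPU
    pin_memory=True,    # page-locked CPU memory speeds up the host-to-device copy
)

for batch in loader:
    # Move only the batch currently being consumed onto the GPU.
    batch = batch.cuda(non_blocking=True)
    # ... training / inference step ...
```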
Hi @quic-mangal, thanks for the reply! I wonder if you could provide an example configuration file for DSP/HTP? Thank you!
In case this is of interest, we provide an example for deploying TinyLlama-1.1B-Chat on the Qualcomm Hexagon NPU (SM8650): https://github.com/saic-fi/MobileQuant/tree/main/capp. However, our solution is pretty ad hoc compared to MLC-LLM.
In case this is of interest, we provide an example for deploying TinyLlama-1.1B-Chat on HTP (SM8650): https://github.com/saic-fi/MobileQuant/tree/main/capp. However, the solution is pretty ad hoc compared to executorch.
> Hi! Did you succeed in reproducing the numbers from the paper? I'm also having trouble with it; I tried your code with llama2-7B and got NaN on wikitext. I've reviewed...