Results: 5 comments of fwtan

The DataLoader in PyTorch prefetches a lot of videos: `prefetch_factor` x `num_workers` x `batch_size` samples (`prefetch_factor = 2` by default). These videos will be stored on the GPU, which consumes a lot...
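If it helps, here is a minimal sketch of where that multiplier comes from; `VideoDataset` and its clip shape are hypothetical stand-ins for illustration, not code from this repo:

```python
import torch
from torch.utils.data import DataLoader, Dataset

# Hypothetical stand-in for a video dataset, for illustration only.
class VideoDataset(Dataset):
    def __len__(self):
        return 1000

    def __getitem__(self, idx):
        # Pretend each item is a decoded clip: 16 frames of 3x224x224.
        return torch.randn(16, 3, 224, 224)

# Up to prefetch_factor * num_workers batches can be in flight at once,
# i.e. 2 * 4 * 8 = 64 clips here, on top of the batch being consumed.
loader = DataLoader(
    VideoDataset(),
    batch_size=8,
    num_workers=4,
    prefetch_factor=2,  # per-worker batch count; lower it to cut memory use
)
```

Reducing `num_workers` or `prefetch_factor` shrinks that queue at the cost of data-loading throughput.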

Hi @quic-mangal, thanks for the reply! Could you provide an example configuration file for DSP/HTP? Thank you!

In case this is of interest, we provide an example of deploying TinyLlama-1.1B-Chat on the Qualcomm Hexagon NPU (SM8650): https://github.com/saic-fi/MobileQuant/tree/main/capp. However, our solution is pretty ad hoc compared to MLC-LLM.

In case this is of interest, we provide an example of deploying TinyLlama-1.1B-Chat on HTP (SM8650): https://github.com/saic-fi/MobileQuant/tree/main/capp. However, the solution is pretty ad hoc compared to ExecuTorch.

> Hi! Did you manage to reproduce the numbers in the paper? I'm also having trouble with it; I tried your code with llama2-7B and got NaN on wikitext. I've reviewed...
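In case it helps with debugging, here is a minimal sketch of how one might locate where the loss stops being finite during a wikitext perplexity run; it assumes a Hugging Face-style causal LM, and the helper name and loop are hypothetical, not part of the paper's code:

```python
import torch

@torch.no_grad()
def first_nonfinite_batch(model, batches):
    # Walk the evaluation batches and report the first one whose
    # language-modeling loss is NaN/Inf, which usually pinpoints the
    # sample (or quantized layer) that poisons the final perplexity.
    for i, input_ids in enumerate(batches):
        loss = model(input_ids, labels=input_ids).loss
        if not torch.isfinite(loss):
            return i, loss.item()
    return None
```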