fwtan
The DataLoader in PyTorch prefetches a lot of videos: `prefetch_factor` x `num_workers` x `batch_size` samples (`prefetch_factor = 2` by default). These videos will be stored on the GPU, which consumes a lot...
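If it helps, here is a minimal sketch of the usual workaround (the `VideoDataset`, tensor shapes, and loader settings below are made up for illustration): return CPU tensors from the dataset and move each batch to the GPU only when it is consumed, so a single batch lives in GPU memory instead of all `prefetch_factor x num_workers x batch_size` prefetched samples.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class VideoDataset(Dataset):
    """Hypothetical dataset that decodes one video clip per index."""
    def __len__(self):
        return 1000

    def __getitem__(self, idx):
        # Return a CPU tensor; returning CUDA tensors here would keep
        # every prefetched clip resident in GPU memory.
        return torch.randn(16, 3, 224, 224)  # frames x C x H x W

loader = DataLoader(
    VideoDataset(),
    batch_size=8,
    num_workers=4,
    prefetch_factor=2,  # default: up to 2 x 4 = 8 batches (64 clips) queued, all on CPU
    pin_memory=True,    # page-locked CPU memory speeds up the host-to-device copy
)

for batch in loader:
    # Move only the batch currently being consumed onto the GPU.
    batch = batch.cuda(non_blocking=True)
    # ... training / inference step ...
```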
Hi @quic-mangal, thanks for the reply! I wonder if you could provide an example configuration file for DSP/HTP? Thank you!
In case this is of interest, we provide an example for deploying TinyLlama-1.1B-Chat on the Qualcomm Hexagon NPU (SM8650): https://github.com/saic-fi/MobileQuant/tree/main/capp. However, our solution is pretty ad hoc compared to MLC-LLM.
In case this is of interest, we provide an example for deploying TinyLlama-1.1B-Chat on HTP (SM8650): https://github.com/saic-fi/MobileQuant/tree/main/capp. However, the solution is pretty ad hoc compared to executorch.
> Hi! Did you succeed in reproducing the numbers from the paper? I'm also having trouble with it; I tried your code with llama2-7B and got NaN on wikitext. I've reviewed...