Android demo app: poor model performance
🐛 Describe the bug
I wanted to try the new Llama 3.2 1B model on mobile. I downloaded the model and generated the .pte like so:
python torchchat.py download llama3.2-1b
python torchchat.py export llama3.2-1b --quantize torchchat/quant_config/mobile.json --output-pte-path llama3_2-1b.pte
Then I pushed the llama3_2-1b.pte and tokenizer.model files to the phone using adb.
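The push step was roughly the following (the /data/local/tmp/llama destination directory is just an example; the app may expect a different path):

```
adb shell mkdir -p /data/local/tmp/llama
adb push llama3_2-1b.pte /data/local/tmp/llama/
adb push tokenizer.model /data/local/tmp/llama/
```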
I ran the demo app in torchchat/edge/android/torchchat from Android Studio, using the .aar file provided in the torchchat repo README.
However, when I chat with the AI, its responses are mostly useless and feel quite different from what I get with the same prompt on my computer.
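(For reference, the desktop comparison was torchchat's chat mode, something like:)

```
python torchchat.py chat llama3.2-1b
```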
Is there a problem with the default quantization parameters? I also tried exporting without quantization, but then the app crashed while loading the model.
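(The unquantized attempt was the same export command with the --quantize flag dropped:)

```
python torchchat.py export llama3.2-1b --output-pte-path llama3_2-1b.pte
```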
Versions
Collecting environment information...
PyTorch version: 2.5.0.dev20240901
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 14.4 (arm64)
GCC version: Could not collect
Clang version: 15.0.0 (clang-1500.3.9.4)
CMake version: version 3.30.4
Libc version: N/A

Python version: 3.10.0 (default, Mar 3 2022, 03:54:28) [Clang 12.0.0 ] (64-bit runtime)
Python platform: macOS-14.4-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU: Apple M2 Pro

Versions of relevant libraries:
[pip3] executorch==0.5.0a0+286799c
[pip3] numpy==1.26.4
[pip3] torch==2.5.0.dev20240901
[pip3] torchao==0.5.0+git0916b5b
[pip3] torchaudio==2.5.0.dev20240901
[pip3] torchsr==1.0.4
[pip3] torchtune==0.3.0.dev20240928+cpu
[pip3] torchvision==0.20.0.dev20240901
[conda] executorch 0.5.0a0+286799c pypi_0 pypi
[conda] numpy 1.26.4 pypi_0 pypi
[conda] torch 2.5.0.dev20240901 pypi_0 pypi
[conda] torchaudio 2.5.0.dev20240901 pypi_0 pypi
[conda] torchsr 1.0.4 pypi_0 pypi
[conda] torchtune 0.3.0.dev20240928+cpu pypi_0 pypi
[conda] torchvision 0.20.0.dev20240901 pypi_0 pypi
This looks like the type of bug that occurs when we aren't including the proper BOS/EOS tokens and role headers in the messages. The model is trying to "autocomplete" your message rather than "chat" with you.
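For reference, a Llama 3 style chat turn should be wrapped roughly like this (special tokens follow the published Llama 3 prompt format; the system prompt and braced placeholder are just examples):

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>

{user message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

```

Generation also needs to stop on <|eot_id|>; without these headers and terminators, the model just continues the raw text.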
cc @kirklandsign: can you confirm the header formatting is correct for Llama 3-type models?
Hi @fran-aubry @vmpuri, the app hasn't been updated: it doesn't have modes like instruct and doesn't handle BOS/EOS tokens. We need to update it to match the ExecuTorch (ET) demo app if we need to handle that.
I'm working on a tutorial teaching people how to set up Llama 3.2 1B on their mobile phone. I thought torchchat would be the easiest way to go.
Will this be implemented or should I look for another way?
cc @Jack-Khuu @vmpuri should we update the app?
Yup, we should update the app. Should be relatively low lift (we already did it locally with Mengwei, just need to push and test)
@fran-aubry Thanks for your interest and patience, we'll have something up soon (just missing the string template in the app)
@Jack-Khuu thank you so much. Let me know when it's ready, please :)
I don't have a device on hand to test, but something along the lines of https://github.com/pytorch/torchchat/pull/1284 should do the trick.
Heads up @fran-aubry: we're talking with the ExecuTorch folks about pulling in the demo app they showed at Connect and the PyTorch Conference.
Will keep you posted; it should be a really cool facelift.
We will update the app in https://github.com/pytorch/torchchat/pull/1292
The app was updated over the past few weeks; please let us know if you run into anything else.