llama.cpp acts too dumb while running on a phone!!
I was trying llama.cpp on my phone with Termux installed, but look at this image:

Specifications:
The phone has 8 GB of RAM (7 GB of it free) and the CPU has 8 cores, so RAM and CPU are not the issue.
Model used: alpaca-7B-lora
llama.cpp version: latest
Prompt: chat-with-bob.txt
I really don't know what is causing the issue here. When I ask it a question, it either answers in a very dumb way or just repeats the question without answering anything. With the same model, prompt, and llama.cpp version, my PC with 4 GB of RAM works as expected and answers nearly every question correctly. Can any of you help me out with this, or update llama.cpp to fix the mobile issues?
Thank you.
Try playing with the sampling settings, like increasing the temperature; see the sketch below.
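Something along these lines, for example (just a sketch; the model filename below is a placeholder, so point `-m` at whatever quantized file you are actually using):

```sh
# Interactive chat with the chat-with-bob prompt, with a higher temperature
# and a repeat penalty to discourage it from echoing the question back.
./main -m ./models/ggml-alpaca-7b-q4.bin \
  -f prompts/chat-with-bob.txt \
  -i -r "User:" --color -c 512 -n 256 \
  --temp 0.8 --top_k 40 --top_p 0.9 --repeat_penalty 1.2
```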
Okay, I'll try it and then let you know.
@BarfingLemurs Nope, it doesn't work; it stays the same (acts much dumber and still repeats the question). @gjmulder Please add a bug or similar label to this issue. It needs some more development.
You may want to log an issue with Stanford Alpaca. It is the training set for the Alpaca model you are using.
@gjmulder I'm using Alpaca LoRA, maybe that's the issue? That said, I don't think it's a problem with the Alpaca model itself: it worked fine on my laptop and for many users on their PCs, but it malfunctions on mobile.
Maybe it's because of Termux. I can't even get Firefox in Termux to play YouTube at 720p without crashing 😅😂. I guess it's the RAM restrictions your phone places on the Termux app.
@FNsi lol, it's not due to Termux. The same issue happens with UserLAnd (an app intended to run Ubuntu on Android without root).
I think it's almost the same, since they are all emulated terminals. I saw an Android fork on the Google Play store; maybe you can try it?
https://github.com/ggerganov/llama.cpp/discussions/750
The same model should work similarly with llama.cpp on any platform, if the same temp, top_k, etc. parameters are being passed to llama.cpp. The random number generator is different, so you will never get exactly the same output, but the outputs should be similar in quality. The only other difference I could imagine would be performance.
There are various optimized code paths that are only enabled for certain platforms and feature sets; there could be differences in the implementations of those.
Could you post the initial output with the system_info line and the model parameters? Something like the sketch below would capture it.
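Running roughly this on both the phone and the PC would make the runs comparable (a rough sketch only; the model path is a placeholder and the parameter values are just examples):

```sh
# Same prompt, same sampling parameters, and a fixed seed on both machines;
# tee the output so the startup banner (system_info line, model parameters)
# ends up in a log file you can paste here.
./main -m ./models/ggml-alpaca-7b-q4.bin \
  -f prompts/chat-with-bob.txt \
  -s 42 -t 4 -n 128 \
  --temp 0.7 --top_k 40 --top_p 0.9 --repeat_penalty 1.1 \
  2>&1 | tee run.log
```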
@unbounded The output I get is at the start of the issue (there is a screenshot of what the model is saying). The model parameters are the same as in the chat.sh file in the repository's examples directory.
System info:
ARM Cortex-A53 octa-core processor
8 GB RAM
Android 12
No AVX or AVX2 flags on the CPU, since it is an ARM processor.
Could be related to #876 which was fixed in https://github.com/ggerganov/llama.cpp/commit/684da25926e5c505f725b4f10b5485b218fa1fc7
Closing as assumed fixed by https://github.com/ggerganov/llama.cpp/commit/684da25926e5c505f725b4f10b5485b218fa1fc7; feel free to reopen if this still happens with the latest version.
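To retest, updating and rebuilding from a clean tree should be enough (assuming a plain make build, e.g. under Termux):

```sh
# Pull the latest llama.cpp and rebuild from scratch before retrying.
git pull
make clean && make
```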