Linux kernel crash after a lengthy chat
LocalAI version: latest-aio-cpu Docker container (2.22.1 at the time of writing)
Environment, CPU architecture, OS, and Version: Linux Zana 6.1.106-Unraid #1 SMP PREEMPT_DYNAMIC Wed Aug 21 23:36:07 PDT 2024 x86_64 11th Gen Intel(R) Core(TM) i5-11400 @ 2.60GHz GenuineIntel GNU/Linux; 64 GB of RAM
Describe the bug: After a while of roleplay chatting with the system prompt set to a character description, the Linux kernel simply crashes. I've included the pictures I took of the stack traces.
To Reproduce: Load the web UI, choose a roleplay model, enter a character description as the prompt, and keep chatting until it crashes.
Expected behavior: Really? -_- (The kernel should not crash.)
Logs: I would include them, but the crash prevents me from capturing any logs. The kernel stack traces are included instead.
Additional context:
So far this problem is reproducible with stheno and lewdplay (apart from the occasional anti-horni bonk, it's a good roleplay model).
I have also reproduced this problem with big-agi as the frontend, which seems to increase the likelihood of crashes, but the default web UI also exhibits this behaviour.
Have you watched your temps? Sounds like overheating to me.
That was my first thought as well, but overheating would cause a system reset, not a full-blown kernel panic. Besides, the machine handles 8 hours of video transcoding fine, so I can reasonably rule out temperature.
I also suspected memory errors, since I'm running on the CPU and llama.cpp is very memory-intensive, but the RAM passed a 12-hour memory check, so that's fine as well.
I'm thinking it's a mismatch between the input and the available context length; I used to have a similar fault when I built my 'big-rag' PoC.
Hmm, maybe, but it's still weird that this could cause a kernel crash.
Just a thought, but have you tried a different browser? Maybe the output caused a buffer overrun in the browser, and it accidentally executed it as code.
Could be, but the problem occurs both on the default web UI and when I use big-agi as a frontend. I use Firefox, btw.
Next question: are you using mmap for the model file(s)?
I'm not quite sure what you're asking. I'm just using the LocalAI web UI.
Look in the model's .yaml file for "mmap: true".
It means the model file is memory-mapped: its contents are read from disk on demand ("streamed") instead of the whole model being loaded 100% into memory up front.
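To illustrate the general idea of memory-mapping (this is a generic sketch using Python's standard `mmap` module, not LocalAI's or llama.cpp's actual loader), the file is mapped into the process's address space and the kernel only pages in the parts that are actually touched. The function name `read_slice_mmap` and the 8 MiB stand-in file are hypothetical.

```python
import mmap
import os
import tempfile


def read_slice_mmap(path: str, offset: int, length: int) -> bytes:
    """Return `length` bytes starting at `offset` via a read-only memory map.

    Only the pages covering the requested range are faulted in from disk;
    the rest of the file is never copied into the process's memory.
    """
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ makes the map read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Slicing an mmap returns a bytes copy of just that range.
            return mm[offset:offset + length]


if __name__ == "__main__":
    # Hypothetical stand-in for a large model file: 8 MiB of zeros
    # followed by a short marker.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * (8 * 1024 * 1024))
        tmp.write(b"MARKER")
        path = tmp.name
    try:
        print(read_slice_mmap(path, 8 * 1024 * 1024, 6))  # b'MARKER'
    finally:
        os.remove(path)
```

Reading the marker at the end of the file does not require the preceding 8 MiB to be read into the process first; with `mmap: false`, by contrast, a loader would typically read the whole file into RAM before use.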
lewdplay has mmap set to true.
Okay, then I would suggest re-downloading the model file. It may have gotten corrupted, and you only hit the damaged part once the chat history gets long enough.
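Before (or instead of) re-downloading, one way to check for corruption is to hash the local model file and compare it with a checksum published by the site you downloaded it from, assuming it publishes one. A minimal Python sketch using the standard `hashlib` module; the function names, path, and digest in the usage comment are hypothetical placeholders.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks
    so a multi-gigabyte model file never has to fit in RAM at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def model_file_matches(path: str, expected_sha256: str) -> bool:
    """Compare the file's digest against a published checksum (case-insensitive)."""
    return sha256_of_file(path) == expected_sha256.strip().lower()


# Hypothetical usage; substitute your own model path and the published digest:
# model_file_matches("/models/lewdplay.gguf", "<published sha256>")
```

If the digests differ, the local copy is damaged or is a different revision of the model, and re-downloading is warranted; if they match, file corruption can be ruled out.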
I'll give it a try.
Greetings,
Sorry for the late response, but I got busy writing my thesis. I can now confirm that the latest-aio-cpu Docker image no longer crashes for me. Unfortunately, it is impossible to tell whether this is due to Unraid kernel updates, LocalAI updates, or the aforementioned re-download of the model. I'm pretty sure it was not the model, since I observed the crash with multiple models. But anyway, thanks for the help!
Happy to help, and good luck with the thesis!