linux kernel crash after a lengthy chat

Open maxvaneck opened this issue 1 year ago • 12 comments

LocalAI version: latest-aio-cpu docker container 2.22.1 at this time

Environment, CPU architecture, OS, and Version: Linux Zana 6.1.106-Unraid #1 SMP PREEMPT_DYNAMIC Wed Aug 21 23:36:07 PDT 2024 x86_64 11th Gen Intel(R) Core(TM) i5-11400 @ 2.60GHz GenuineIntel GNU/Linux, 64 GB of RAM

Describe the bug: after a while of roleplay chatting with the system prompt set to a character description, the linux kernel simply crashes. included are the stack trace pictures i took.

To Reproduce: simply load up the webui, choose a roleplay model, enter a character description as a prompt, and keep chatting until the crash.

Expected behavior: really? -_-

Logs: i would, but my problem prevents this. kernel stack traces are included.

Additional context: this problem is so far reproducible on stheno and lewdplay (with the occasional anti horni bonk it's a good roleplay model). i have also reproduced this problem with big-agi as frontend, which seems to increase the likelihood of crashes, but the default webui also exhibits this behaviour.
[5 attached images (the stack trace pictures): WhatsApp Image 2024-10-27 at 19 44 14, 19 44 14(1), 19 44 14(2), 19 44 14(3), 19 44 14(4)]

maxvaneck avatar Oct 28 '24 10:10 maxvaneck

have you watched your temps? sounds like overheating to me.

levidehaan avatar Nov 12 '24 18:11 levidehaan

That was my first thought as well, but that would cause a system reset, not a full-out kernel panic. Besides, it's fine with 8 hours of transcoding video, so I can reasonably rule out temperature.

maxvaneck avatar Nov 12 '24 19:11 maxvaneck

i also suspected memory errors, since i'm using the cpu and llama-cpp is very memory intensive, but the memory passed a 12-hour memcheck, so that's fine as well

maxvaneck avatar Nov 18 '24 12:11 maxvaneck

i'm thinking a mismatch between input length and available context length. i used to have a similar fault when i built my 'big-rag' poc
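to illustrate what i mean by the input outgrowing the context: a minimal sketch of trimming chat history to a budget before sending it. this is not how LocalAI handles it internally. `rough_token_count`, `trim_history`, and `reserve_for_reply` are made-up names, and counting one token per word is a crude assumption (a real tokenizer will give different numbers).

```python
def rough_token_count(text):
    # Crude assumption: one token per whitespace-separated word.
    # A real tokenizer usually produces more tokens than this.
    return len(text.split())


def trim_history(system_prompt, messages, context_limit, reserve_for_reply=256):
    """Keep the newest messages that fit in the context budget.

    The system prompt is always counted, and some room is reserved
    for the model's reply (hypothetical default of 256 tokens).
    """
    budget = context_limit - reserve_for_reply - rough_token_count(system_prompt)
    kept = []
    # Walk from newest to oldest and stop at the first message that does not fit.
    for message in reversed(messages):
        cost = rough_token_count(message)
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    kept.reverse()  # restore chronological order
    return kept
```

if the frontend never drops old messages, the input keeps growing with every turn, which would fit the "only after a lengthy chat" pattern.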

jtwolfe avatar Dec 25 '24 11:12 jtwolfe

hmm, maybe. still weird that that could cause a kernel crash

maxvaneck avatar Dec 25 '24 14:12 maxvaneck

just a thought, but, have you tried a different browser? maybe the output caused a buffer overrun in the browser, and it accidentally executed it as code.

michieal avatar Jan 03 '25 10:01 michieal

could be, but the problem occurs both on the default webui and when i use big-agi as a frontend. i use firefox btw

maxvaneck avatar Jan 04 '25 10:01 maxvaneck

next question: Are you using mmap of the model file(s)?

michieal avatar Jan 09 '25 18:01 michieal

i'm not quite sure what you're asking. i'm just using the localai webui

maxvaneck avatar Jan 10 '25 08:01 maxvaneck

look in the model's .yaml file for "mmap: true". it means that the model is being "streamed" instead of loaded 100% into memory.
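if you'd rather not read the file by eye, here is a rough sketch that scans a model .yaml for that key. it is a plain line scan, not a real YAML parser, and `read_mmap_setting` is a made-up helper name, so treat it as an assumption-laden shortcut.

```python
import re


def read_mmap_setting(yaml_text):
    """Return True or False if an 'mmap:' key is found, otherwise None.

    Simple line scan, not a full YAML parser: it only recognises
    'mmap: true' or 'mmap: false' on a line of its own.
    """
    for line in yaml_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        stripped = line.split("#", 1)[0].strip()
        match = re.fullmatch(r"mmap:\s*(true|false)", stripped, flags=re.IGNORECASE)
        if match:
            return match.group(1).lower() == "true"
    return None
```

call it on the file's contents, e.g. `read_mmap_setting(open(path).read())`, where `path` is wherever your model .yaml lives. `None` means the key isn't set in that file.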

michieal avatar Jan 13 '25 03:01 michieal

lewdplay has mmap set to true

maxvaneck avatar Jan 13 '25 16:01 maxvaneck

okay, then.. I would say to try redownloading the model file. it may have gotten corrupted, and you are hitting the corrupted part once the chat accumulates enough history.
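one way to check for corruption without guessing: hash the file and compare it with a checksum from wherever you downloaded the model. this sketch assumes the source publishes a SHA-256 for the file, which i haven't verified for these models. `sha256_of_file` and `matches_expected` are just illustrative names.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 hex digest of a file, reading it in chunks
    so a multi-gigabyte model file is never loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare the file's digest with a published checksum (assumed to exist)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

if the hashes differ, the download is damaged and redownloading is worth it. if they match, the file itself is probably not the problem.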

michieal avatar Jan 14 '25 04:01 michieal

i'll give it a try

maxvaneck avatar Jan 16 '25 12:01 maxvaneck

greetings,

sorry for the late response, but i got busy writing my thesis. i can now confirm that the latest-aio-cpu docker image no longer crashes for me. unfortunately it is impossible to tell if this is due to unraid kernel updates, localai updates, or the aforementioned redownload of the model. i'm pretty sure it was not the model, since i observed the crash in multiple models. but anyway, thx for the help

maxvaneck avatar Feb 01 '25 10:02 maxvaneck

happy to help, and good luck with the thesis!

michieal avatar Feb 12 '25 03:02 michieal