Chat.rb example: Crash when prompt exceeds context?
When `n_past` plus the size of the upcoming embedding exceeds the context size, there is a bug in this code:
```ruby
if n_past + embd.size > n_ctx
  ...
  embd.insert(0, last_n_tokens[(n_ctx - (n_left / 2) - embd.size)...-embd.size])
```
https://github.com/yoshoku/llama_cpp.rb/blob/97224779aff9923f357f0ad141604c1d3fbfff56/examples/chat.rb#L68C21-L68C21
Calling `insert` like this adds the sub-range as a single nested array element at position 0 rather than splicing in its elements.
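For reference, a minimal demonstration of the difference (toy values, not the actual token buffers):

```ruby
a = [10, 20, 30]
b = [1, 2, 3]

# Without a splat, the whole sub-range becomes one nested element:
a.insert(0, b[0...2])
# a == [[1, 2], 10, 20, 30]

a = [10, 20, 30]
# With a splat, the elements are spliced in individually:
a.insert(0, *b[0...2])
# a == [1, 2, 10, 20, 30]
```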
I tried using splat as follows:
```ruby
embd.insert(0, *last_n_tokens[(n_ctx - (n_left / 2) - embd.size)...-embd.size])
```
but this makes the GGML code crash:
```
GGML_ASSERT: ./src/ggml.c:4785: view_src == NULL || data_size + view_offs <= ggml_nbytes(view_src)
```
I put this dirty little hack in my code and it works. I'm sure the real issue lies elsewhere, but I didn't have time to dig deeper, so this patch works for me ;-)
```ruby
n_eval = [options[:batch_size], embd.size - i].min
# Hack: if the insert above nested a sub-array, flatten it before evaluating
embd.flatten! if embd.first.is_a?(Array)
context.eval(tokens: embd[i...i + n_eval], n_past: n_past)
n_past += n_eval
```
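To illustrate what the slice in the context-swap branch selects, here is a toy walk-through (all values made up; I'm assuming `n_left` counts the recent tokens eligible for swapping, as in the example's surrounding code):

```ruby
# Toy values, not taken from a real run:
n_ctx = 8
n_left = 6
embd = [100, 101]                  # upcoming tokens to evaluate
last_n_tokens = (0...n_ctx).to_a   # stand-in for the ring buffer: [0, 1, ..., 7]

# Keep the second half of the recent history, excluding the upcoming tokens:
kept = last_n_tokens[(n_ctx - (n_left / 2) - embd.size)...-embd.size]
# kept == [3, 4, 5]

# Splatting splices the kept tokens in flat, avoiding the nested array:
embd.insert(0, *kept)
# embd == [3, 4, 5, 100, 101]
```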