
Chat.rb example: Crash when prompt exceeds context?

Open · coezbek opened this issue 2 years ago · 1 comment

When n_past plus the size of the upcoming embd batch exceeds the context size n_ctx, the code that handles this case has a bug:

if n_past + embd.size > n_ctx
  ...
  embd.insert(0, last_n_tokens[(n_ctx - (n_left / 2) - embd.size)...-embd.size])

https://github.com/yoshoku/llama_cpp.rb/blob/97224779aff9923f357f0ad141604c1d3fbfff56/examples/chat.rb#L68C21-L68C21

Calling insert this way puts the sub-array at position 0 as a single nested element, rather than inserting its individual elements.
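The difference can be reproduced in plain Ruby without llama_cpp.rb. This is only a minimal sketch: the arrays and their values are made-up stand-ins for embd and last_n_tokens, not real token IDs.

```ruby
# Hypothetical stand-ins for embd and last_n_tokens.
embd = [7, 8]
history = [1, 2, 3, 4]

# Without splat: the slice is inserted as ONE nested element.
nested = embd.dup.insert(0, history[1...3])

# With splat: the elements of the slice are inserted individually.
flat = embd.dup.insert(0, *history[1...3])

p nested # => [[2, 3], 7, 8]
p flat   # => [2, 3, 7, 8]
```

Only the second form leaves a flat array of integers, which is presumably what the eval call expects as tokens.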

I tried using splat as follows:

  embd.insert(0, *last_n_tokens[(n_ctx - (n_left / 2) - embd.size)...-embd.size])

but this makes the GGML code crash:

GGML_ASSERT: ./src/ggml.c:4785: view_src == NULL || data_size + view_offs <= ggml_nbytes(view_src)

coezbek, Sep 19 '23 14:09

I put this dirty little hack in my code and it works. I'm sure the issue is somewhere else in the code but I didn't have time to dig in deeper so this patch works for me ;-)

# Flatten first so that embd.size counts tokens rather than a nested array.
embd.flatten! if embd.first.is_a?(Array)
n_eval = [options[:batch_size], embd.size - i].min
context.eval(tokens: embd[i...i + n_eval], n_past: n_past)
n_past += n_eval

23atomist, Oct 21 '23 18:10