
No need to copy tokens

Open howard0su opened this issue 2 years ago • 3 comments

howard0su avatar Apr 17 '23 15:04 howard0su

Ironically with N=1, doing this will move more data than the memcpy (a 64-bit pointer instead of just a 32-bit value).

slaren avatar Apr 17 '23 15:04 slaren
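To make the size comparison concrete, here is a small stand-alone sketch (plain C, not llama.cpp code; `llama_token` is a 32-bit id): with N=1, storing a pointer to the caller's buffer takes 8 bytes on a 64-bit build, while the memcpy only moves 4.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef int32_t llama_token;  // tokens in llama.cpp are 32-bit ids

int main(void) {
    llama_token tokens[1] = { 42 };  // N = 1: a single token from the caller
    llama_token dst[1];

    // current approach: copy the token values into the tensor's own buffer
    memcpy(dst, tokens, 1 * sizeof(llama_token));  // moves 4 bytes

    // proposed approach: keep only a pointer to the caller's buffer
    const llama_token *view = tokens;              // the pointer is 8 bytes on a 64-bit build

    printf("payload copied: %zu bytes, pointer stored: %zu bytes\n",
           1 * sizeof(llama_token), sizeof(view));
    return 0;
}
```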

If we can expose an API to create a tensor from existing data, this will also save the cycles spent allocating the tensor's data.

howard0su avatar Apr 17 '23 15:04 howard0su
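What such an API might look like is sketched below. `ggml_new_tensor_1d_with_data` is a hypothetical name, not an existing ggml function, and a real implementation would presumably avoid the intermediate allocation inside the context rather than overwrite the data pointer as this sketch does.

```c
#include "ggml.h"

// Hypothetical helper (name and approach are illustrative only):
// create a 1-D tensor whose data pointer refers to an existing buffer,
// so the caller does not have to memcpy its values into the tensor.
static struct ggml_tensor * ggml_new_tensor_1d_with_data(
        struct ggml_context * ctx,
        enum   ggml_type      type,
        int64_t               ne0,
        void                * data) {
    // NOTE: ggml_new_tensor_1d still reserves space in the context pool here;
    // a real API would skip that allocation entirely.
    struct ggml_tensor * t = ggml_new_tensor_1d(ctx, type, ne0);
    t->data = data;  // point at the caller's buffer instead of the pool
    return t;
}
```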

> If we can expose an API to create a tensor from existing data, this will also save the cycles spent allocating the tensor's data.

> Ironically with N=1, doing this will move more data than the memcpy (a 64-bit pointer instead of just a 32-bit value).

The overhead is not only the amount of data moved but also the memcpy call itself, no? In any case, my intention here is not performance; the point is that we should not be doing the copy at all.

howard0su avatar Apr 17 '23 15:04 howard0su
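For reference, the copy under discussion is the token-input setup in llama_eval, which at the time looked roughly like this (a simplified sketch, not a verbatim excerpt); the proposal amounts to dropping the memcpy and pointing the tensor at the caller's buffer instead.

```c
// Simplified sketch of the existing pattern in llama_eval:
struct ggml_tensor * embd = ggml_new_tensor_1d(ctx0, GGML_TYPE_I32, N);
memcpy(embd->data, tokens, N * ggml_element_size(embd));

// The change discussed in this issue would instead do something like:
//     embd->data = (void *) tokens;
// i.e. redirect the tensor's data pointer at the caller's token buffer.
```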

Redirecting ggml_tensor.data is not a good thing to do. The mmap changes set a precedent, but we should avoid it in general.

ggerganov avatar Apr 22 '23 08:04 ggerganov