
Llama only uses dedicated memory when both shared and dedicated are available.

jadbox opened this issue 1 year ago

Hey folks, the Razer Blade 18's 4090 has both dedicated and shared memory, but it seems most applications, llama.cpp included, are designed to access only the dedicated memory. Is there a way to utilize both, pipelining video memory for better performance?

(Two screenshots attached, dated 2024-04-18.)

Linked reddit thread on the 4090 with both dedicated and shared: https://www.reddit.com/r/nvidia/comments/1c73zaf/my_4090_blade18_has_both_dedicated_and_shared/
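For what it's worth, llama.cpp's usual answer to limited dedicated VRAM is partial offload: the `-ngl`/`--n-gpu-layers` flag controls how many transformer layers are placed in dedicated memory, with the rest kept in system RAM and run on the CPU. A rough sketch of picking that number; all sizes here are illustrative assumptions, not llama.cpp's real memory accounting (which also depends on KV cache, context length, etc.):

```python
def max_gpu_layers(vram_bytes, n_layers, layer_bytes, overhead_bytes):
    """Estimate how many layers fit in dedicated VRAM.

    All sizes are illustrative assumptions, not llama.cpp's actual
    allocator math.
    """
    usable = vram_bytes - overhead_bytes
    if usable <= 0:
        return 0
    return min(n_layers, usable // layer_bytes)

MiB = 1024 * 1024

# Hypothetical 7B-class quantized model: 32 layers of ~200 MiB each,
# with ~1 GiB reserved for KV cache and scratch buffers.
ngl = max_gpu_layers(16 * 1024 * MiB, 32, 200 * MiB, 1024 * MiB)
print(ngl)  # -> 32: everything fits, so pass e.g. `-ngl 32`
```

With a smaller card the same estimate tells you how far to back off `-ngl` instead of hoping the driver will silently spill into shared memory.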

jadbox avatar Apr 18 '24 14:04 jadbox

I have the same question. Why does it never use the shared GPU memory? (screenshot: GPU memory)

saostad avatar May 13 '24 13:05 saostad

same

HamzaYslmn avatar May 17 '24 21:05 HamzaYslmn

?

RazyRo avatar Jun 15 '24 05:06 RazyRo

Because "shared GPU memory" is not real memory connected to the GPU chip; it is system RAM that the driver can map for the GPU.

ctrysbita avatar Jun 15 '24 16:06 ctrysbita

> because shared gpu memory is not the real memory that is connected to gpu chip

That is correct, but using GPU-shared memory will increase performance, since the GPU doesn't have to ask the CPU to access the data in memory.

alirezanet avatar Jun 18 '24 16:06 alirezanet
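The catch is bandwidth: token generation is memory-bandwidth-bound, and shared GPU memory is system RAM reached over the PCIe bus. A back-of-envelope sketch with assumed, illustrative figures (a 4090's GDDR6X is on the order of ~1 TB/s; PCIe 4.0 x16 is on the order of ~32 GB/s):

```python
def tokens_per_sec(model_bytes, bandwidth_bytes_per_sec):
    """Rough upper bound on generation speed, assuming every weight
    must be read once per generated token."""
    return bandwidth_bytes_per_sec / model_bytes

GB = 10**9
model = 8 * GB  # hypothetical quantized model size

vram_ceiling = tokens_per_sec(model, 1000 * GB)  # ~1 TB/s GDDR6X (assumed)
pcie_ceiling = tokens_per_sec(model, 32 * GB)    # ~32 GB/s PCIe 4.0 x16 (assumed)
print(f"dedicated VRAM: ~{vram_ceiling:.0f} tok/s ceiling")   # ~125
print(f"shared over PCIe: ~{pcie_ceiling:.0f} tok/s ceiling")  # ~4
```

So even when spilling into shared memory works, any layer that lands there runs an order of magnitude (or more) slower than one in dedicated VRAM, which is why most inference stacks prefer to fail or fall back to CPU rather than transparently use it.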

same question here

ihor-sokoliuk avatar Jul 12 '24 01:07 ihor-sokoliuk

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] avatar Aug 26 '24 01:08 github-actions[bot]