
Llama only uses dedicated memory when both shared and dedicated are available.

jadbox opened this issue 1 year ago

Hey folks, the Razer Blade 18's 4090 has both dedicated and shared memory, but it seems most applications, llama.cpp included, are designed to access only the dedicated memory. Is there a way to utilize both, pipelining video memory for better performance?

(Two screenshots attached, dated 2024-04-18.)

Linked reddit thread on the 4090 with both dedicated and shared: https://www.reddit.com/r/nvidia/comments/1c73zaf/my_4090_blade18_has_both_dedicated_and_shared/
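For what it's worth, llama.cpp's usual answer to limited dedicated VRAM is partial offload: the `-ngl`/`--n-gpu-layers` flag controls how many transformer layers are placed in dedicated memory, with the rest kept in system RAM and run on the CPU. A rough sketch of picking that number; all sizes here are illustrative assumptions, not llama.cpp's real memory accounting (which also depends on KV cache, context length, etc.):

```python
def max_gpu_layers(vram_bytes, n_layers, layer_bytes, overhead_bytes):
    """Estimate how many layers fit in dedicated VRAM.

    All sizes are illustrative assumptions, not llama.cpp's actual
    allocator math.
    """
    usable = vram_bytes - overhead_bytes
    if usable <= 0:
        return 0
    return min(n_layers, usable // layer_bytes)

MiB = 1024 * 1024

# Hypothetical 7B-class quantized model: 32 layers of ~200 MiB each,
# with ~1 GiB reserved for KV cache and scratch buffers.
ngl = max_gpu_layers(16 * 1024 * MiB, 32, 200 * MiB, 1024 * MiB)
print(ngl)  # -> 32: everything fits, so pass e.g. `-ngl 32`
```

With a smaller card the same estimate tells you how far to back off `-ngl` instead of hoping the driver will silently spill into shared memory.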

jadbox avatar Apr 18 '24 14:04 jadbox

I have the same question. Why does it never use the shared GPU memory? (screenshot: GPU memory)

saostad avatar May 13 '24 13:05 saostad

same

HamzaYslmn avatar May 17 '24 21:05 HamzaYslmn

?

RazyRo avatar Jun 15 '24 05:06 RazyRo

Because "shared GPU memory" is not real memory connected to the GPU chip; it is system RAM that the driver can map for the GPU.

ctrysbita avatar Jun 15 '24 16:06 ctrysbita

> because shared gpu memory is not the real memory that is connected to gpu chip

That is correct, but using GPU-shared memory will increase performance, since the GPU doesn't have to ask the CPU to access the data in memory.

alirezanet avatar Jun 18 '24 16:06 alirezanet
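The catch is bandwidth: token generation is memory-bandwidth-bound, and shared GPU memory is system RAM reached over the PCIe bus. A back-of-envelope sketch with assumed, illustrative figures (a 4090's GDDR6X is on the order of ~1 TB/s; PCIe 4.0 x16 is on the order of ~32 GB/s):

```python
def tokens_per_sec(model_bytes, bandwidth_bytes_per_sec):
    """Rough upper bound on generation speed, assuming every weight
    must be read once per generated token."""
    return bandwidth_bytes_per_sec / model_bytes

GB = 10**9
model = 8 * GB  # hypothetical quantized model size

vram_ceiling = tokens_per_sec(model, 1000 * GB)  # ~1 TB/s GDDR6X (assumed)
pcie_ceiling = tokens_per_sec(model, 32 * GB)    # ~32 GB/s PCIe 4.0 x16 (assumed)
print(f"dedicated VRAM: ~{vram_ceiling:.0f} tok/s ceiling")   # ~125
print(f"shared over PCIe: ~{pcie_ceiling:.0f} tok/s ceiling")  # ~4
```

So even when spilling into shared memory works, any layer that lands there runs an order of magnitude (or more) slower than one in dedicated VRAM, which is why most inference stacks prefer to fail or fall back to CPU rather than transparently use it.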

same question here

ihor-sokoliuk avatar Jul 12 '24 01:07 ihor-sokoliuk

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] avatar Aug 26 '24 01:08 github-actions[bot]