Daniel Bammer

1 comment of Daniel Bammer

I can confirm this behavior on gemma-3-4b-it with every attention implementation: sdpa, eager, and flash_attention_2. llama.cpp only recently patched Gemma 3 attention to need less VRAM.
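A minimal sketch of how one might reproduce the comparison across backends, assuming a recent transformers release, a CUDA GPU, and flash-attn installed for the flash_attention_2 case; the class choice follows the gemma-3-4b-it model card, and the prompt and generation length are arbitrary:

```python
import torch
from transformers import AutoTokenizer, Gemma3ForConditionalGeneration

MODEL_ID = "google/gemma-3-4b-it"
PROMPT = "Hello"  # arbitrary test prompt

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Load the model once per backend and record peak VRAM for the same workload.
for attn in ("sdpa", "eager", "flash_attention_2"):
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()

    model = Gemma3ForConditionalGeneration.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,
        attn_implementation=attn,  # backend under test
        device_map="cuda",
    )

    inputs = tokenizer(PROMPT, return_tensors="pt").to("cuda")
    with torch.no_grad():
        model.generate(**inputs, max_new_tokens=32)

    peak_gib = torch.cuda.max_memory_allocated() / 1024**3
    print(f"{attn}: peak VRAM {peak_gib:.2f} GiB")

    del model
    torch.cuda.empty_cache()
```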