Daniel Bammer
I can confirm this behavior on gemma-3-4b-it for every attention implementation: sdpa, eager, and flash_attention_2. Llama.cpp only recently patched gemma-3 attention to use less VRAM.
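For reference, a minimal sketch of how the three backends might be compared with the transformers library. Only the model name and the attn_implementation values come from the comment; the auto class, prompt, dtype, and memory probe are illustrative assumptions (the 4B checkpoint is multimodal, so the exact loading class may differ):

```python
# Sketch: load gemma-3-4b-it once per attention backend and record peak VRAM.
# Assumes a CUDA device and recent transformers/accelerate; details are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-3-4b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)

for attn in ("sdpa", "eager", "flash_attention_2"):
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        attn_implementation=attn,  # backend under test
        device_map="auto",
    )
    inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
    model.generate(**inputs, max_new_tokens=32)
    print(attn, torch.cuda.max_memory_allocated() / 2**30, "GiB peak")
    # Free the model before the next run so measurements do not accumulate.
    del model
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
```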