llm-analysis
[REQUEST] Implement modern attention schemes such as GQA or MLA
Thanks for this very interesting library.
I did not see any implementation of Grouped-Query Attention (GQA) or Multi-head Latent Attention (MLA), both of which seem to be widely used in recent models.
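
To illustrate why this would matter for memory estimates, here is a rough sketch of the per-token KV cache size under each scheme. This is not llm-analysis code: the function names and the example dimensions (32 layers, head dimension 128, a 512-wide latent plus a 64-wide RoPE key) are assumptions chosen only for illustration.

```python
def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Per-token KV cache for MHA/GQA: one key and one value vector
    per KV head, per layer. Hypothetical helper, not a library API."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def mla_cache_bytes_per_token(n_layers, latent_dim, rope_dim, bytes_per_elem=2):
    """Per-token cache for MLA, assuming only the compressed KV latent
    and the shared RoPE key are cached in each layer."""
    return n_layers * (latent_dim + rope_dim) * bytes_per_elem


# Illustrative dimensions only (fp16, so 2 bytes per element).
mha = kv_cache_bytes_per_token(n_layers=32, n_kv_heads=32, head_dim=128)
gqa = kv_cache_bytes_per_token(n_layers=32, n_kv_heads=8, head_dim=128)
mla = mla_cache_bytes_per_token(n_layers=32, latent_dim=512, rope_dim=64)

print(f"MHA: {mha} bytes/token")
print(f"GQA: {gqa} bytes/token")
print(f"MLA: {mla} bytes/token")
```

With these assumed numbers, GQA with 8 KV heads caches a quarter of what full multi-head attention does, and MLA caches less still, so an analysis that assumes one KV head per query head would overestimate inference memory for such models.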