hackey

Results 13 comments of hackey

Dear developers, I second this question. Could you tell me whether it is possible to use xformers with composable_kernel (or without composable_kernel) on a 7900 XTX video card?
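For what it's worth, a minimal probe of this, assuming torch and xformers are installed on the ROCm box (the shapes and dtype below are arbitrary, just for illustration):

```python
# Hedged sketch, not from the original comment: try xformers'
# memory-efficient attention on the current GPU (e.g. gfx1100) and
# report whether any backend accepts it.
import torch
import xformers.ops as xops

# xformers expects (batch, seq_len, heads, head_dim)
q = torch.randn(1, 128, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

try:
    out = xops.memory_efficient_attention(q, k, v)
    print("memory_efficient_attention ran, output shape:", tuple(out.shape))
except Exception as err:  # e.g. NotImplementedError if no backend supports this GPU
    print("memory_efficient_attention unavailable on this GPU:", err)
```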

I also get an error when compiling for the 7900 XTX (gfx1100). Could you let me know whether flash-attention supports this card?
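A quick check I would run once the build finishes, assuming the flash-attn package is importable (shapes and dtype are illustrative only):

```python
# Hedged sketch: probe whether the installed flash-attn build can
# actually execute a forward pass on this GPU.
import torch

try:
    from flash_attn import flash_attn_func
except ImportError as err:
    print("flash-attn is not installed/built:", err)
else:
    # flash_attn_func expects (batch, seq_len, heads, head_dim), fp16/bf16 on GPU
    q = torch.randn(1, 64, 8, 64, device="cuda", dtype=torch.float16)
    k = torch.randn_like(q)
    v = torch.randn_like(q)
    try:
        out = flash_attn_func(q, k, v, causal=True)
        print("flash_attn_func ran, output shape:", tuple(out.shape))
    except Exception as err:
        print("flash_attn_func failed on this GPU:", err)
```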

Hi! With the switch to the new engine, I am very interested in how AMD ROCm support will fare, and in particular Navi 3 (RDNA 3). I have been...

> > With the introduction of AITER and other tools that rely on CK, Navi 3 systems become practically unusable. While you can use practically any low-level...

The Mistral developers have changed the context length, so this request can be closed. Llama.cpp supports this model.

> > Be me, see gemma come out. _People say it's coal._ > > Screw it, I'll try it. > > wot backend? no exllama, llama.cpp has no pictchas, hey...

Guys, maybe you should stop writing empty comments? It would be better to add a reaction to the first message, as is customary in GitHub communities. After all, for every comment with "+1"...

Thanks for the help! I saw the error about flash attention, but I was honestly surprised. After all, it seems like FA for ROCm is now implemented using Triton in...
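A small sketch I use to sanity-check this, assuming a recent PyTorch ROCm build (`torch.nn.attention.sdpa_kernel` needs torch >= 2.3); it only tests PyTorch's built-in flash SDPA backend, not vLLM's own kernels:

```python
# Hedged sketch: force the flash-attention SDPA backend and see whether
# it is usable on the current (ROCm) GPU.
import torch
from torch.nn.attention import SDPBackend, sdpa_kernel

# scaled_dot_product_attention expects (batch, heads, seq_len, head_dim)
q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

try:
    with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
        out = torch.nn.functional.scaled_dot_product_attention(q, k, v)
    print("flash SDPA backend ran, output shape:", tuple(out.shape))
except RuntimeError as err:
    print("flash SDPA backend not available here:", err)
```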

> You're right, it's not needed. It's just being imported for this check, since the Mamba mixer looks inside the attention metadata. I'll put up a fix in the next...

I'm using 2 AMD Radeon 7900 XTX video cards and I get this error when starting any model:

```
vllm serve /app/model/Qwen2.5-Coder-14B-Instruct --port 8002 --tensor-parallel-size 2 --gpu-memory-utilization 0.9 --max-model-len 64000...
```
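To narrow the error down, a minimal sketch of the same launch through vLLM's offline Python API (the model path and flags are just copied from the command above; this assumes the server error also reproduces offline):

```python
# Hedged sketch, not from the original comment: run the same model with the
# same settings via the offline API, which often surfaces a fuller traceback
# than the server startup log.
from vllm import LLM, SamplingParams

llm = LLM(
    model="/app/model/Qwen2.5-Coder-14B-Instruct",  # same local path as the serve command
    tensor_parallel_size=2,          # two 7900 XTX cards
    gpu_memory_utilization=0.9,
    max_model_len=64000,
)

outputs = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```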