aphrodite-engine

[Feature]: tensor parallelism support for bnb quantization (via IBM's fork)

Open · BlairSadewitz opened this issue 1 year ago · 3 comments

🚀 The feature, motivation and pitch

I don't know whether it's feasible or worthwhile to merge this, since the trees may have diverged too much. But cherry-picking commits for projects I don't fully understand is somehow a pastime of mine, so...

Alternatives

I could always use one of the other 8.4234234×10^23 quantization methods, but, hey, variety is the spice of life, or something.

Additional context

It doesn't work for pre-quantized models. 🎉~

BlairSadewitz · Sep 28 '24 16:09

Perhaps; I'll have to look into it. bnb hasn't been a priority.

AlpinDale · Sep 29 '24 03:09

Yeah, I hear you. I'm going to file a better PR in a second, though, so... ;-)

BlairSadewitz · Sep 29 '24 22:09

FYI, I'm working on new kernels to massively speed up bnb quants and add TP support for them. You might want to hold off for now, or help out with that upcoming PR if you're comfortable with CUDA.

AlpinDale · Sep 30 '24 09:09