James Whedbee

Results 16 comments of James Whedbee

I am eagerly awaiting this too. Is there any area where contributions would be welcomed to help merge this?

I am seeing this too using `CT_HIPBLAS=1 pip install ctransformers --no-binary ctransformers`
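For anyone trying to reproduce, the failing install is a source build of ctransformers with the hipBLAS (ROCm) backend enabled; this is the same command as above, just spelled out with comments (no flags beyond those in the original comment are assumed):

```shell
# Build ctransformers from source with the hipBLAS (ROCm) backend.
# CT_HIPBLAS=1 enables the hipBLAS code path at build time;
# --no-binary ctransformers forces a source build instead of a prebuilt wheel.
CT_HIPBLAS=1 pip install ctransformers --no-binary ctransformers
```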

I was going to try this out soon; is this in a good spot, or is it still being worked on?

Yep, tensor parallelism worked for me with no code changes! I'll try using Llama 2 70B unquantized tomorrow as the verifier model. Because int4 quantization is not supported for AMD...

@Chillee Maybe I misunderstood; could you give me an example command you think should result in a speed-up? I can get ~15 tokens/second for an unquantized Llama 70B using compile...

@Chillee that unfortunately also just results in ~8 tokens/second. EDIT: just saw your edit.

I am at the third bullet point here as well; going to just follow along with the comments here.

That looked promising, but I unfortunately ran into another issue you probably wouldn't have. I am on AMD, so that might be the cause. I can't find anything online related...