Jordy Van Landeghem
Any traction on this?
Eagerly awaiting this! Great work @neuralmagic team ;)
Would this mean that you would do "patching" on the embedding space, rather than pixels? Is there currently some hyperparameter that restricts the chunking to a single page?
I tested this PR with a trained QLoRA adapter and I am getting this error: `KeyError: 'lm_head.qweight'` Might this be due to only checking for certain adapter weights? EDIT: no...
@junzhang-zj lol I have exactly the same use case ;p
> I tested this PR with a trained QLoRA adapter and I am getting this error: `KeyError: 'lm_head.qweight'`
>
> Might this be due to only checking for certain...
@arianyambao We also suspect issues with Llama-3.1 in vllm, as its scores are no better than Llama-3's. After finetuning it performs even worse...
Can this be given higher priority? It is an absolute blocker for this set of models...