xyLu comments

Results: 4 comments by xyLu

[BUG] Int8 Inference Does Not Work For GPTJ

I have the same two questions when using DeepSpeed inference: (1) It seems that `replace_with_kernel_inject=True` conflicts with `dtype=torch.int8` and causes "CUDA error: an illegal memory access was encountered". (2) With `replace_with_kernel_inject=False`,...

Very Slow Inference on PEFT-LORA fine-tunned FLAN-UL2

This is a very interesting issue, and I already know that I should call `merge_and_unload()` before generation with a LoRA-tuned causal language model such as LLaMA. And my new question...
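For context on why merging before generation helps, here is a minimal sketch of what folding a LoRA adapter into a base weight amounts to, assuming the usual low-rank update W + scaling * (B A). It is not PEFT's implementation; the helper names (`matmul`, `matvec`, `merge_lora`) and the toy values are hypothetical and chosen only for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def matvec(m, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(r * v for r, v in zip(row, x)) for row in m]


def merge_lora(weight, lora_a, lora_b, scaling):
    """Return weight + scaling * (lora_b @ lora_a) as a new matrix."""
    delta = matmul(lora_b, lora_a)
    return [
        [w + scaling * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy values (assumed): d_out = d_in = 2, rank r = 1.
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]       # shape (r, d_in)
lora_b = [[0.5], [-0.5]]    # shape (d_out, r)
scaling = 2.0
x = [3.0, 4.0]

# Unmerged path: base projection plus two extra adapter matmuls per call.
base_out = matvec(weight, x)
adapter_out = matvec(lora_b, matvec(lora_a, x))
unmerged_out = [b + scaling * a for b, a in zip(base_out, adapter_out)]

# Merged path: a single projection with the folded weight.
merged_out = matvec(merge_lora(weight, lora_a, lora_b, scaling), x)

print(unmerged_out, merged_out)
```

Both paths produce the same output, but the merged one skips the extra adapter multiplications on every forward call, and that cost is paid once per generated token.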

Very Slow Inference on PEFT-LORA fine-tunned FLAN-UL2

> for PPL, you just need forward, I don't think you need to call the `generate` function

Thank you. May I explain it like this: In evaluation, the forward() function...
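As a rough illustration of the quoted point, perplexity can be computed directly from the logits a single forward pass returns, with no decoding loop. The sketch below uses plain Python lists in place of model outputs; the function names and the toy logits are assumptions for illustration, not part of any library.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one position's logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def perplexity(logits_per_position, target_ids):
    """exp of the mean negative log-likelihood of the target tokens."""
    nll = 0.0
    for logits, target in zip(logits_per_position, target_ids):
        nll -= log_softmax(logits)[target]
    return math.exp(nll / len(target_ids))


# Toy case (assumed): uniform logits over a 4-token vocabulary,
# so the perplexity should come out very close to 4.
uniform_logits = [[0.0, 0.0, 0.0, 0.0] for _ in range(3)]
print(perplexity(uniform_logits, [1, 2, 3]))
```

With a real causal language model, the logits at position t are scored against the token at position t + 1, so the targets are the input ids shifted by one.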

Example GRPO script (run_qwen2-7b.sh) gets stuck when setting trainer.n_gpus_per_node > 1

same question