LugerW-A
### 🚀 The feature, motivation and pitch

For Qwen2VL/Qwen2.5VL, we need a service that can dynamically handle image pixels (i.e., control the input resolution per request).

### Alternatives

_No response_

### Additional context

_No response_
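The "dynamic pixels" behavior being requested corresponds to the resize rule used by Qwen2-VL's image preprocessing, which snaps each image to dimensions that are multiples of the 28-pixel patch size while staying inside a min/max pixel budget. A minimal sketch of that logic, assuming the commonly cited factor of 28 and default bounds (the authoritative implementation lives in `qwen_vl_utils`/transformers, and exact defaults may differ):

```python
import math

def smart_resize(height, width, factor=28,
                 min_pixels=56 * 56, max_pixels=14 * 14 * 4 * 1280):
    # Round each side to the nearest multiple of the patch factor.
    h_bar = round(height / factor) * factor
    w_bar = round(width / factor) * factor
    if h_bar * w_bar > max_pixels:
        # Too many pixels: scale down, flooring to stay under the budget.
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = math.floor(height / beta / factor) * factor
        w_bar = math.floor(width / beta / factor) * factor
    elif h_bar * w_bar < min_pixels:
        # Too few pixels: scale up, ceiling to meet the minimum.
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar

print(smart_resize(1000, 1000))  # scaled down to fit the max-pixel budget
print(smart_resize(20, 20))      # scaled up to meet the min-pixel floor
```

A serving layer could expose `min_pixels`/`max_pixels` per request to trade visual detail against token count, since the resulting pixel area directly determines how many vision tokens the model consumes.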
### Your current environment

PyTorch version: 2.5.1+cu124
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: ...
### 🚀 The feature, motivation and pitch

Regarding InternViT with 6 billion parameters: what is the process for FP8 quantization? Additionally, what are the steps to implement Tensor Parallelism for...
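The two halves of this question can be sketched independently. FP8 (E4M3) quantization typically uses a per-tensor scale so the largest weight maps to the format's maximum representable value (~448), and tensor parallelism for a ViT's linear layers typically splits the weight matrix column-wise across ranks and concatenates the partial outputs. A minimal numpy sketch under those assumptions (shapes and rank count are illustrative, not InternViT's actual dimensions):

```python
import numpy as np

def fp8_e4m3_scale(w, fp8_max=448.0):
    # Per-tensor scale so the largest-magnitude weight maps to the
    # FP8 E4M3 maximum (~448); weights are divided by this before casting.
    return np.abs(w).max() / fp8_max

def column_parallel_matmul(x, w, tp_size):
    # Split weight columns across tp_size "ranks"; each rank computes a
    # slice of the output, then slices are concatenated (an all-gather
    # over the last dimension in a real TP implementation).
    shards = np.array_split(w, tp_size, axis=1)
    partial_outputs = [x @ shard for shard in shards]
    return np.concatenate(partial_outputs, axis=1)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64))    # batch of token embeddings
w = rng.standard_normal((64, 128))  # one linear layer's weight

scale = fp8_e4m3_scale(w)
assert np.abs(w / scale).max() <= 448.0  # scaled weights fit E4M3 range

ref = x @ w
tp = column_parallel_matmul(x, w, tp_size=4)
assert np.allclose(ref, tp)  # sharded result matches the full matmul
```

The key property to verify when adding TP to a new model is exactly the assertion above: the concatenated per-rank outputs must be numerically identical to the unsharded layer.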
Can we use AQLM for multimodal LLMs?
QQQ OOM
https://github.com/HandH1998/QQQ/blob/e307d9f00b90309069733890f28eefe9886bb6f2/QQQ/gptq/models/llama.py#L56C5-L58C36 The default commands place all calibration data on a single GPU, which causes an Out-of-Memory (OOM) error during quantization. Are there ways to optimize this, for example by staging data or layers onto the GPU only as needed?
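A common way to avoid this kind of OOM is to quantize layer by layer: only the layer currently being processed lives on the GPU, and everything else stays offloaded on the CPU. A device-agnostic sketch of that pattern, with simple absmax int8 quantization standing in for the real GPTQ step (function names and the quantization scheme here are illustrative, not QQQ's actual API):

```python
import numpy as np

def absmax_int8_quantize(w):
    # Stand-in for the real GPTQ step: symmetric absmax int8 quantization.
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def quantize_model_layerwise(layers):
    # `layers` is a list of weight matrices resident "on CPU". Only one
    # layer is staged onto the device at a time, quantized, and released,
    # so peak device memory is one layer rather than the whole model.
    quantized = []
    for w in layers:
        w_gpu = w                      # in torch: w.to("cuda")
        q, scale = absmax_int8_quantize(w_gpu)
        quantized.append((q, scale))   # in torch: q.cpu()
        del w_gpu                      # free device memory before next layer
    return quantized

rng = np.random.default_rng(0)
layers = [rng.standard_normal((8, 8)) for _ in range(3)]
qmodel = quantize_model_layerwise(layers)

# Dequantized weights should be within half a quantization step of the originals.
q0, s0 = qmodel[0]
w_hat = q0.astype(np.float64) * s0
assert np.abs(w_hat - layers[0]).max() <= s0 / 2 + 1e-9
```

The same idea applies to the calibration data itself: batches can be streamed onto the GPU per layer instead of materializing the whole calibration set in device memory at once.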