LugerW-A
### 🚀 The feature, motivation and pitch

For Qwen2VL/Qwen2.5VL, we need a service that can dynamically handle image pixels (i.e., control the input resolution per request).

### Alternatives

_No response_

### Additional context

_No response_
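The "dynamic pixels" behavior being requested corresponds to the resize rule used by Qwen2-VL's image preprocessing, which snaps each image to dimensions that are multiples of the 28-pixel patch size while staying inside a min/max pixel budget. A minimal sketch of that logic, assuming the commonly cited factor of 28 and default bounds (the authoritative implementation lives in `qwen_vl_utils`/transformers, and exact defaults may differ):

```python
import math

def smart_resize(height, width, factor=28,
                 min_pixels=56 * 56, max_pixels=14 * 14 * 4 * 1280):
    # Round each side to the nearest multiple of the patch factor.
    h_bar = round(height / factor) * factor
    w_bar = round(width / factor) * factor
    if h_bar * w_bar > max_pixels:
        # Too many pixels: scale down, flooring to stay under the budget.
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = math.floor(height / beta / factor) * factor
        w_bar = math.floor(width / beta / factor) * factor
    elif h_bar * w_bar < min_pixels:
        # Too few pixels: scale up, ceiling to meet the minimum.
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar

print(smart_resize(1000, 1000))  # scaled down to fit the max-pixel budget
print(smart_resize(20, 20))      # scaled up to meet the min-pixel floor
```

A serving layer could expose `min_pixels`/`max_pixels` per request to trade visual detail against token count, since the resulting pixel area directly determines how many vision tokens the model consumes.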
### Your current environment

PyTorch version: 2.5.1+cu124
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: ...
### 🚀 The feature, motivation and pitch

Regarding InternViT with 6 billion parameters: what is the process for FP8 quantization? Additionally, what are the steps to implement Tensor Parallelism for...
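The two halves of this question can be sketched independently. FP8 (E4M3) quantization typically uses a per-tensor scale so the largest weight maps to the format's maximum representable value (~448), and tensor parallelism for a ViT's linear layers typically splits the weight matrix column-wise across ranks and concatenates the partial outputs. A minimal numpy sketch under those assumptions (shapes and rank count are illustrative, not InternViT's actual dimensions):

```python
import numpy as np

def fp8_e4m3_scale(w, fp8_max=448.0):
    # Per-tensor scale so the largest-magnitude weight maps to the
    # FP8 E4M3 maximum (~448); weights are divided by this before casting.
    return np.abs(w).max() / fp8_max

def column_parallel_matmul(x, w, tp_size):
    # Split weight columns across tp_size "ranks"; each rank computes a
    # slice of the output, then slices are concatenated (an all-gather
    # over the last dimension in a real TP implementation).
    shards = np.array_split(w, tp_size, axis=1)
    partial_outputs = [x @ shard for shard in shards]
    return np.concatenate(partial_outputs, axis=1)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64))    # batch of token embeddings
w = rng.standard_normal((64, 128))  # one linear layer's weight

scale = fp8_e4m3_scale(w)
assert np.abs(w / scale).max() <= 448.0  # scaled weights fit E4M3 range

ref = x @ w
tp = column_parallel_matmul(x, w, tp_size=4)
assert np.allclose(ref, tp)  # sharded result matches the full matmul
```

The key property to verify when adding TP to a new model is exactly the assertion above: the concatenated per-rank outputs must be numerically identical to the unsharded layer.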
Can we use AQLM for multimodal LLMs?
QQQ OOM
https://github.com/HandH1998/QQQ/blob/e307d9f00b90309069733890f28eefe9886bb6f2/QQQ/gptq/models/llama.py#L56C5-L58C36 The default commands place all calibration data on a single GPU, which causes an Out-of-Memory (OOM) error during quantization. Are there ways to optimize this, for example by staging data or layers onto the GPU only as needed?
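A common way to avoid this kind of OOM is to quantize layer by layer: only the layer currently being processed lives on the GPU, and everything else stays offloaded on the CPU. A device-agnostic sketch of that pattern, with simple absmax int8 quantization standing in for the real GPTQ step (function names and the quantization scheme here are illustrative, not QQQ's actual API):

```python
import numpy as np

def absmax_int8_quantize(w):
    # Stand-in for the real GPTQ step: symmetric absmax int8 quantization.
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def quantize_model_layerwise(layers):
    # `layers` is a list of weight matrices resident "on CPU". Only one
    # layer is staged onto the device at a time, quantized, and released,
    # so peak device memory is one layer rather than the whole model.
    quantized = []
    for w in layers:
        w_gpu = w                      # in torch: w.to("cuda")
        q, scale = absmax_int8_quantize(w_gpu)
        quantized.append((q, scale))   # in torch: q.cpu()
        del w_gpu                      # free device memory before next layer
    return quantized

rng = np.random.default_rng(0)
layers = [rng.standard_normal((8, 8)) for _ in range(3)]
qmodel = quantize_model_layerwise(layers)

# Dequantized weights should be within half a quantization step of the originals.
q0, s0 = qmodel[0]
w_hat = q0.astype(np.float64) * s0
assert np.abs(w_hat - layers[0]).max() <= s0 / 2 + 1e-9
```

The same idea applies to the calibration data itself: batches can be streamed onto the GPU per layer instead of materializing the whole calibration set in device memory at once.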