Marc Sun

Results 20 issues of Marc Sun

# What does this PR do? This PR fixes two issues users can run into with big model inference: - They use their own device map but forget that parameters...

# What does this PR do? This PR modifies the quantization doc to include a link to the Accelerate documentation for users who want to quantize their own PyTorch model.

# What does this PR do? This PR adds a workflow for quantization tests and the related Dockerfile. Since we merged the [HfQuantizer PR](https://github.com/huggingface/transformers/pull/26610), the community started integrating their own...

# What does this PR do? This PR adds the quantization methods from the quanto library. We will support inference + model quantization if the user performs weight-only quantization...

# What does this PR do? This PR adds a message for users with RTX 4000 when using multi-GPU inference with `device_map`. Maybe we can do a better check...

# What does this PR do? This PR modifies `max_memory` by allocating 80% of CPU memory if "cpu" is not passed in `max_memory`. The idea was in case the...
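
The defaulting rule described above can be sketched as follows. This is only an illustration under stated assumptions: `fill_cpu_max_memory` is a hypothetical helper, not a function from Accelerate, and the amount of available CPU memory is passed in by the caller rather than measured.

```python
from typing import Dict, Optional, Union

DeviceKey = Union[int, str]


def fill_cpu_max_memory(
    max_memory: Optional[Dict[DeviceKey, int]],
    available_cpu_bytes: int,
    cpu_percent: int = 80,
) -> Dict[DeviceKey, int]:
    """Hypothetical helper (not part of any library).

    Returns a copy of `max_memory` in which a missing "cpu" entry is set to
    `cpu_percent` percent of `available_cpu_bytes`. An explicit "cpu" entry
    supplied by the user is left untouched.
    """
    result: Dict[DeviceKey, int] = dict(max_memory) if max_memory else {}
    if "cpu" not in result:
        # Integer arithmetic avoids floating-point rounding surprises.
        result["cpu"] = available_cpu_bytes * cpu_percent // 100
    return result


# GPU 0 is capped by the user; "cpu" is filled in with 80% of what is available.
budget = fill_cpu_max_memory({0: 10_000}, available_cpu_bytes=1_000)
```

An explicit "cpu" value always wins in this sketch, so the default only applies when the user has not expressed a preference.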

enhancement
feature request
wip

# What does this PR do? This PR adds the possibility to perform CPU offload with the weights stored inside the module. You just need to pass `cpu_offload =...

wip

# What does this PR do? This PR replaces the scales with `ScaleActivation` when the model has a GELU activation. Before this PR, the scales calculated during the quantization were not...

# What does this PR do? This PR enables serializing quanto-quantized models. It needs the following [PR](https://github.com/huggingface/quanto/pull/120) in quanto. Draft for now.

# What does this PR do? This PR fixes a couple of issues related to safetensors + loading with modules on the `meta` device. Draft for now.