Marc Sun issues

Results: 20 issues of Marc Sun

Check tied parameters

# What does this PR do? This PR fixes two issues users can run into when using big model inference: - They use their own device map but forget that parameters...

add link to accelerate doc

# What does this PR do? This PR modifies the quantization doc to include a link to the Accelerate documentation for users who want to quantize their own PyTorch model.

[CI] Quantization workflow

# What does this PR do? This PR adds a workflow for quantization tests and the related Dockerfile. Since we merged the [HfQuantizer PR](https://github.com/huggingface/transformers/pull/26610), the community started integrating their own...

[Quantization] Quanto quantizer

# What does this PR do? This PR adds the quantization methods from the quanto library. We will support inference + model quantization if the user performs weight-only quantization...

Add log message for RTX 4000 series when performing multi-gpu inference with device_map

# What does this PR do? This PR adds a log message for users with RTX 4000 series GPUs when using multi-GPU inference with device_map. Maybe we can do a better check...

allocate 80% for cpu is unset in `max_memory`

# What does this PR do? This PR modifies `max_memory` by allocating 80% of CPU memory if "cpu" is not passed in `max_memory`. The idea was in case the...
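
A minimal sketch of the behavior described above, not the PR's actual code: the helper name `fill_cpu_budget`, its parameters, and the choice to pass the CPU memory size in as an argument are all assumptions made for illustration.

```python
from typing import Dict, Optional, Union

Device = Union[int, str]


def fill_cpu_budget(
    max_memory: Optional[Dict[Device, int]],
    cpu_memory_bytes: int,
    cpu_percent: int = 80,
) -> Dict[Device, int]:
    """Return a copy of `max_memory` with a "cpu" entry guaranteed.

    Hypothetical helper: if the caller did not pass a "cpu" key, budget
    `cpu_percent` percent of `cpu_memory_bytes` for it. An explicit
    "cpu" entry is left untouched.
    """
    budget = dict(max_memory or {})  # copy so the caller's dict is not mutated
    if "cpu" not in budget:
        # Integer arithmetic avoids float rounding on large byte counts.
        budget["cpu"] = cpu_memory_bytes * cpu_percent // 100
    return budget


# Usage: only the GPU budget is given, so "cpu" is filled in.
filled = fill_cpu_budget({0: 10 * 1024**3}, cpu_memory_bytes=32 * 1024**3)
```

An explicit `"cpu"` entry always wins in this sketch, so the 80% default only applies when the key is missing.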

Labels: enhancement, feature request, wip

[CI] Quantization workflow

Enable cpu offload with weights inside the module

[awq] replace scale when we have GELU

Quanto serialization

Fix serialization