When generating a model configuration using the `triton-config-model` command, it would be great to allow users to assign Host Policies to specific instance groups as described in [Model Configuration - Host...
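For context, the requested assignment can already be expressed by hand in a generated `config.pbtxt`. Below is a minimal sketch, assuming the `model_config_pb2` module bundled with the `tritonclient` package, of patching a generated config so each instance group references a host policy; the model path and policy names are illustrative placeholders, not part of the request.

```python
# Minimal sketch: attach host policy names to the instance groups of an
# already-generated config.pbtxt. Paths and policy names are placeholders.
from google.protobuf import text_format
from tritonclient.grpc import model_config_pb2

config = model_config_pb2.ModelConfig()
with open("model_repository/my_model/config.pbtxt") as f:
    text_format.Parse(f.read(), config)

# Triton matches these names against the --host-policy settings passed to
# tritonserver at startup.
for i, group in enumerate(config.instance_group):
    group.host_policy = f"policy_{i}"

with open("model_repository/my_model/config.pbtxt", "w") as f:
    f.write(text_format.MessageToString(config))
```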
### System Info

- `transformers` version: 4.26.1
- Platform: Linux-5.10.157-139.675.amzn2.x86_64-x86_64-with-glibc2.26
- Python version: 3.9.15
- Huggingface_hub version: 0.13.0
- PyTorch version (GPU?): 1.13.1 (True)
- Tensorflow version (GPU?): not installed...
**Describe the bug**

While comparing ZeRO Stage 2 and ZeRO Stage 3, I found that the peak GPU memory utilization (as measured by the `deepspeed.runtime.utils.memory_status` function) is higher when using...
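For reference, a minimal sketch of the kind of comparison described (not the reporter's actual script): the same loop is run once with `"stage": 2` and once with `"stage": 3`, and the peak allocator memory is read back alongside the `memory_status` numbers cited above. The model, batch size, and step count are placeholders.

```python
# Minimal sketch (not the reporter's script): run once with stage 2 and once
# with stage 3 in the config below, launched via `deepspeed this_script.py`.
import torch
import deepspeed

model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096), torch.nn.ReLU(), torch.nn.Linear(4096, 4096)
)

ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 2},  # change to 3 for the second run
}

engine, _, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

torch.cuda.reset_peak_memory_stats()
for _ in range(10):
    x = torch.randn(4, 4096, device=engine.device)
    loss = engine(x).pow(2).mean()
    engine.backward(loss)
    engine.step()

# Peak allocation from the PyTorch allocator, to compare against the
# deepspeed.runtime.utils.memory_status output referenced in the report.
print(f"peak allocated: {torch.cuda.max_memory_allocated() / 2**30:.2f} GiB")
```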
This PR is part of the effort to support the vLLM v1 architecture on the Neuron platform (see #11152). This change adds a device communicator class for the Neuron backend using...
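As a rough illustration of the pattern (class and method names here are assumptions for illustration only, not the interface added by this PR), a per-backend device communicator typically wraps `torch.distributed` collectives behind a small class so the rest of the engine stays backend-agnostic:

```python
# Illustrative sketch only: names are assumptions, not vLLM's actual API.
from typing import Optional

import torch
import torch.distributed as dist


class NeuronLikeDeviceCommunicator:
    """Routes collective ops through a process group set up for the backend."""

    def __init__(self, group: Optional[dist.ProcessGroup] = None):
        # None falls back to the default (global) process group.
        self.group = group

    def all_reduce(self, tensor: torch.Tensor) -> torch.Tensor:
        dist.all_reduce(tensor, group=self.group)
        return tensor

    def all_gather(self, tensor: torch.Tensor, dim: int = 0) -> torch.Tensor:
        world_size = dist.get_world_size(group=self.group)
        parts = [torch.empty_like(tensor) for _ in range(world_size)]
        dist.all_gather(parts, tensor, group=self.group)
        return torch.cat(parts, dim=dim)
```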
## Purpose

This PR extends `--fully-sharded-loras` support to MoE LoRA adapters, allowing S-LoRA style sharding of adapter weights. This reduces the amount of GPU memory required per rank...
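For context, a minimal sketch of how fully sharded LoRA is enabled on the offline `LLM` entry point; the base model name and adapter path are placeholders, and whether a given MoE adapter is accepted is exactly what this PR changes.

```python
# Minimal sketch: model name and adapter path are placeholders; the options
# shown mirror the existing enable_lora / fully_sharded_loras engine args.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # placeholder MoE base model
    enable_lora=True,
    fully_sharded_loras=True,  # S-LoRA style sharding of adapter weights
    max_lora_rank=16,
    tensor_parallel_size=2,
)

outputs = llm.generate(
    "Hello, my name is",
    SamplingParams(max_tokens=16),
    lora_request=LoRARequest("my_adapter", 1, "/path/to/adapter"),  # placeholder path
)
print(outputs[0].outputs[0].text)
```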