When generating a model configuration using the `triton-config-model` command, it would be great to allow users to assign Host Policies to specific instance groups as described in [Model Configuration - Host...
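For context, the requested assignment can already be expressed by hand in a generated `config.pbtxt`. Below is a minimal sketch, assuming the `model_config_pb2` module bundled with the `tritonclient` package, of patching a generated config so each instance group references a host policy; the model path and policy names are illustrative placeholders, not part of the request.

```python
# Minimal sketch: attach host policy names to the instance groups of an
# already-generated config.pbtxt. Paths and policy names are placeholders.
from google.protobuf import text_format
from tritonclient.grpc import model_config_pb2

config = model_config_pb2.ModelConfig()
with open("model_repository/my_model/config.pbtxt") as f:
    text_format.Parse(f.read(), config)

# Triton matches these names against the --host-policy settings passed to
# tritonserver at startup.
for i, group in enumerate(config.instance_group):
    group.host_policy = f"policy_{i}"

with open("model_repository/my_model/config.pbtxt", "w") as f:
    f.write(text_format.MessageToString(config))
```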
### System Info

- `transformers` version: 4.26.1
- Platform: Linux-5.10.157-139.675.amzn2.x86_64-x86_64-with-glibc2.26
- Python version: 3.9.15
- Huggingface_hub version: 0.13.0
- PyTorch version (GPU?): 1.13.1 (True)
- Tensorflow version (GPU?): not installed...
**Describe the bug**

While comparing ZeRO Stage 2 and ZeRO Stage 3, I found that the peak GPU memory utilization (as measured by the `deepspeed.runtime.utils.memory_status` function) is higher when using...
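For reference, a minimal sketch of the kind of comparison described (not the reporter's actual script): the same loop is run once with `"stage": 2` and once with `"stage": 3`, and the peak allocator memory is read back alongside the `memory_status` numbers cited above. The model, batch size, and step count are placeholders.

```python
# Minimal sketch (not the reporter's script): run once with stage 2 and once
# with stage 3 in the config below, launched via `deepspeed this_script.py`.
import torch
import deepspeed

model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096), torch.nn.ReLU(), torch.nn.Linear(4096, 4096)
)

ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 2},  # change to 3 for the second run
}

engine, _, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

torch.cuda.reset_peak_memory_stats()
for _ in range(10):
    x = torch.randn(4, 4096, device=engine.device)
    loss = engine(x).pow(2).mean()
    engine.backward(loss)
    engine.step()

# Peak allocation from the PyTorch allocator, to compare against the
# deepspeed.runtime.utils.memory_status output referenced in the report.
print(f"peak allocated: {torch.cuda.max_memory_allocated() / 2**30:.2f} GiB")
```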
This PR is part of the effort to support the vLLM v1 architecture on the Neuron platform (see #11152). This change adds a device communicator class for the Neuron backend using...
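As a rough illustration of the pattern (class and method names here are assumptions for illustration only, not the interface added by this PR), a per-backend device communicator typically wraps `torch.distributed` collectives behind a small class so the rest of the engine stays backend-agnostic:

```python
# Illustrative sketch only: names are assumptions, not vLLM's actual API.
from typing import Optional

import torch
import torch.distributed as dist


class NeuronLikeDeviceCommunicator:
    """Routes collective ops through a process group set up for the backend."""

    def __init__(self, group: Optional[dist.ProcessGroup] = None):
        # None falls back to the default (global) process group.
        self.group = group

    def all_reduce(self, tensor: torch.Tensor) -> torch.Tensor:
        dist.all_reduce(tensor, group=self.group)
        return tensor

    def all_gather(self, tensor: torch.Tensor, dim: int = 0) -> torch.Tensor:
        world_size = dist.get_world_size(group=self.group)
        parts = [torch.empty_like(tensor) for _ in range(world_size)]
        dist.all_gather(parts, tensor, group=self.group)
        return torch.cat(parts, dim=dim)
```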
## Purpose

This PR extends `--fully-sharded-loras` support to MoE LoRA adapters, allowing S-LoRA style sharding of adapter weights. This reduces the amount of GPU memory required per rank...
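For context, a minimal sketch of how fully sharded LoRA is enabled on the offline `LLM` entry point; the base model name and adapter path are placeholders, and whether a given MoE adapter is accepted is exactly what this PR changes.

```python
# Minimal sketch: model name and adapter path are placeholders; the options
# shown mirror the existing enable_lora / fully_sharded_loras engine args.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # placeholder MoE base model
    enable_lora=True,
    fully_sharded_loras=True,  # S-LoRA style sharding of adapter weights
    max_lora_rank=16,
    tensor_parallel_size=2,
)

outputs = llm.generate(
    "Hello, my name is",
    SamplingParams(max_tokens=16),
    lora_request=LoRARequest("my_adapter", 1, "/path/to/adapter"),  # placeholder path
)
print(outputs[0].outputs[0].text)
```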