NR Wu
### 🐛 Describe the bug I found that before I wrap an `nn.Module` with FSDP, the parameters of the `nn.Module` are already allocated. If the parameter weights need more HBM than the GPU's capacity,...
With FP16 enabled, `PipelineEngine.eval_batch` does not broadcast the loss correctly. In the last stage, `eval_batch` returns the fp16 loss, while in the other stages it returns noise. ```python def _bcast_pipe_scalar(self, data, src_rank=None, dtype=torch.float32):...
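One possible reading of the report above, offered only as an assumption: the last stage sends a 2-byte fp16 value while the other stages receive into a 4-byte float32 buffer (the `dtype=torch.float32` default in the signature), so the bytes are reinterpreted rather than converted. The sketch below illustrates that effect with the standard library only; it does not call DeepSpeed or `torch.distributed`, and the helper names `reinterpret_fp16_as_fp32` and `roundtrip_fp16` are hypothetical.

```python
import struct


def roundtrip_fp16(loss: float) -> float:
    """Encode and decode with the same half-precision layout (matching dtypes)."""
    return struct.unpack("<e", struct.pack("<e", loss))[0]


def reinterpret_fp16_as_fp32(loss: float) -> float:
    """Write fp16 bytes into a float32-sized buffer and read it back as float32.

    This mimics a receiver whose buffer dtype differs from the sender's dtype
    (an assumption about the cause, not a confirmed diagnosis).
    """
    payload = struct.pack("<e", loss)            # sender: 2 bytes, half precision
    buffer = bytearray(struct.pack("<f", 0.0))   # receiver: 4 bytes, initialised to 0.0
    buffer[: len(payload)] = payload             # only the first 2 bytes are overwritten
    return struct.unpack("<f", bytes(buffer))[0]


if __name__ == "__main__":
    loss = 2.5
    print("matching dtypes:  ", roundtrip_fp16(loss))
    print("mismatched dtypes:", reinterpret_fp16_as_fp32(loss))
```

With matching dtypes the value survives unchanged; with mismatched dtypes the result is an unrelated, tiny number, which matches the "noise" described for the non-last stages.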
### Describe the feature I found that only the DP and ZeRO strategies are supported in `ColossalAI/applications/Chat/examples`. Is hybrid parallelism (such as PP / Megatron) supported?
**Describe the bug** 1. `GPTDatasetConfig` got an unexpected keyword `mmap_bin_files` (can be solved by installing the main branch of Megatron-LM instead of megatron-core-r0.5) 2. `GPTDatasetConfig` got an unexpected keyword `is_build_on_rank` 3. merge...
So that users can use vocos on GPU clusters without internet access.
### System Info ```Shell Copy-and-paste the text below in your GitHub issue - `Accelerate` version: 1.0.1 - Platform: Linux-5.4.241-1-tlinux4-0017.12-x86_64-with-glibc2.17 - `accelerate` bash location: /home/nr/conda/bin/accelerate - Python version: 3.10.10 - Numpy...
Researchers often need to compare the performance of different models or evaluate the same model under varying hyperparameters and datasets. While Megatron-LM provides a command-line-based approach for configuration, which is...