NR Wu

Results 7 issues of NR Wu

### 🐛 Describe the bug

I found that before I wrap an `nn.Module` with FSDP, the parameters of the `nn.Module` are already allocated. If the parameter weights need more HBM than the GPU's capacity,...

With F16 enabled, `PipelineEngine.eval_batch` does not broadcast the loss correctly. In the last stage, `eval_batch` returns the f16 loss, while in the other stages it returns noise.

```python
def _bcast_pipe_scalar(self, data, src_rank=None, dtype=torch.float32):
    ...
```
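The snippet above is truncated, so the cause is not stated here. One possible reading, assumed rather than confirmed by the issue, is that the sending stage and the receiving stages disagree on the dtype of the broadcast buffer. The sketch below is not the library's code: it uses only Python's standard `struct` module, and the helper names are hypothetical. It shows how the bytes of a float16 value turn into an unrelated number when read back as float32, and how the value survives when both sides use the same dtype.

```python
import struct


def reinterpret_f16_bytes_as_f32(loss: float) -> float:
    """Write `loss` as float16 bytes, then misread those bytes as one float32.

    A float16 is 2 bytes and a float32 is 4, so a zero float16 is appended
    as padding. This only imitates a dtype mismatch between a sender and a
    receiver; it is an assumption, not the reported code path.
    """
    raw = struct.pack("<2e", loss, 0.0)    # 4 bytes: two little-endian float16 values
    (misread,) = struct.unpack("<f", raw)  # same 4 bytes read as one float32
    return misread


def roundtrip_with_matching_dtype(loss: float) -> float:
    """Write and read `loss` with the same dtype (float16) on both sides."""
    raw = struct.pack("<e", loss)
    (value,) = struct.unpack("<e", raw)
    return value


loss = 2.5  # exactly representable in float16
misread = reinterpret_f16_bytes_as_f32(loss)
matched = roundtrip_with_matching_dtype(loss)
print(f"sent: {loss}, misread as float32: {misread!r}, matching dtype: {matched}")
```

If this reading applies, the usual remedy is to make every rank agree on one dtype before the collective call, for example by casting the scalar to the buffer's dtype on the sending side.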

### Describe the feature

I found that only the DP and ZeRO strategies are supported in `ColossalAI/applications/Chat/examples`. Is hybrid parallelism (like PP / Megatron) supported?

enhancement

**Describe the bug**

1. `GPTDatasetConfig` got an unexpected keyword `mmap_bin_files` (this can be solved by installing the main branch of Megatron-LM instead of megatron-core-r0.5)
2. `GPTDatasetConfig` got an unexpected keyword `is_build_on_rank`
3. merge...

bug
stale

So that users can use vocos on GPU clusters without internet access.

### System Info

```Shell
Copy-and-paste the text below in your GitHub issue

- `Accelerate` version: 1.0.1
- Platform: Linux-5.4.241-1-tlinux4-0017.12-x86_64-with-glibc2.17
- `accelerate` bash location: /home/nr/conda/bin/accelerate
- Python version: 3.10.10
- Numpy...
```

wip

Researchers often need to compare the performance of different models or evaluate the same model under varying hyperparameters and datasets. While Megatron-LM provides a command-line-based approach for configuration, which is...