NR Wu issues

Results: 7 issues by NR Wu

`nn.Module` parameters allocated before being wrapped by FSDP

### 🐛 Describe the bug I found that the parameters of an `nn.Module` are already allocated before I wrap it with FSDP. If the parameter weights need more HBM than GPU capacity,...

Fix `PipelineEngine.eval_batch` result

With FP16 enabled, `PipelineEngine.eval_batch` does not broadcast the loss correctly. On the last stage, `eval_batch` returns the fp16 loss, while on other stages it returns noise. ```python def _bcast_pipe_scalar(self, data, src_rank=None, dtype=torch.float32):...

[FEATURE]: Is hybrid parallelism supported in GPT demo?

### Describe the feature I found that only the DP and ZeRO strategies are supported in `ColossalAI/applications/Chat/examples`. Is hybrid parallelism (such as PP / Megatron) supported?

enhancement

Tutorial BUG

**Describe the bug** 1. `GPTDatasetConfig` got unexpected keyword `mmap_bin_files` (can be solved by installing the main branch of Megatron-LM instead of megatron-core-r0.5) 2. `GPTDatasetConfig` got unexpected keyword `is_build_on_rank` 3. merge...

bug

stale

Feat: Offline mode

So that users can use vocos on GPU clusters without internet access.

[BUG] Accelerate 1.0.1 failed to train multiple zero-3 models

### System Info ```Shell Copy-and-paste the text below in your GitHub issue - `Accelerate` version: 1.0.1 - Platform: Linux-5.4.241-1-tlinux4-0017.12-x86_64-with-glibc2.17 - `accelerate` bash location: /home/nr/conda/bin/accelerate - Python version: 3.10.10 - Numpy...

wip

Use multiple YAML files to avoid passing annoying model configs on the command line

Researchers often need to compare the performance of different models or evaluate the same model under varying hyperparameters and datasets. While Megatron-LM provides a command-line-based approach for configuration, which is...