Peng issues

Results 12 issues of Peng

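[Feature] Decouple ZeRO and parallelism between dense layers and expert layers in MoE models

### Describe the feature Decouple ZeRO and parallelism between dense layers and expert layers in MoE models ### Will you implement it? - [ ] I would like to implement this feature and create a PR!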

enhancement

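[Feature] Do not use memory pool

### Describe the feature In actual use, memory_pool is not needed, and the memory pool logic may conflict with the device memory allocation strategies of other chips. We suggest removing the implementation and use of the memory pool everywhere, including the use of the memory pool in moe ### Will you implement it? - [ ] I would like to implement this feature and create a PR!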

enhancement

[QA] check import system var at the start of training

### Describe the question.

question

[Feature] a very simple hugging-face dataloader

### Describe the feature a very simple on-the-fly dataloader is needed to support most public datasets ### Will you implement it? - [X] I would like to implement this feature...

enhancement

[Feature] Should we remove other dependency of flashattention?

### Describe the feature Should we remove other dependency of flash-attention, and only keep the core attention related ops? If possible, we can only use pip to install flash-attention, avoiding...

enhancement

[Feature] CI should have a true no flashattention env

### Describe the feature CI should have a true no flashattention env ### Will you implement it? - [X] I would like to implement this feature and create a PR!

bug

enhancement

[Bug] the usage "x.is_cuda" is not recommended, since we have both GPU and NPU

### Describe the bug ### Environment Torch2.1 ### Other information _No response_

bug
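
To illustrate the concern in this issue, here is a minimal sketch of a backend-neutral check that reads the tensor's `device.type` instead of `is_cuda`. The helper name `is_on_accelerator` and the `"npu"` device-type string are assumptions made for illustration and are not taken from the issue; the stand-in objects mimic only the `.device.type` attribute so the sketch runs without any accelerator library installed.

```python
from types import SimpleNamespace

# Assumption: "npu" is the device-type string reported for NPU tensors.
ACCELERATOR_TYPES = ("cuda", "npu")


def is_on_accelerator(x, accelerator_types=ACCELERATOR_TYPES):
    """Return True if x lives on any of the listed accelerator backends.

    Hypothetical helper: it only reads x.device.type, so it works for any
    object exposing that attribute.
    """
    return x.device.type in accelerator_types


# Stand-ins that mimic only the `.device.type` attribute of a tensor.
cpu_like = SimpleNamespace(device=SimpleNamespace(type="cpu"))
cuda_like = SimpleNamespace(device=SimpleNamespace(type="cuda"))
npu_like = SimpleNamespace(device=SimpleNamespace(type="npu"))

on_cpu = is_on_accelerator(cpu_like)    # False
on_cuda = is_on_accelerator(cuda_like)  # True
on_npu = is_on_accelerator(npu_like)    # True
```

A check written this way treats GPU and NPU tensors alike, whereas a bare `x.is_cuda` test only recognizes CUDA tensors.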

[Bug] do not use torch.cuda.current_device() as device, since it only returns an int

### Describe the bug we have a lot of cases like the following: ` data = torch.empty(partition_size, dtype=tensor.dtype, device=torch.cuda.current_device(), requires_grad=False) ` where we directly use device=torch.cuda.current_device(). However, it is not recommended...

bug
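
One possible direction, sketched under assumptions: because a bare integer index carries no backend name, the device can instead be spelled out as an explicit string such as `cuda:0`. The helper `make_device_spec` below is hypothetical and does not come from the issue or the project, and the `"npu"` backend name is likewise an assumption.

```python
def make_device_spec(index, backend="cuda"):
    """Build an explicit device string such as 'cuda:0' from a bare index.

    Hypothetical helper: a plain int says nothing about the backend,
    so the backend name is attached explicitly here.
    """
    # Reject bools (a subclass of int), non-ints, and negative indices.
    if isinstance(index, bool) or not isinstance(index, int) or index < 0:
        raise ValueError(f"device index must be a non-negative int, got {index!r}")
    return f"{backend}:{index}"


cuda_spec = make_device_spec(0)                # "cuda:0"
npu_spec = make_device_spec(1, backend="npu")  # "npu:1" (backend name assumed)
```

The resulting string can be passed as the `device=` argument of `torch.empty`, or wrapped in `torch.device(...)`, in place of the raw integer returned by `torch.cuda.current_device()`.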

[Feature] update readme with new version of dependency.

### Describe the feature update readme with new version of dependency. ### Will you implement it? - [ ] I would like to implement this feature and create a PR!

enhancement

[Feature] supporting hugging-face modeling python file

### Describe the feature supporting hugging-face modeling python file ### Will you implement it? - [ ] I would like to implement this feature and create a PR!

enhancement