andyG
**Describe the bug** When training Llama 13B (https://github.com/ymcui/Chinese-LLaMA-Alpaca), I observed that parameter memory cannot be freed when using the ZeRO-3 + Offload parameter strategy in PyTorch 1.9, but the same parameter memory is freed in PyTorch 1.13...
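For context, a minimal sketch of the kind of ZeRO-3 + parameter offload setup being described (the reporter's actual config is not shown; the values and the commented `deepspeed.initialize` call below are illustrative assumptions):

```python
# Illustrative DeepSpeed ZeRO-3 config with CPU parameter/optimizer offload.
# Not the reporter's actual config; batch size and precision are assumptions.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
    "fp16": {"enabled": True},
}

# model_engine, optimizer, _, _ = deepspeed.initialize(
#     model=model, model_parameters=model.parameters(), config=ds_config
# )
```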
The shape of vocab_parallel_logits is [seq_len, batch_size, vocab_size / tp]. When vocab_size is very large, as in Llama 3, using an in-place subtraction reduces memory usage.
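A minimal sketch of the idea, assuming a Megatron-style vocab-parallel cross entropy where the per-token global max is subtracted for numerical stability (the function name and signature here are illustrative, not the exact diff):

```python
import torch
import torch.distributed as dist


def stable_shift_logits(vocab_parallel_logits: torch.Tensor,
                        tp_group=None) -> torch.Tensor:
    """Subtract the global per-token max from vocab-parallel logits.

    vocab_parallel_logits: [seq_len, batch_size, vocab_size // tp]
    """
    # Max over the local vocab shard, then all-reduce (MAX) across the
    # tensor-parallel group to obtain the global per-token max.
    logits_max = torch.max(vocab_parallel_logits, dim=-1)[0]
    if dist.is_initialized():
        dist.all_reduce(logits_max, op=dist.ReduceOp.MAX, group=tp_group)

    # Out-of-place version allocates a second [s, b, v/tp] tensor:
    #   shifted = vocab_parallel_logits - logits_max.unsqueeze(-1)
    # The in-place subtraction reuses the existing buffer, which matters
    # when vocab_size is very large (e.g. Llama 3's ~128k vocabulary).
    vocab_parallel_logits.sub_(logits_max.unsqueeze(-1))
    return vocab_parallel_logits
```

In Megatron-style implementations this subtraction typically lives inside a custom torch.autograd.Function, so the in-place update does not conflict with autograd; the sketch above only illustrates the memory argument.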
# What does this PR do?

Update the deepspeed-mlu dependency.

Related PRs:
1. https://github.com/huggingface/transformers/pull/34362
2. https://github.com/microsoft/DeepSpeed/pull/6472

## Before submitting
- [ ] This PR fixes a typo or improves the docs...