mabingqi
The implementation of stochastic depth in the NFNet code seems to be batch-wise dropout (a single keep/drop decision shared by the whole batch) rather than the per-sample, block-level dropout described in the paper.
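For reference, a minimal sketch of the difference between the two behaviors (the class name and the `mode` switch are illustrative, not taken from the NFNet code): block-level stochastic depth draws an independent Bernoulli mask per sample, while batch-wise dropout draws one mask for the entire batch.

```python
import torch
from torch import nn

class StochasticDepth(nn.Module):
    """Randomly drops the residual branch during training.

    mode="row": one keep/drop draw per sample (block-level, as in the paper).
    mode="batch": a single draw shared by the whole batch.
    """
    def __init__(self, drop_prob: float, mode: str = "row"):
        super().__init__()
        self.drop_prob = drop_prob
        self.mode = mode

    def forward(self, residual: torch.Tensor) -> torch.Tensor:
        if not self.training or self.drop_prob == 0.0:
            return residual
        keep_prob = 1.0 - self.drop_prob
        if self.mode == "row":
            # Independent decision per sample: shape (N, 1, 1, 1) for NCHW input.
            shape = (residual.shape[0],) + (1,) * (residual.dim() - 1)
        else:
            # One decision for the entire batch.
            shape = (1,) * residual.dim()
        mask = torch.empty(shape, device=residual.device).bernoulli_(keep_prob)
        # Inverted scaling keeps the expected activation magnitude unchanged.
        return residual * mask / keep_prob
```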
DDP in PyTorch cannot distinguish expert parameters from shared parameters, so expert parameters may be updated with the all-reduced shared gradient. The TutelDistributedOptimizer seems to be an implementation of ZeRO, which...
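A minimal sketch of one common workaround, assuming expert parameters can be recognized by name (the `"experts"` substring is a hypothetical convention; real MoE stacks often tag parameters with an attribute instead): skip DDP entirely and, after `loss.backward()`, all-reduce only the shared gradients so expert gradients stay rank-local.

```python
import torch
import torch.distributed as dist

def allreduce_shared_grads_only(model: torch.nn.Module) -> None:
    """Average gradients across ranks, skipping expert parameters.

    Assumes experts are identifiable by the (hypothetical) substring
    "experts" in the parameter name. Call this after loss.backward()
    instead of wrapping the model in DDP.
    """
    world_size = dist.get_world_size()
    for name, param in model.named_parameters():
        if param.grad is None or "experts" in name:
            continue  # expert gradients are per-rank: do not all-reduce
        dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
        param.grad.div_(world_size)  # average the shared gradient
```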
### System Info

- `transformers` version: 4.41.0.dev0
- Platform: Linux-4.18.0-425.3.1.el8.x86_64-x86_64-with-glibc2.17
- Python version: 3.9.12
- Huggingface_hub version: 0.21.4
- Safetensors version: 0.4.2
- Accelerate version: 0.21.0
- Accelerate config: not...