NR Wu

15 comments by NR Wu

Before the llama implementation was merged into mega-ds, we implemented another llama in our private repo. We found that you can at most train a 13B llama without offloading with 8...

We have been working on LMs recently and encountered this problem. I am trying to fix it. @ShadenSmith @duli2012

In DeepSpeed, large models are allocated inside the `zero.Init` context. Is there anything similar in torch FSDP?

```python
with deepspeed.zero.Init():
    model = MyLargeModel()
```

> It is not necessary to move the model to GPU before passing to FSDP:
>
> ```
> model = Net().to(rank)
> ```
>
> You only need to...

> FSDP has some support for deferred initialization if you look at the `param_init_fn` constructor argument, which would allow exceeding the capacity of CPU DRAM. However, the current support is...
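For reference, a minimal sketch of what that deferred-init path can look like, assuming a recent PyTorch, a process group already initialized (e.g. via torchrun), and a toy `MyLargeModel` standing in for the real model:

```python
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

class MyLargeModel(nn.Module):  # toy stand-in for a real large model
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4096, 4096)

    def forward(self, x):
        return self.proj(x)

# Build the model on the meta device: no parameter memory is allocated,
# so the full model never has to fit in CPU DRAM.
with torch.device("meta"):
    model = MyLargeModel()

# FSDP calls param_init_fn to materialize meta-device parameters on the
# target GPU once it has decided how to shard them.
def init_fn(module: nn.Module):
    module.to_empty(device=torch.device("cuda"), recurse=False)

# Assumes torch.distributed.init_process_group() has already been called
# (e.g. the script was launched with torchrun).
fsdp_model = FSDP(model, param_init_fn=init_fn, device_id=torch.cuda.current_device())
```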

`stCoRoutineAttr_t` has a constructor:

```c++
stCoRoutineAttr_t()
{
    stack_size = 128 * 1024;
    share_stack = NULL;
}
```

Same problem here.

> Any updates on this?

Same question.

> Does it currently require internet connection? Why?

It does. https://github.com/gemelo-ai/vocos/blob/main/vocos/pretrained.py#L67-L68 https://github.com/gemelo-ai/vocos/blob/main/vocos/feature_extractors.py#L68
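A minimal sketch of why the network is needed (the repo id and filename here are illustrative of the pattern, not the exact vocos code at those lines):

```python
from huggingface_hub import hf_hub_download

# Loading a pretrained model by repo id pulls the config (and later the
# weights) from the Hugging Face Hub, so a network connection is required
# unless the files are already present in the local HF cache.
config_path = hf_hub_download(repo_id="charactr/vocos-mel-24khz", filename="config.yaml")
print(config_path)  # resolved path inside the local Hugging Face cache
```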

@hubertsiuzdak Hello, could you spare some time to review / merge / reject this PR?