Alan Fang

49 comments by Alan Fang

FSDP has a bug: `sharding_strategy` needs a `ShardingStrategy`, not a `str`.

> > FSDP has a bug: `sharding_strategy` needs a `ShardingStrategy`, not a `str`.
>
> Did you solve it?

@fclearner Yes, try this: https://github.com/wenet-e2e/wenet/blob/24375c51c8ccbf1dac0724f5734a6eae4ac9c428/wenet/utils/train_utils.py#L410
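The linked WeNet change isn't reproduced here. A minimal sketch of the general idea: if the strategy comes from a YAML config as a lowercase string such as `"shard_grad_op"`, look it up by member name on the enum before it reaches FSDP. `as_enum` is a hypothetical helper, not part of WeNet or PyTorch. It accepts any `enum.Enum`, so it runs without torch. The commented usage shows it with PyTorch's `torch.distributed.fsdp.ShardingStrategy`.

```python
from enum import Enum
from typing import Type, TypeVar, Union

E = TypeVar("E", bound=Enum)


def as_enum(enum_cls: Type[E], value: Union[str, E]) -> E:
    """Coerce a config string such as "shard_grad_op" into a member of enum_cls.

    Hypothetical helper: members that are already of enum_cls pass through
    unchanged; strings are matched case-insensitively against member names.
    """
    if isinstance(value, enum_cls):
        return value
    if not isinstance(value, str):
        raise TypeError(
            f"expected str or {enum_cls.__name__}, got {type(value).__name__}"
        )
    try:
        # Enum classes support lookup by member name: Cls["NAME"]
        return enum_cls[value.upper()]
    except KeyError:
        valid = ", ".join(m.name for m in enum_cls)
        raise ValueError(
            f"unknown {enum_cls.__name__} {value!r}; expected one of: {valid}"
        ) from None


# Usage with PyTorch (needs torch installed; `configs` is a hypothetical dict):
# from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy
# strategy = as_enum(ShardingStrategy, configs["sharding_strategy"])
# model = FSDP(model, sharding_strategy=strategy)
```

The helper looks members up by name rather than by value, so the config stays readable. It also passes existing members through unchanged, so calling it twice on the same value is harmless.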

Does adding `__init__.py` fix it? I haven't run into this problem on my side.

Got it, I'll add one later.

> So how do I fix this problem?

Are you still running into similar problems? I've already added `__init__.py` in the latest code.

Has this problem been solved? I just ran into the same one. Maybe it's related to the checkpoint-saving part.

Try this:

```python
model_engine.backward(loss)
if (step + 1) % model_engine.gradient_accumulation_steps() == 0:
    model_engine.step()
    model_engine.zero_grad()
```

> Try this:
>
> ```python
> model_engine.backward(loss)
>
> if (step + 1) % model_engine.gradient_accumulation_steps() == 0:
>     model_engine.step()
>     model_engine.zero_grad()
> ```

Sorry, just removing the `gradient_accumulation_steps()` check is enough.
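Here is a toy illustration of why the extra check matters. It assumes, as DeepSpeed's training documentation describes, that the engine tracks micro-batches itself and that `model_engine.step()` applies an optimizer update only at gradient-accumulation boundaries. The hypothetical `ToyEngine` below models that by counting micro-steps inside `step()`. It is a simulation, not DeepSpeed code. If the training loop also gates `step()` with `(step + 1) % gradient_accumulation_steps() == 0`, the engine sees only every N-th micro-batch, so updates happen N times less often than intended.

```python
class ToyEngine:
    """Hypothetical stand-in for a DeepSpeed engine (assumed behaviour):
    step() counts micro-steps and only applies an optimizer update when
    the count reaches a multiple of gradient_accumulation_steps."""

    def __init__(self, gas: int):
        self.gas = gas
        self.micro_steps = 0
        self.updates = 0

    def gradient_accumulation_steps(self) -> int:
        return self.gas

    def backward(self, loss: float) -> None:
        pass  # a real engine would accumulate gradients here

    def step(self) -> None:
        self.micro_steps += 1
        if self.micro_steps % self.gas == 0:
            self.updates += 1  # optimizer update at the accumulation boundary


def run(engine: ToyEngine, num_batches: int, gate_step: bool) -> int:
    """Run a training loop and return how many optimizer updates happened."""
    for step in range(num_batches):
        engine.backward(loss=0.0)
        # gate_step=True reproduces the extra check in the quoted snippet
        if not gate_step or (step + 1) % engine.gradient_accumulation_steps() == 0:
            engine.step()
    return engine.updates


print(run(ToyEngine(gas=4), num_batches=16, gate_step=False))  # 4
print(run(ToyEngine(gas=4), num_batches=16, gate_step=True))   # 1
```

With 16 micro-batches and an accumulation factor of 4, the ungated loop performs 4 updates and the gated loop only 1.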

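Hi, a quick question: shouldn't the theoretical memory for a 72B model in bf16 be 72 billion parameters × 2 bytes, i.e. 144 GB? Why does your setup use only 70 GB? Did you use int4, or is there some mechanism I'm not aware of?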
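The arithmetic in the question can be written out as a small sketch. It counts only the memory needed to hold the weights, in decimal GB (1 GB = 10^9 bytes). It ignores KV cache, activations and framework overhead, and it assumes the weights are split evenly when `num_devices > 1`. `weight_memory_gb` is a hypothetical helper for illustration.

```python
def weight_memory_gb(num_params: int, bits_per_param: int, num_devices: int = 1) -> float:
    """Decimal GB needed just to store the weights, split evenly over num_devices.

    Ignores KV cache, activations, CUDA context and framework overhead,
    so real usage is higher than this figure.
    """
    total_bytes = num_params * bits_per_param / 8  # bits -> bytes
    return total_bytes / num_devices / 1e9


for name, bits in [("bf16", 16), ("int8", 8), ("int4", 4)]:
    print(f"72B {name}: {weight_memory_gb(72_000_000_000, bits):.0f} GB")
```

This prints 144 GB for bf16, 72 GB for int8 and 36 GB for int4. With `num_devices=2`, bf16 weights come to 72 GB per device. The sketch only reproduces the arithmetic and does not say which setup the 70 GB figure came from.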