Peter Geng

9 comments by Peter Geng

Using "can" in this context seems a bit impolite. 🐶

Once another open-source animator is released, this one will copy it and then open-source.

I don't think the problem is in this project's code (thanks for this project), but I don't know where the problem actually is.

Below are some of the LoRA & Trainer params:

```
MICRO_BATCH_SIZE = 4
BATCH_SIZE = 128
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
EPOCHS = 7  # paper uses 3
LEARNING_RATE = 2e-5
...
```
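These names match the alpaca-lora style of fine-tuning script, so here is a minimal sketch of how such params are typically wired into a PEFT `LoraConfig` and a `transformers` `TrainingArguments`. The LoRA values (`r=8`, `lora_alpha=16`, the target modules) and the output path are illustrative assumptions, not taken from the comment:

```
# Hedged sketch: wiring the params above into a Hugging Face Trainer
# config plus a PEFT LoRA adapter config. LoRA values and output_dir
# are placeholder assumptions, not from the original comment.
from peft import LoraConfig
from transformers import TrainingArguments

MICRO_BATCH_SIZE = 4
BATCH_SIZE = 128
# Gradient accumulation makes the effective batch size
# MICRO_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS = 128.
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
EPOCHS = 7  # paper uses 3
LEARNING_RATE = 2e-5

lora_config = LoraConfig(
    r=8,                                  # assumed rank
    lora_alpha=16,                        # assumed scaling
    target_modules=["q_proj", "v_proj"],  # assumed attention projections
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="./lora-out",              # placeholder path
    per_device_train_batch_size=MICRO_BATCH_SIZE,
    gradient_accumulation_steps=GRADIENT_ACCUMULATION_STEPS,
    num_train_epochs=EPOCHS,
    learning_rate=LEARNING_RATE,
    fp16=True,
    logging_steps=10,
)
```

The accumulation trick lets a single GPU that only fits a micro-batch of 4 still train with an effective batch of 128 by summing gradients over 32 forward/backward passes before each optimizer step.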

> > Below are some of the LoRA & Trainer params:
> >
> > ```
> > MICRO_BATCH_SIZE = 4
> > BATCH_SIZE = 128
> > GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
> > ...
> > ```

> @RG-sw Default params, but with epochs raised to 10 and the learning rate set to 1e-3, it worked. Still, I don't think it's a good solution, as said above.

@ydli-ai Hey, is the GPU requirement for pretraining these large models linear in model size? For example, your 7B model takes two days on 32x A100. For a 65B model (roughly 10x), would it take 20 days on 32x A100, or 2 days on 320x A100? Thanks!
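For a rough sanity check of the numbers in the question, a back-of-the-envelope sketch under two idealized assumptions that are not from the thread: training compute scales linearly with parameter count, and multi-GPU scaling is perfectly efficient.

```
# Back-of-the-envelope estimate, assuming (1) training compute scales
# linearly with parameter count and (2) 100% multi-GPU scaling
# efficiency. Both are idealizations, not claims from the thread.

base_params_b = 7   # 7B model
base_gpus = 32      # 32x A100
base_days = 2       # reported wall-clock time

gpu_days = base_gpus * base_days  # 64 GPU-days for the 7B run

def estimate_days(target_params_b: float, gpus: int) -> float:
    """Days for a model of the given size on `gpus` GPUs,
    under the linear-compute assumption."""
    scale = target_params_b / base_params_b
    return gpu_days * scale / gpus

print(estimate_days(65, 32))   # ~18.6 days on 32x A100
print(estimate_days(65, 320))  # ~1.9 days on 320x A100
```

So under a purely linear model the question's figures are roughly right, but in practice neither assumption holds exactly: larger models are usually trained on more tokens, and communication overhead lowers scaling efficiency, so real wall-clock times would be higher.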

I have the same concern about chapter 8: how did `data.ts` change to dynamic rendering? Thanks.