Peter Geng
Using "can" in this scenario seems a bit impolite, 🐶
Meanwhile, another open-source animator has been released; this one will copy it and then open-source it.
I don't think the problem is in this project's code (thanks for the project), but I can't figure out where the problem actually is.
Below are some params of the LoRA & Trainer setup:

```
MICRO_BATCH_SIZE = 4
BATCH_SIZE = 128
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
EPOCHS = 7  # paper uses 3
LEARNING_RATE = 2e-5
...
```
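For reference, here is a minimal sketch of how constants like these are typically wired into a `peft` `LoraConfig` and a `transformers` `TrainingArguments`; the LoRA rank/alpha/dropout, target modules, and output directory below are illustrative assumptions, not values confirmed in this thread.

```python
# Minimal sketch with assumed values where noted; not this project's exact script.
from peft import LoraConfig
from transformers import TrainingArguments

MICRO_BATCH_SIZE = 4
BATCH_SIZE = 128
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE  # 128 // 4 = 32
EPOCHS = 7
LEARNING_RATE = 2e-5

# LoRA adapter config; r/alpha/dropout/target_modules are assumed placeholders.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

# Trainer args: per-GPU micro-batches are accumulated so the effective
# batch size is MICRO_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS = 128.
training_args = TrainingArguments(
    output_dir="lora-out",  # assumed path
    per_device_train_batch_size=MICRO_BATCH_SIZE,
    gradient_accumulation_steps=GRADIENT_ACCUMULATION_STEPS,
    num_train_epochs=EPOCHS,
    learning_rate=LEARNING_RATE,
    logging_steps=10,
)
```

Because GRADIENT_ACCUMULATION_STEPS is derived from BATCH_SIZE, the effective batch size stays at 128 even if the per-GPU micro-batch is changed.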
@RG-sw With the default params but epochs set to 10 and the learning rate to 1e-3, it worked, but I don't think that's a good solution, as said above.
Same question here.
@ydli-ai For pretraining these different large models, does the GPU requirement scale linearly? For example, your 7B model takes 32xA100 for two days; for a 65B model (roughly 10x larger), would it take 32xA100 for 20 days, or 320xA100 for 2 days? Thanks.
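For what it's worth, a back-of-the-envelope sketch of the linear-scaling assumption in the question, assuming training FLOPs scale roughly linearly with parameter count at a fixed token budget and ignoring multi-node efficiency loss (real large runs usually scale worse than this):

```python
# Rough estimate only: assumes compute ~ parameter count at fixed data,
# and perfect scaling efficiency across GPUs.
base_params_b = 7      # 7B baseline model
base_gpus = 32         # 32x A100
base_days = 2

target_params_b = 65   # ~10x larger model
scale = target_params_b / base_params_b

gpu_days = base_gpus * base_days * scale           # total GPU-days needed
print(f"same 32 GPUs: ~{gpu_days / base_gpus:.0f} days")      # ~19 days
print(f"finish in 2 days: ~{gpu_days / base_days:.0f} GPUs")  # ~297 GPUs
```

Under this idealized assumption, both of the question's estimates are roughly right; in practice, communication overhead at larger node counts pushes the numbers higher.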
I have the same question about chapter 8: how was `data.ts` changed to support dynamic rendering? Thanks.