What does config=cuda mean?
I am looking forward to your support for GPT-J-6B too!😄
Have you fixed this issue?
How can this be fixed?
I had the same issue.
====================START warmup====================
=========lightseq=========
lightseq generating...
Traceback (most recent call last):
  File "test/ls_bart.py", line 102, in <module>
    main()
  File "test/ls_bart.py", line 83, in main
    warmup(tokenizer, ls_model, hf_model,...
> You can convert the HF checkpoints back to Megatron-DeepSpeed. See this (a bit hacky) script: https://gist.github.com/malteos/c194368594e16439c101b7bf27195fd1 @malteos Thank you for your answer! However, in your code, I need...
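For reference, a rough, hypothetical sketch of what such an HF-to-Megatron-DeepSpeed conversion involves (the linked gist is the actual reference); the key names and file paths below are illustrative, not the gist's real mapping:

```python
# Illustrative only: load an HF state dict and remap parameter names to a
# Megatron-style per-layer layout. Names and paths are assumptions.
import re
import torch

def hf_to_megatron_keys(hf_state_dict):
    """Rename HF transformer keys to a Megatron-style layout (illustrative only)."""
    converted = {}
    for name, tensor in hf_state_dict.items():
        # e.g. "transformer.h.3.self_attention.dense.weight"
        #   -> layer index 3, parameter "self_attention.dense.weight"
        m = re.match(r"transformer\.h\.(\d+)\.(.+)", name)
        if m:
            layer_idx, param = int(m.group(1)), m.group(2)
            converted[f"layer_{layer_idx:02d}.{param}"] = tensor
        else:
            converted[name] = tensor  # embeddings, final layer norm, etc.
    return converted

if __name__ == "__main__":
    hf_sd = torch.load("pytorch_model.bin", map_location="cpu")  # hypothetical single-file checkpoint
    torch.save(hf_to_megatron_keys(hf_sd), "megatron_style_state_dict.pt")
```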
> the initial topology conversion was written for BF16Optimizer, but here you use zero stage=1, which I haven't worked with, so I have no experience with this use-case. > >...
I cannot build this tokenizer with Rust, so I am using tokenizer=0.12.0 instead. Does this matter? @stas00
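A minimal sanity check, assuming the concern is whether the pip-installed tokenizers 0.12.0 wheel behaves the same as a local Rust build: load the tokenizer file and encode a sample string (the file path is just an example):

```python
# Sanity check for a prebuilt tokenizers wheel; "tokenizer.json" is a hypothetical local file.
import tokenizers
from tokenizers import Tokenizer

print(tokenizers.__version__)                # expect "0.12.0"
tok = Tokenizer.from_file("tokenizer.json")  # load the tokenizer without needing a Rust toolchain
enc = tok.encode("Hello world")
print(enc.ids, enc.tokens)
```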
> Honestly I'm not sure as I wasn't part of the data team. I remember they said that most likely the normal tokenizer should work, but it might be safer...
> I think the first grad norms that are 0 are linked to overflow, so essentially we drop the batch and reduce the loss factor. So everything should be normal,...
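A simplified sketch (not the actual Megatron-DeepSpeed code) of that behavior: when the scaled gradients overflow, the step is skipped, the loss scale is reduced, and the grad norm reported for that batch is 0. All names below are illustrative.

```python
# Hypothetical dynamic loss scaler: drop the batch on overflow and back off the scale.
import torch

class DynamicLossScaler:
    def __init__(self, init_scale=2.0 ** 16, backoff=0.5):
        self.scale = init_scale
        self.backoff = backoff

    def step(self, grads):
        # Detect inf/nan in the scaled gradients.
        overflow = any(not torch.isfinite(g).all() for g in grads)
        if overflow:
            self.scale *= self.backoff  # reduce the loss scale
            return 0.0                  # skip the batch; grad norm reported as 0
        grad_norm = torch.norm(torch.stack([g.norm() for g in grads]))
        # ... optimizer.step() would run here with unscaled gradients ...
        return grad_norm.item()

scaler = DynamicLossScaler()
grads = [torch.tensor([float("inf")]), torch.randn(3)]
print(scaler.step(grads))  # 0.0 -> overflow detected, batch dropped, scale reduced
```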