邱梓钿 comments

3 comments by 邱梓钿

If I want to use Triton, attn_type can be set to triton_linear, but what should ffn_type be set to? Will loading the pretrained weights be affected?

No. According to the wandb log, training seems to need about 48 GB.

Each GPU should have at least 40 GB of memory.