邱梓钿
Results: 3 comments by 邱梓钿
If I want to use triton, attn_type can be set to triton_linear, but what should ffn_type be set to? Will loading pretrained weights be affected?
No. According to the wandb log, it seems to need 48 GB to train.
Each GPU should have at least 40 GB of memory.