Liqiang NIU

Results: 12 comments by Liqiang NIU

Same. Looking forward to the open-source training code and details.

Does this have an effect on training?

@lucidrains @Lijun-Yu

@OmkarThawakar Thanks, it worked! But I got another error like this: " attn_output = torch.nn.functional.scaled_dot_product_attention( RuntimeError: The size of tensor a (10) must match the size of tensor b (19)...

@EasonXiao-888 When token_label is set to False, the loss is always nan. Later I downloaded the token label datasets, but the training is still unstable (at the middle step of...

Same issue with triton 2.1.0, torch 2.0.1, CUDA 11.6:
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/mamba_ssm/modules/mamba2.py", line 176, in forward
    out = mamba_split_conv1d_scan_combined(
  File "/opt/conda/lib/python3.10/site-packages/mamba_ssm/ops/triton/ssd_combined.py", line 908, in...

torch 2.0.1, CUDA 11.6, triton 2.3.0. Triton Error [CUDA]: device kernel image is invalid

llm_foundry version is 0.10.0