Liqiang NIU
Same. Looking forward to the open-source training code and details.
Does this have an effect on training?
@lucidrains @Lijun-Yu
Same question!
In attend.py, line 123: sim = einsum("b h i d, b h j d -> b h i j", q, k) * self.scale
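For anyone unsure what that line computes, here is a minimal plain-Python sketch of the same contraction: for each batch b and head h, every query row i is dotted with every key row j over the feature axis d, then multiplied by a scale. The function name `attention_scores` and the default scale of 1/sqrt(d) are my assumptions for illustration; the comment above only shows `self.scale`, not its value.

```python
import math


def attention_scores(q, k, scale=None):
    """Plain-Python equivalent of
        einsum("b h i d, b h j d -> b h i j", q, k) * scale
    for nested lists shaped [batch][heads][seq][dim].
    """
    dim = len(q[0][0][0])
    if scale is None:
        # Assumption: the common 1/sqrt(d) scaling. The real value of
        # self.scale in attend.py is not shown in the comment.
        scale = 1.0 / math.sqrt(dim)
    return [
        [
            [
                # sim[b][h][i][j] = scale * sum_d q[b][h][i][d] * k[b][h][j][d]
                [scale * sum(qd * kd for qd, kd in zip(q_row, k_row))
                 for k_row in k_head]
                for q_row in q_head
            ]
            for q_head, k_head in zip(q_batch, k_batch)
        ]
        for q_batch, k_batch in zip(q, k)
    ]


# batch=1, heads=1, 2 query positions, 3 key positions, dim=2
q = [[[[1.0, 0.0], [0.0, 2.0]]]]
k = [[[[1.0, 1.0], [0.0, 1.0], [2.0, 0.0]]]]
sim = attention_scores(q, k, scale=1.0)
```

The result has shape [batch][heads][i][j], so the query and key sequence lengths may differ while the batch, head, and feature sizes must agree.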
@OmkarThawakar Thanks, it worked! But I got another error like this: " attn_output = torch.nn.functional.scaled_dot_product_attention( RuntimeError: The size of tensor a (10) must match the size of tensor b (19)...
@EasonXiao-888 When token_label is set to False, the loss is always NaN. Later I downloaded the token label datasets, but training is still unstable (at the middle step of...
Same issue with triton==2.1.0, torch==2.0.1, CUDA 11.6:
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/mamba_ssm/modules/mamba2.py", line 176, in forward
    out = mamba_split_conv1d_scan_combined(
  File "/opt/conda/lib/python3.10/site-packages/mamba_ssm/ops/triton/ssd_combined.py", line 908, in...
torch 2.0.1, CUDA 11.6, triton 2.3.0: Triton Error [CUDA]: device kernel image is invalid
llm_foundry version is 0.10.0
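Since several of the reports above hinge on exact package versions, here is a small sketch for collecting them in one go using only the standard library. The helper name `installed_versions` is hypothetical, and the distribution names in `PACKAGES` (in particular "mamba-ssm" and "llm-foundry") are my assumption of the pip names; adjust them to whatever `pip list` shows in your environment.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


# Assumed pip distribution names; not taken from the comments above.
PACKAGES = ["torch", "triton", "mamba-ssm", "llm-foundry"]

if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

This only covers Python packages. The CUDA version a PyTorch build was compiled against is not a pip distribution; it can be read from `torch.version.cuda` once torch is imported.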