Train Error
(venv) personalinfo@MacBook-Pro-3 LongNet % python3 train.py
2024-03-06 00:09:22,364 - numexpr.utils - INFO - NumExpr defaulting to 8 threads.
2024-03-06 00:09:27.673362: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
Using StableAdamWUnfused-v1
training: 0%| | 0/100000 [00:01<?, ?it/s]
Traceback (most recent call last):
File "/Users/personalinfo/LongNet/train.py", line 84, in <module>
loss = model(next(train_loader))
File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/Users/personalinfo/LongNet/long_net/model.py", line 356, in forward
logits = self.net(x_inp, **kwargs)
File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/Users/personalinfo/LongNet/long_net/model.py", line 302, in forward
x = self.transformer(x)
File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/Users/personalinfo/LongNet/long_net/model.py", line 271, in forward
x = block(x) + x
RuntimeError: The size of tensor a (4128) must match the size of tensor b (8196) at non-singleton dimension 1
After setting up the environment, I ran 'python3 train.py' and this happened. Could you take a look? Thank you!
I ran this script inside PyCharm's venv environment.
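For reference, the failure is at the residual connection in long_net/model.py line 271 (x = block(x) + x), which can only work if the block returns a tensor with the same sequence length it received. A minimal sketch of the same mismatch with plain tensors (the lengths 8196 and 4128 are just taken from the error message; whatever the block does internally, it is returning a shorter sequence than it was given):

import torch

dim = 512                               # illustrative model width
x = torch.randn(1, 8196, dim)           # sequence entering the block
block_out = torch.randn(1, 4128, dim)   # shorter sequence coming back out

# Same failure as model.py line 271 (x = block(x) + x):
# RuntimeError: The size of tensor a (4128) must match the size of
# tensor b (8196) at non-singleton dimension 1
y = block_out + x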
Can you help me solve this? I think the attention module has a problem, since when I run 'pytest tests/attention.py' I get errors like this: @kyegomez
(myenv2) mg@ubuntu:~/LongNet$ pytest tests/attention.py
======================================================= test session starts ========================================================
platform linux -- Python 3.9.18, pytest-7.4.2, pluggy-1.4.0
rootdir: /home/mg/LongNet
plugins: anyio-4.3.0, time-machine-2.14.0, hydra-core-1.0.7
collected 12 items
tests/attention.py .FFFF.FF...F [100%]
============================================================= FAILURES =============================================================
_________________________________________ TestDilatedAttention.test_attention_distribution _________________________________________
self = <attention.TestDilatedAttention testMethod=test_attention_distribution>
def test_attention_distribution(self):
input_tensor = torch.randn(2, 128, 512)
dilated_attention = DilatedAttention(512, 8, 2, 64)
_, attn_weights = dilated_attention(input_tensor)
> self.assertTrue(
torch.allclose(attn_weights.sum(dim=-1), torch.tensor(1.0))
)
E AssertionError: False is not true
tests/attention.py:114: AssertionError
___________________________________________ TestDilatedAttention.test_attention_outputs ____________________________________________
self = <attention.TestDilatedAttention testMethod=test_attention_outputs>
def test_attention_outputs(self):
> output = self.sparse_dilated_attention(self.x)
E AttributeError: 'TestDilatedAttention' object has no attribute 'sparse_dilated_attention'
tests/attention.py:151: AttributeError
________________________________________________ TestDilatedAttention.test_dropout _________________________________________________
self = <attention.TestDilatedAttention testMethod=test_dropout>
def test_dropout(self):
> self.sparse_dilated_attention.dropout.p = 1.0
E AttributeError: 'TestDilatedAttention' object has no attribute 'sparse_dilated_attention'
tests/attention.py:156: AttributeError
______________________________________________ TestDilatedAttention.test_forward_pass ______________________________________________
self = <attention.TestDilatedAttention testMethod=test_forward_pass>
def test_forward_pass(self):
> output = self.sparse_dilated_attention(self.x)
E AttributeError: 'TestDilatedAttention' object has no attribute 'sparse_dilated_attention'
tests/attention.py:145: AttributeError
______________________________________________ TestDilatedAttention.test_output_shape ______________________________________________
self = <attention.TestDilatedAttention testMethod=test_output_shape>
def test_output_shape(self):
# Setup
input_tensor = torch.randn(2, 128, 512)
dilated_attention = DilatedAttention(512, 8, 2, 64)
# Action
output = dilated_attention(input_tensor)
# Assert
> self.assertEqual(output.shape, (2, 128, 512))
E AssertionError: torch.Size([2, 64, 512]) != (2, 128, 512)
tests/attention.py:18: AssertionError
_________________________________________ TestDilatedAttention.test_relative_position_bias _________________________________________
self = <attention.TestDilatedAttention testMethod=test_relative_position_bias>
def test_relative_position_bias(self):
# Setup
input_tensor = torch.randn(2, 128, 512)
dilated_attention = DilatedAttention(
512, 8, 2, 64, use_rel_pos_bias=True
)
# Action
> output = dilated_attention(input_tensor)
tests/attention.py:39:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../anaconda3/envs/myenv2/lib/python3.9/site-packages/torch/nn/modules/module.py:1511: in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
../anaconda3/envs/myenv2/lib/python3.9/site-packages/torch/nn/modules/module.py:1520: in _call_impl
return forward_call(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = DilatedAttention(
(dropout): Dropout(p=0.0, inplace=False)
(attention): FlashAttention(
(attn_dropout): Dropou...Linear(in_features=512, out_features=512, bias=True)
(proj_v): Linear(in_features=512, out_features=512, bias=True)
)
x = tensor([[[[ 1.4205e+00, -4.5398e-01, 9.8770e-01, ..., -2.6991e-02,
-1.1310e+00, 1.4456e-03],
[...231e-02],
[ 3.7678e-01, -1.1879e-01, 2.9864e-01, ..., -5.8582e-02,
-3.6311e-01, 7.1331e-01]]]])
def forward(self, x: torch.Tensor) -> torch.Tensor:
"""Forward pass of the DilatedAttention module.
Args:
x (torch.Tensor): The input tensor.
Returns:
torch.Tensor: The output tensor.
"""
batch_size, seq_len, _ = x.shape
padding_len = -seq_len % self.segment_size
x = F.pad(x, (0, 0, 0, padding_len))
seq_len = seq_len + padding_len
if self.use_xpos:
x = self.xpos(x)
# Split and sparsify
x = x.view(batch_size, -1, self.segment_size, self.dim)
x = x[:, :, :: self.dilation_rate, :]
# qk_norm
if self.qk_norm:
q, k, v = map(
self.norm, (self.proj_q(x), self.proj_k(x), self.proj_v(x))
)
else:
q, k, v = self.proj_q(x), self.proj_k(x), self.proj_v(x)
# Perform attention
attn_output = self.attention(q, k, v)
# if use rel pos => apply relative positioning bias
if self.use_rel_pos_bias:
> attn_output += self.relative_bias(
batch_size, attn_output.size(1), attn_output.size(1)
)
E RuntimeError: The size of tensor a (512) must match the size of tensor b (2) at non-singleton dimension 3
../anaconda3/envs/myenv2/lib/python3.9/site-packages/long_net/attention.py:123: RuntimeError
__________________________________________________ TestDilatedAttention.test_xpos __________________________________________________
self = <attention.TestDilatedAttention testMethod=test_xpos>
def test_xpos(self):
# Setup
input_tensor = torch.randn(2, 128, 512)
dilated_attention = DilatedAttention(512, 8, 2, 64, use_xpos=True)
# Action
> output = dilated_attention(input_tensor)
tests/attention.py:26:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../anaconda3/envs/myenv2/lib/python3.9/site-packages/torch/nn/modules/module.py:1511: in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
../anaconda3/envs/myenv2/lib/python3.9/site-packages/torch/nn/modules/module.py:1520: in _call_impl
return forward_call(*args, **kwargs)
../anaconda3/envs/myenv2/lib/python3.9/site-packages/long_net/attention.py:104: in forward
x = self.xpos(x)
../anaconda3/envs/myenv2/lib/python3.9/site-packages/torch/nn/modules/module.py:1511: in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
../anaconda3/envs/myenv2/lib/python3.9/site-packages/torch/nn/modules/module.py:1520: in _call_impl
return forward_call(*args, **kwargs)
../anaconda3/envs/myenv2/lib/python3.9/site-packages/long_net/utils.py:254: in forward
x = apply_rotary_pos_emb(x, sin, cos, scale)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
x = tensor([[[-0.1896, 0.1342, -0.3958, ..., -0.4303, -0.2086, 0.4187],
[-0.2131, 0.3323, 0.4395, ..., -1.5...59, 0.2221, ..., 1.7753, -1.4079, 1.2502],
[ 1.5155, -1.4299, 0.4873, ..., -1.0910, -0.7816, -0.7960]]])
sin = tensor([[ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[ 9.817..., 1.6756e-02],
[ 8.3368e-01, 8.3368e-01, 7.2268e-01, ..., 2.2456e-02,
1.6888e-02, 1.6888e-02]])
cos = tensor([[ 1.1695, 1.1695, 1.1586, ..., 1.0057, 1.0028, 1.0028],
[ 0.6304, 0.6304, 0.8459, ..., 1.005...8111, 0.8425, ..., 0.9942, 0.9971, 0.9971],
[ 0.1992, 0.1992, 0.4756, ..., 0.9941, 0.9971, 0.9971]])
scale = tensor([[1.1695, 1.1586, 1.1485, ..., 1.0087, 1.0057, 1.0028],
[1.1667, 1.1559, 1.1460, ..., 1.0086, 1.0056,...0.8592, 0.8671, 0.8745, ..., 0.9916, 0.9945, 0.9973],
[0.8571, 0.8651, 0.8726, ..., 0.9915, 0.9944, 0.9972]])
def apply_rotary_pos_emb(x, sin, cos, scale=1):
sin, cos = map(lambda t: duplicate_interleave(t * scale), (sin, cos))
# einsum notation for lambda t: repeat(t[offset:x.shape[1]+offset,:], "n d -> () n () (d j)", j=2)
> return (x * cos) + (rotate_every_two(x) * sin)
E RuntimeError: The size of tensor a (512) must match the size of tensor b (64) at non-singleton dimension 2
../anaconda3/envs/myenv2/lib/python3.9/site-packages/long_net/utils.py:221: RuntimeError
========================================================= warnings summary =========================================================
../anaconda3/envs/myenv2/lib/python3.9/site-packages/transformers/utils/generic.py:441
/home/mg/anaconda3/envs/myenv2/lib/python3.9/site-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
../anaconda3/envs/myenv2/lib/python3.9/site-packages/transformers/utils/generic.py:309
/home/mg/anaconda3/envs/myenv2/lib/python3.9/site-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
===================================================== short test summary info ======================================================
FAILED tests/attention.py::TestDilatedAttention::test_attention_distribution - AssertionError: False is not true
FAILED tests/attention.py::TestDilatedAttention::test_attention_outputs - AttributeError: 'TestDilatedAttention' object has no attribute 'sparse_dilated_attention'
FAILED tests/attention.py::TestDilatedAttention::test_dropout - AttributeError: 'TestDilatedAttention' object has no attribute 'sparse_dilated_attention'
FAILED tests/attention.py::TestDilatedAttention::test_forward_pass - AttributeError: 'TestDilatedAttention' object has no attribute 'sparse_dilated_attention'
FAILED tests/attention.py::TestDilatedAttention::test_output_shape - AssertionError: torch.Size([2, 64, 512]) != (2, 128, 512)
FAILED tests/attention.py::TestDilatedAttention::test_relative_position_bias - RuntimeError: The size of tensor a (512) must match the size of tensor b (2) at non-singleton dimension 3
FAILED tests/attention.py::TestDilatedAttention::test_xpos - RuntimeError: The size of tensor a (512) must match the size of tensor b (64) at non-singleton dimension 2
============================================= 7 failed, 5 passed, 2 warnings in 6.50s ==============================================
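As a side note, the four AttributeError failures look like a test-fixture problem rather than an attention bug: those tests reference self.sparse_dilated_attention and self.x, but the test class apparently never defines them. A minimal sketch of the kind of setUp they seem to expect (the constructor arguments mirror the other tests; the import path is an assumption):

import torch
import unittest

from long_net.attention import DilatedAttention


class TestDilatedAttention(unittest.TestCase):
    def setUp(self):
        # Hypothetical fixture that the failing tests appear to rely on
        self.x = torch.randn(2, 128, 512)
        self.sparse_dilated_attention = DilatedAttention(512, 8, 2, 64)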
DilatedAttention is not working properly (the output shape is wrong). I'm having the same issue as well.
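The wrong shape is visible in the forward pass printed in the test_relative_position_bias failure above: the input is padded, reshaped into segments, and then subsampled with x[:, :, ::dilation_rate, :], and the dropped positions are never restored. A minimal sketch of just that split-and-sparsify step, assuming dilation_rate=2 and segment_size=64 as in the tests:

import torch

batch_size, seq_len, dim = 2, 128, 512
segment_size, dilation_rate = 64, 2

x = torch.randn(batch_size, seq_len, dim)

# Split into segments: (2, 2, 64, 512)
x = x.view(batch_size, -1, segment_size, dim)

# Sparsify: keep every dilation_rate-th position inside each segment
x = x[:, :, ::dilation_rate, :]

print(x.shape)  # torch.Size([2, 2, 32, 512])
# 2 segments * 32 positions = 64 of the original 128 positions,
# which matches the (2, 64, 512) output reported by test_output_shape.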