Train Error
(venv) personalinfo@MacBook-Pro-3 LongNet % python3 train.py
2024-03-06 00:09:22,364 - numexpr.utils - INFO - NumExpr defaulting to 8 threads.
2024-03-06 00:09:27.673362: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
Using StableAdamWUnfused-v1
training: 0%| | 0/100000 [00:01<?, ?it/s]
Traceback (most recent call last):
File "/Users/personalinfo/LongNet/train.py", line 84, in <module>
loss = model(next(train_loader))
File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/Users/personalinfo/LongNet/long_net/model.py", line 356, in forward
logits = self.net(x_inp, **kwargs)
File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/Users/personalinfo/LongNet/long_net/model.py", line 302, in forward
x = self.transformer(x)
File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/Users/personalinfo/LongNet/long_net/model.py", line 271, in forward
x = block(x) + x
RuntimeError: The size of tensor a (4128) must match the size of tensor b (8196) at non-singleton dimension 1
After setting up the environment, I ran 'python3 train.py' and this happened. Could you take a look? Thank you!
I ran this script inside PyCharm's venv environment.
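For reference, the failure is at the residual connection in long_net/model.py line 271 (x = block(x) + x), which can only work if the block returns a tensor with the same sequence length it received. A minimal sketch of the same mismatch with plain tensors (the lengths 8196 and 4128 are just taken from the error message; whatever the block does internally, it is returning a shorter sequence than it was given):

import torch

dim = 512                               # illustrative model width
x = torch.randn(1, 8196, dim)           # sequence entering the block
block_out = torch.randn(1, 4128, dim)   # shorter sequence coming back out

# Same failure as model.py line 271 (x = block(x) + x):
# RuntimeError: The size of tensor a (4128) must match the size of
# tensor b (8196) at non-singleton dimension 1
y = block_out + x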
Can you help me solve this? I think the attention module has a problem, since when I run 'pytest tests/attention.py' I get errors like this: @kyegomez
(myenv2) mg@ubuntu:~/LongNet$ pytest tests/attention.py
======================================================= test session starts ========================================================
platform linux -- Python 3.9.18, pytest-7.4.2, pluggy-1.4.0
rootdir: /home/mg/LongNet
plugins: anyio-4.3.0, time-machine-2.14.0, hydra-core-1.0.7
collected 12 items
tests/attention.py .FFFF.FF...F [100%]
============================================================= FAILURES =============================================================
_________________________________________ TestDilatedAttention.test_attention_distribution _________________________________________
self = <attention.TestDilatedAttention testMethod=test_attention_distribution>
def test_attention_distribution(self):
input_tensor = torch.randn(2, 128, 512)
dilated_attention = DilatedAttention(512, 8, 2, 64)
_, attn_weights = dilated_attention(input_tensor)
> self.assertTrue(
torch.allclose(attn_weights.sum(dim=-1), torch.tensor(1.0))
)
E AssertionError: False is not true
tests/attention.py:114: AssertionError
___________________________________________ TestDilatedAttention.test_attention_outputs ____________________________________________
self = <attention.TestDilatedAttention testMethod=test_attention_outputs>
def test_attention_outputs(self):
> output = self.sparse_dilated_attention(self.x)
E AttributeError: 'TestDilatedAttention' object has no attribute 'sparse_dilated_attention'
tests/attention.py:151: AttributeError
________________________________________________ TestDilatedAttention.test_dropout _________________________________________________
self = <attention.TestDilatedAttention testMethod=test_dropout>
def test_dropout(self):
> self.sparse_dilated_attention.dropout.p = 1.0
E AttributeError: 'TestDilatedAttention' object has no attribute 'sparse_dilated_attention'
tests/attention.py:156: AttributeError
______________________________________________ TestDilatedAttention.test_forward_pass ______________________________________________
self = <attention.TestDilatedAttention testMethod=test_forward_pass>
def test_forward_pass(self):
> output = self.sparse_dilated_attention(self.x)
E AttributeError: 'TestDilatedAttention' object has no attribute 'sparse_dilated_attention'
tests/attention.py:145: AttributeError
______________________________________________ TestDilatedAttention.test_output_shape ______________________________________________
self = <attention.TestDilatedAttention testMethod=test_output_shape>
def test_output_shape(self):
# Setup
input_tensor = torch.randn(2, 128, 512)
dilated_attention = DilatedAttention(512, 8, 2, 64)
# Action
output = dilated_attention(input_tensor)
# Assert
> self.assertEqual(output.shape, (2, 128, 512))
E AssertionError: torch.Size([2, 64, 512]) != (2, 128, 512)
tests/attention.py:18: AssertionError
_________________________________________ TestDilatedAttention.test_relative_position_bias _________________________________________
self = <attention.TestDilatedAttention testMethod=test_relative_position_bias>
def test_relative_position_bias(self):
# Setup
input_tensor = torch.randn(2, 128, 512)
dilated_attention = DilatedAttention(
512, 8, 2, 64, use_rel_pos_bias=True
)
# Action
> output = dilated_attention(input_tensor)
tests/attention.py:39:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../anaconda3/envs/myenv2/lib/python3.9/site-packages/torch/nn/modules/module.py:1511: in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
../anaconda3/envs/myenv2/lib/python3.9/site-packages/torch/nn/modules/module.py:1520: in _call_impl
return forward_call(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = DilatedAttention(
(dropout): Dropout(p=0.0, inplace=False)
(attention): FlashAttention(
(attn_dropout): Dropou...Linear(in_features=512, out_features=512, bias=True)
(proj_v): Linear(in_features=512, out_features=512, bias=True)
)
x = tensor([[[[ 1.4205e+00, -4.5398e-01, 9.8770e-01, ..., -2.6991e-02,
-1.1310e+00, 1.4456e-03],
[...231e-02],
[ 3.7678e-01, -1.1879e-01, 2.9864e-01, ..., -5.8582e-02,
-3.6311e-01, 7.1331e-01]]]])
def forward(self, x: torch.Tensor) -> torch.Tensor:
"""Forward pass of the DilatedAttention module.
Args:
x (torch.Tensor): The input tensor.
Returns:
torch.Tensor: The output tensor.
"""
batch_size, seq_len, _ = x.shape
padding_len = -seq_len % self.segment_size
x = F.pad(x, (0, 0, 0, padding_len))
seq_len = seq_len + padding_len
if self.use_xpos:
x = self.xpos(x)
# Split and sparsify
x = x.view(batch_size, -1, self.segment_size, self.dim)
x = x[:, :, :: self.dilation_rate, :]
# qk_norm
if self.qk_norm:
q, k, v = map(
self.norm, (self.proj_q(x), self.proj_k(x), self.proj_v(x))
)
else:
q, k, v = self.proj_q(x), self.proj_k(x), self.proj_v(x)
# Perform attention
attn_output = self.attention(q, k, v)
# if use rel pos => apply relative positioning bias
if self.use_rel_pos_bias:
> attn_output += self.relative_bias(
batch_size, attn_output.size(1), attn_output.size(1)
)
E RuntimeError: The size of tensor a (512) must match the size of tensor b (2) at non-singleton dimension 3
../anaconda3/envs/myenv2/lib/python3.9/site-packages/long_net/attention.py:123: RuntimeError
__________________________________________________ TestDilatedAttention.test_xpos __________________________________________________
self = <attention.TestDilatedAttention testMethod=test_xpos>
def test_xpos(self):
# Setup
input_tensor = torch.randn(2, 128, 512)
dilated_attention = DilatedAttention(512, 8, 2, 64, use_xpos=True)
# Action
> output = dilated_attention(input_tensor)
tests/attention.py:26:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../anaconda3/envs/myenv2/lib/python3.9/site-packages/torch/nn/modules/module.py:1511: in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
../anaconda3/envs/myenv2/lib/python3.9/site-packages/torch/nn/modules/module.py:1520: in _call_impl
return forward_call(*args, **kwargs)
../anaconda3/envs/myenv2/lib/python3.9/site-packages/long_net/attention.py:104: in forward
x = self.xpos(x)
../anaconda3/envs/myenv2/lib/python3.9/site-packages/torch/nn/modules/module.py:1511: in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
../anaconda3/envs/myenv2/lib/python3.9/site-packages/torch/nn/modules/module.py:1520: in _call_impl
return forward_call(*args, **kwargs)
../anaconda3/envs/myenv2/lib/python3.9/site-packages/long_net/utils.py:254: in forward
x = apply_rotary_pos_emb(x, sin, cos, scale)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
x = tensor([[[-0.1896, 0.1342, -0.3958, ..., -0.4303, -0.2086, 0.4187],
[-0.2131, 0.3323, 0.4395, ..., -1.5...59, 0.2221, ..., 1.7753, -1.4079, 1.2502],
[ 1.5155, -1.4299, 0.4873, ..., -1.0910, -0.7816, -0.7960]]])
sin = tensor([[ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[ 9.817..., 1.6756e-02],
[ 8.3368e-01, 8.3368e-01, 7.2268e-01, ..., 2.2456e-02,
1.6888e-02, 1.6888e-02]])
cos = tensor([[ 1.1695, 1.1695, 1.1586, ..., 1.0057, 1.0028, 1.0028],
[ 0.6304, 0.6304, 0.8459, ..., 1.005...8111, 0.8425, ..., 0.9942, 0.9971, 0.9971],
[ 0.1992, 0.1992, 0.4756, ..., 0.9941, 0.9971, 0.9971]])
scale = tensor([[1.1695, 1.1586, 1.1485, ..., 1.0087, 1.0057, 1.0028],
[1.1667, 1.1559, 1.1460, ..., 1.0086, 1.0056,...0.8592, 0.8671, 0.8745, ..., 0.9916, 0.9945, 0.9973],
[0.8571, 0.8651, 0.8726, ..., 0.9915, 0.9944, 0.9972]])
def apply_rotary_pos_emb(x, sin, cos, scale=1):
sin, cos = map(lambda t: duplicate_interleave(t * scale), (sin, cos))
# einsum notation for lambda t: repeat(t[offset:x.shape[1]+offset,:], "n d -> () n () (d j)", j=2)
> return (x * cos) + (rotate_every_two(x) * sin)
E RuntimeError: The size of tensor a (512) must match the size of tensor b (64) at non-singleton dimension 2
../anaconda3/envs/myenv2/lib/python3.9/site-packages/long_net/utils.py:221: RuntimeError
========================================================= warnings summary =========================================================
../anaconda3/envs/myenv2/lib/python3.9/site-packages/transformers/utils/generic.py:441
/home/mg/anaconda3/envs/myenv2/lib/python3.9/site-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
../anaconda3/envs/myenv2/lib/python3.9/site-packages/transformers/utils/generic.py:309
/home/mg/anaconda3/envs/myenv2/lib/python3.9/site-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
===================================================== short test summary info ======================================================
FAILED tests/attention.py::TestDilatedAttention::test_attention_distribution - AssertionError: False is not true
FAILED tests/attention.py::TestDilatedAttention::test_attention_outputs - AttributeError: 'TestDilatedAttention' object has no attribute 'sparse_dilated_attention'
FAILED tests/attention.py::TestDilatedAttention::test_dropout - AttributeError: 'TestDilatedAttention' object has no attribute 'sparse_dilated_attention'
FAILED tests/attention.py::TestDilatedAttention::test_forward_pass - AttributeError: 'TestDilatedAttention' object has no attribute 'sparse_dilated_attention'
FAILED tests/attention.py::TestDilatedAttention::test_output_shape - AssertionError: torch.Size([2, 64, 512]) != (2, 128, 512)
FAILED tests/attention.py::TestDilatedAttention::test_relative_position_bias - RuntimeError: The size of tensor a (512) must match the size of tensor b (2) at non-singleton dimension 3
FAILED tests/attention.py::TestDilatedAttention::test_xpos - RuntimeError: The size of tensor a (512) must match the size of tensor b (64) at non-singleton dimension 2
============================================= 7 failed, 5 passed, 2 warnings in 6.50s ==============================================
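As a side note, the four AttributeError failures look like a test-fixture problem rather than an attention bug: those tests reference self.sparse_dilated_attention and self.x, but the test class apparently never defines them. A minimal sketch of the kind of setUp they seem to expect (the constructor arguments mirror the other tests; the import path is an assumption):

import torch
import unittest

from long_net.attention import DilatedAttention


class TestDilatedAttention(unittest.TestCase):
    def setUp(self):
        # Hypothetical fixture that the failing tests appear to rely on
        self.x = torch.randn(2, 128, 512)
        self.sparse_dilated_attention = DilatedAttention(512, 8, 2, 64)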
DilatedAttention is not working properly (the output shape is wrong). I'm having the same issue as well.
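The wrong shape is visible in the forward pass printed in the test_relative_position_bias failure above: the input is padded, reshaped into segments, and then subsampled with x[:, :, ::dilation_rate, :], and the dropped positions are never restored. A minimal sketch of just that split-and-sparsify step, assuming dilation_rate=2 and segment_size=64 as in the tests:

import torch

batch_size, seq_len, dim = 2, 128, 512
segment_size, dilation_rate = 64, 2

x = torch.randn(batch_size, seq_len, dim)

# Split into segments: (2, 2, 64, 512)
x = x.view(batch_size, -1, segment_size, dim)

# Sparsify: keep every dilation_rate-th position inside each segment
x = x[:, :, ::dilation_rate, :]

print(x.shape)  # torch.Size([2, 2, 32, 512])
# 2 segments * 32 positions = 64 of the original 128 positions,
# which matches the (2, 64, 512) output reported by test_output_shape.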