Error in computing Linear layer Mult-Adds
Describe the bug
When a linear layer has a multidimensional input and output (a shape with 3 or more dimensions), the computed mult-adds are incorrect.
To Reproduce
Add a line similar to the following when summarizing the model:
model_stats = summary(model, input_size, img_metas=img_metas, gt_semantic_seg=seg, depth=9, col_names=["input_size", "kernel_size", "output_size", "num_params", "mult_adds"])
Make sure the linear layer's input has more than two dimensions, as below.
| Layer (type:depth-idx) | Input Shape | Kernel Shape | Output Shape | Param # | Mult-Adds |
|---|---|---|---|---|---|
| Linear: 5-8 | [1, 22528, 64] | [64, 256] | [1, 22528, 256] | 16,640 | 16,640 |
Expected behavior
The number of mult-adds is listed as 16,640, but it should be 374,865,920 (= 22528 × (64 + 1) × 256, since the kernel is applied at every one of the 22528 positions along the middle dimension).
It appears line 161 of https://github.com/TylerYep/torchinfo/blob/main/torchinfo/layer_info.py fails to take into account that a linear layer applies its kernel at every input position, not just along a single output dimension.
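A minimal sketch of the expected computation (this is not torchinfo's actual code; `linear_mult_adds` is a hypothetical helper that mirrors the arithmetic above, counting the bias add the way the reported parameter count does):

```python
import math

# Sketch: mult-adds for nn.Linear should scale with every position the
# kernel is applied to, i.e. the product of all input dims except the
# last (feature) dimension.
def linear_mult_adds(input_shape, out_features):
    in_features = input_shape[-1]
    positions = math.prod(input_shape[:-1])  # batch and spatial dims
    # (in_features + 1) counts the bias add, matching the param count
    return positions * (in_features + 1) * out_features

# The layer above: [1, 22528, 64] -> [1, 22528, 256]
print(linear_mult_adds((1, 22528, 64), 256))  # 374865920
```

This reproduces the 374,865,920 figure, whereas counting only the weight and bias parameters gives the reported 16,640.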
Additional context
Just noticed this was not the correct number of FLOPs in a model using a linear layer such as this.
Did not realize this was markdown; my mistake on the formatting. Thanks for fixing it.
Thanks for reporting this issue. Any PR or help fixing this is much appreciated!
@jlclemon the 1 in the input_size is the batch_dim, if I'm not wrong, right? Also, it would be helpful if you could provide the model architecture and a gist of what you are trying to do (at least for a beginner like me). Ty
I'm experiencing a similar error. It seems that when calculating mult-adds for a torch.nn.Linear, only the first and last dimensions of the input tensor (batch size and feature dimension) are considered.
Environment
- System: Ubuntu 22.0 Docker image with GPU support
- Package version:
- pytorch 2.1.1
- torchinfo 1.8.0
Reproduce
```python
from torch.nn import Linear
from torchinfo import summary

bs, cin, cout = 5, 3, 8
model = Linear(cin, cout)

# 3-D input: (batch, positions, features)
in_size = (bs, 10, cin)
print(summary(model, input_size=in_size, col_names=["input_size", "output_size", "num_params", "mult_adds"]))

# 4-D input: extra spatial dimensions
in_size = (bs, 100, 100, cin)
print(summary(model, input_size=in_size, col_names=["input_size", "output_size", "num_params", "mult_adds"]))
```
Output:
============================================================================================================================================
Layer (type:depth-idx) Input Shape Output Shape Param # Mult-Adds
============================================================================================================================================
Linear [5, 10, 3] [5, 10, 8] 32 160
============================================================================================================================================
Total params: 32
Trainable params: 32
Non-trainable params: 0
Total mult-adds (M): 0.00
============================================================================================================================================
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
============================================================================================================================================
============================================================================================================================================
Layer (type:depth-idx) Input Shape Output Shape Param # Mult-Adds
============================================================================================================================================
Linear [5, 100, 100, 3] [5, 100, 100, 8] 32 160
============================================================================================================================================
Total params: 32
Trainable params: 32
Non-trainable params: 0
Total mult-adds (M): 0.00
============================================================================================================================================
Input size (MB): 0.60
Forward/backward pass size (MB): 3.20
Params size (MB): 0.00
Estimated Total Size (MB): 3.80
============================================================================================================================================
The Mult-Adds for both input sizes are reported as 160 $= 5\times(3+1)\times 8$, which is the multiply-accumulate count for an input of size (5, 1, 3); the intermediate dimensions are ignored.
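For comparison, the counts one would expect for the two inputs above can be worked out directly (a sketch; `expected_macs` is a hypothetical helper, with the bias add counted as in the formula above):

```python
import math

# Expected mult-adds grow with every extra input position, while
# torchinfo 1.8.0 reports a constant 160 for this Linear(3, 8) layer.
def expected_macs(in_size, cin, cout):
    positions = math.prod(in_size[:-1])  # all dims except features
    return positions * (cin + 1) * cout  # +1 for the bias add

print(expected_macs((5, 10, 3), 3, 8))        # 1600
print(expected_macs((5, 100, 100, 3), 3, 8))  # 1600000
```

Both results are multiples of the reported 160, scaled by the number of intermediate positions (10 and 10,000 respectively).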