
Bug

Open zld-make opened this issue 7 months ago • 5 comments

At present, the open-source code does not fully match the network structure described in the paper; the LS Block is the most serious discrepancy.

zld-make avatar Sep 27 '25 10:09 zld-make

Indeed. Looking at the code, the LS Block is only used for the odd-indexed blocks within the first three stages; the even-indexed blocks still use a RepVGGDW block + SE.

LQchen1 avatar Oct 29 '25 14:10 LQchen1
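The alternation described above can be sketched as follows. This is a hypothetical helper (the name `stage_mixers`, its arguments, and the pattern are inferred from the model printout posted later in this thread, not taken from the repo):

```python
def stage_mixers(depth, stage):
    """Which token mixer each block in a stage uses, per the open-source code:
    even-indexed blocks are always RepVGGDW + SE; odd-indexed blocks get LSConv
    in stages 0-2 and Attention in stage 3. (Illustration only.)"""
    mixers = []
    for i in range(depth):
        if i % 2 == 0:
            mixers.append("RepVGGDW+SE")
        elif stage < 3:
            mixers.append("LSConv")
        else:
            mixers.append("Attention")
    return mixers

# Stage depths 1/2/3/4, matching the LSTest printout further down:
for stage, depth in enumerate([1, 2, 3, 4]):
    print(stage, stage_mixers(depth, stage))
```

With these depths, LSConv appears only once per stage in stages 1 and 2, and never in stage 0, which is the mismatch with the paper being reported here.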

Also, in Table 8 the FLOPs do not change at all after DW and SE are replaced. How was the replacement done?

LQchen1 avatar Oct 29 '25 15:10 LQchen1

> Also, in Table 8 the FLOPs do not change at all after DW and SE are replaced. How was the replacement done?

They were probably rounded: compared with the whole network, DW and SE add few FLOPs, and reporting FLOPs to four decimal places would make the difference visible. The key issue is that the open-source LS Block is flawed, so the results cannot be reproduced; the core LSConv, however, shows no defect and matches the paper's description.

zld-make avatar Oct 30 '25 04:10 zld-make
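The rounding argument is easy to check with back-of-the-envelope numbers. The figures below are illustrative assumptions (one 128-channel block at 28×28 resolution and a ~0.3 GFLOPs model budget), not values from the paper:

```python
# One 3x3 depthwise conv at 28x28 with 128 channels:
dw_flops = 28 * 28 * 128 * 3 * 3        # ~0.9 MFLOPs of multiply-adds

# One SE module (reduction 4) acts on the pooled vector, not the full map:
se_flops = 2 * 128 * (128 // 4)         # ~8 KFLOPs

total_gflops = 0.3                      # assumed whole-network budget
without = total_gflops - (dw_flops + se_flops) / 1e9

print(round(total_gflops, 1), round(without, 1))  # both round to 0.3
```

So at one-decimal precision the table entry would indeed be unchanged, consistent with the rounding explanation.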

That said, the code for the paper's core contribution, LSConv, currently appears to be free of defects.

zld-make avatar Oct 30 '25 04:10 zld-make

> Indeed. Looking at the code, the LS Block is only used for the odd-indexed blocks within the first three stages; the even-indexed blocks still use a RepVGGDW block + SE.

[Image] It should look like this.

Fortunate-ziye avatar Nov 29 '25 00:11 Fortunate-ziye

```
LSTest(
  (stem): Stem( (stages): Sequential( (0): Conv2d(3, 16, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ReLU() (3): Conv2d(16, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ReLU() (6): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (7): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) )
  (lsBlock1): LSBlock( (lsBlock): Sequential( (0): Block( (mixer): RepVGGDW( (conv): Conv2d_BN( (c): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=64, bias=False) (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv1): Conv2d_BN( (c): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), groups=64, bias=False) (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (se): SEModule( (fc1): Conv2d(64, 16, kernel_size=(1, 1), stride=(1, 1)) (bn): Identity() (act): ReLU(inplace=True) (fc2): Conv2d(16, 64, kernel_size=(1, 1), stride=(1, 1)) (gate): Sigmoid() ) (ffn): Residual( (m): FFN( (pw1): Conv2d_BN( (c): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (act): ReLU() (pw2): Conv2d_BN( (c): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) ) ) ) )
  (downsample1): Downsample( (depthwise_conv): Sequential( (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=64, bias=False) (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (pointwise_conv): Sequential( (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) )
  (lsBlock2): LSBlock( (lsBlock): Sequential( (0): Block( (mixer): RepVGGDW( (conv): Conv2d_BN( (c): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=128, bias=False) (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv1): Conv2d_BN( (c): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), groups=128, bias=False) (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (se): SEModule( (fc1): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) (bn): Identity() (act): ReLU(inplace=True) (fc2): Conv2d(32, 128, kernel_size=(1, 1), stride=(1, 1)) (gate): Sigmoid() ) (ffn): Residual( (m): FFN( (pw1): Conv2d_BN( (c): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (act): ReLU() (pw2): Conv2d_BN( (c): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) ) ) (1): Block( (se): Identity() (mixer): LSConv( (lkp): LKP( (cv1): Conv2d_BN( (c): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (act): ReLU() (cv2): Conv2d_BN( (c): Conv2d(64, 64, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=64, bias=False) (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (cv3): Conv2d_BN( (c): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (cv4): Conv2d(64, 144, kernel_size=(1, 1), stride=(1, 1)) (norm): GroupNorm(16, 144, eps=1e-05, affine=True) ) (ska): SKA() (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (ffn): Residual( (m): FFN( (pw1): Conv2d_BN( (c): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (act): ReLU() (pw2): Conv2d_BN( (c): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) ) ) ) )
  (downsample2): Downsample( (depthwise_conv): Sequential( (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=128, bias=False) (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (pointwise_conv): Sequential( (0): Conv2d(128, 192, kernel_size=(1, 1), stride=(1, 1), bias=False) (1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) )
  (lsBlock3): LSBlock( (lsBlock): Sequential( (0): Block( (mixer): RepVGGDW( (conv): Conv2d_BN( (c): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=192, bias=False) (bn): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv1): Conv2d_BN( (c): Conv2d(192, 192, kernel_size=(1, 1), stride=(1, 1), groups=192, bias=False) (bn): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (se): SEModule( (fc1): Conv2d(192, 48, kernel_size=(1, 1), stride=(1, 1)) (bn): Identity() (act): ReLU(inplace=True) (fc2): Conv2d(48, 192, kernel_size=(1, 1), stride=(1, 1)) (gate): Sigmoid() ) (ffn): Residual( (m): FFN( (pw1): Conv2d_BN( (c): Conv2d(192, 384, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (act): ReLU() (pw2): Conv2d_BN( (c): Conv2d(384, 192, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) ) ) (1): Block( (se): Identity() (mixer): LSConv( (lkp): LKP( (cv1): Conv2d_BN( (c): Conv2d(192, 96, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (act): ReLU() (cv2): Conv2d_BN( (c): Conv2d(96, 96, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=96, bias=False) (bn): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (cv3): Conv2d_BN( (c): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (cv4): Conv2d(96, 216, kernel_size=(1, 1), stride=(1, 1)) (norm): GroupNorm(24, 216, eps=1e-05, affine=True) ) (ska): SKA() (bn): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (ffn): Residual( (m): FFN( (pw1): Conv2d_BN( (c): Conv2d(192, 384, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (act): ReLU() (pw2): Conv2d_BN( (c): Conv2d(384, 192, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) ) ) (2): Block( (mixer): RepVGGDW( (conv): Conv2d_BN( (c): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=192, bias=False) (bn): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv1): Conv2d_BN( (c): Conv2d(192, 192, kernel_size=(1, 1), stride=(1, 1), groups=192, bias=False) (bn): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (se): SEModule( (fc1): Conv2d(192, 48, kernel_size=(1, 1), stride=(1, 1)) (bn): Identity() (act): ReLU(inplace=True) (fc2): Conv2d(48, 192, kernel_size=(1, 1), stride=(1, 1)) (gate): Sigmoid() ) (ffn): Residual( (m): FFN( (pw1): Conv2d_BN( (c): Conv2d(192, 384, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (act): ReLU() (pw2): Conv2d_BN( (c): Conv2d(384, 192, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) ) ) ) )
  (downsample3): Downsample( (depthwise_conv): Sequential( (0): Conv2d(192, 192, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=192, bias=False) (1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (pointwise_conv): Sequential( (0): Conv2d(192, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) )
  (mas): LSBlock( (lsBlock): Sequential( (0): Block( (mixer): RepVGGDW( (conv): Conv2d_BN( (c): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256, bias=False) (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv1): Conv2d_BN( (c): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), groups=256, bias=False) (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (se): SEModule( (fc1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1)) (bn): Identity() (act): ReLU(inplace=True) (fc2): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1)) (gate): Sigmoid() ) (ffn): Residual( (m): FFN( (pw1): Conv2d_BN( (c): Conv2d(256, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (act): ReLU() (pw2): Conv2d_BN( (c): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) ) ) (1): Block( (se): Identity() (mixer): Residual( (m): Attention( (qkv): Conv2d_BN( (c): Conv2d(256, 384, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (proj): Sequential( (0): ReLU() (1): Conv2d_BN( (c): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (dw): Conv2d_BN( (c): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=64, bias=False) (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) ) (ffn): Residual( (m): FFN( (pw1): Conv2d_BN( (c): Conv2d(256, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (act): ReLU() (pw2): Conv2d_BN( (c): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) ) ) (2): Block( (mixer): RepVGGDW( (conv): Conv2d_BN( (c): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256, bias=False) (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv1): Conv2d_BN( (c): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), groups=256, bias=False) (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (se): SEModule( (fc1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1)) (bn): Identity() (act): ReLU(inplace=True) (fc2): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1)) (gate): Sigmoid() ) (ffn): Residual( (m): FFN( (pw1): Conv2d_BN( (c): Conv2d(256, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (act): ReLU() (pw2): Conv2d_BN( (c): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) ) ) (3): Block( (se): Identity() (mixer): Residual( (m): Attention( (qkv): Conv2d_BN( (c): Conv2d(256, 384, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (proj): Sequential( (0): ReLU() (1): Conv2d_BN( (c): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (dw): Conv2d_BN( (c): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=64, bias=False) (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) ) (ffn): Residual( (m): FFN( (pw1): Conv2d_BN( (c): Conv2d(256, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (act): ReLU() (pw2): Conv2d_BN( (c): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) ) ) ) )
)
```

xpbag avatar Dec 24 '25 08:12 xpbag

I ran the source code and hit this error, and I don't know why:

```
RuntimeError: The size of tensor a (16) must match the size of tensor b (4) at non-singleton dimension 3
```

xpbag avatar Dec 24 '25 08:12 xpbag

@Fortunate-ziye Yes. Using Attention in the last stage is understandable, but alternating LSConv with RepVGG+SE in the earlier stages is a fairly large structural change. I'd like to know how using LSConv everywhere, or LSConv + RepVGG + SE together, performs; it would be great if the authors could provide an ablation for this. Overall, the core idea is still a good one.

LQchen1 avatar Dec 24 '25 08:12 LQchen1

> RuntimeError: The size of tensor a (16) must match the size of tensor b (4) at non-singleton dimension 3 — I ran the source code and don't know why I hit this.

If you're running the source code, why post a structure diagram? Also, as I recall, that error message is fairly detailed; if all else fails, paste the failing code into an AI and let it figure out what's wrong.

Fortunate-ziye avatar Dec 24 '25 09:12 Fortunate-ziye

Yeah, I know; the structure diagram isn't the point. I'm mainly curious whether the authors ran into this problem themselves.

xpbag avatar Dec 24 '25 09:12 xpbag

> Yeah, I know; the structure diagram isn't the point. I'm mainly curious whether the authors ran into this problem themselves.

Drawing that structure diagram by hand was tiring, hahaha. Are you running classification? I don't think I've hit that error.

Fortunate-ziye avatar Dec 24 '25 09:12 Fortunate-ziye

```python
import torch
import lsnet  # the repo's model definitions (Stem, LSBlock, Downsample, LSNet)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")


class LSTest(torch.nn.Module):
    """
    i = 0, ed = 64,  kd = 16, dpth = 1, nh = 4, ar = 1.0
    i = 1, ed = 128, kd = 16, dpth = 2, nh = 4, ar = 2.0
    i = 2, ed = 192, kd = 16, dpth = 3, nh = 4, ar = 3.0
    i = 3, ed = 256, kd = 16, dpth = 4, nh = 4, ar = 4.0
    """
    def __init__(self, img_size=224,
                 patch_size=16,
                 in_chans=3,
                 num_classes=1000,
                 embed_dim=[64, 128, 192, 256],
                 key_dim=[16, 16, 16, 16],
                 depth=[1, 2, 3, 4],
                 num_heads=[4, 4, 4, 4],
                 distillation=False):
        super().__init__()
        resolution = img_size // patch_size
        self.stem = lsnet.Stem(3, 64)
        self.lsBlock1 = lsnet.LSBlock(ed=64, kd=16, dpth=1, nh=4, ar=1.0, stage=0, resolution=resolution)
        resolution = (resolution - 1) // 2 + 1
        self.downsample1 = lsnet.Downsample(64, 128, stride=2)
        self.lsBlock2 = lsnet.LSBlock(ed=128, kd=16, dpth=2, nh=4, ar=2.0, stage=1, resolution=resolution)
        resolution = (resolution - 1) // 2 + 1
        self.downsample2 = lsnet.Downsample(128, 192, stride=2)
        self.lsBlock3 = lsnet.LSBlock(ed=192, kd=16, dpth=3, nh=4, ar=3.0, stage=2, resolution=resolution)
        resolution = (resolution - 1) // 2 + 1
        self.downsample3 = lsnet.Downsample(192, 256, stride=2)
        self.mas = lsnet.LSBlock(ed=256, kd=16, dpth=4, nh=4, ar=4.0, stage=3, resolution=resolution)
        self.to(device)

    def forward(self, x):
        x = self.stem(x)
        x = self.lsBlock1(x)
        x = self.downsample1(x)
        x = self.lsBlock2(x)
        x = self.downsample2(x)
        x = self.lsBlock3(x)
        x = self.downsample3(x)
        x = self.mas(x)
        return x


if __name__ == '__main__':
    model = lsnet.LSNet()
    # model = LSTest()
    model.to(device)
    input = torch.randn(1, 3, 224, 224).to(device)
    # print(model)
    output = model(input)
    print(output.shape)
```

I just modularized the original model myself for a quick test and found this problem; running the original model directly gives the same error. I see someone else ran into it earlier too, so I'm not sure what's going on, hhh, but after a fix it runs.

xpbag avatar Dec 24 '25 09:12 xpbag

```python
(q.transpose(-2, -1) @ k) * self.scale
```

```
RuntimeError: The size of tensor a (16) must match the size of tensor b (4) at non-singleton dimension 3
```

xpbag avatar Dec 24 '25 09:12 xpbag

The tensor shapes are:

```
q.shape = torch.Size([1, 4, 16, 16])
k.shape = torch.Size([1, 4, 16, 16])
v.shape = torch.Size([1, 4, 64, 16])
q.transpose(-2, -1).shape = torch.Size([1, 4, 16, 16])
```

xpbag avatar Dec 24 '25 09:12 xpbag
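Those shapes suggest a broadcasting mismatch rather than a bug in the matmul itself: with 16 tokens per head, `(q.transpose(-2, -1) @ k)` yields a `[1, 4, 16, 16]` attention map, and "16 must match 4 at dimension 3" is the pattern you get when another tensor (e.g. a per-head bias table built for a different `resolution`, i.e. 4 tokens from a 2×2 map instead of 16) is added to it. A minimal NumPy reproduction of that broadcast failure; the attention-map shape is taken from the comment above, while the `bias` table and its mismatched resolution are my assumption about the cause:

```python
import numpy as np

attn = np.zeros((1, 4, 16, 16))  # attention map for 16 tokens (a 4x4 feature map)
bias = np.zeros((4, 4, 4))       # bias table built for only 4 tokens (a 2x2 map)

try:
    _ = attn + bias              # same trailing-dimension broadcast rules as torch
    failed = False
except ValueError as e:
    failed = True
    print(e)                     # shapes cannot be broadcast together

# With a bias table built for the actual resolution, the add broadcasts fine:
ok = attn + np.zeros((4, 16, 16))
print(ok.shape)
```

If this is the cause, checking that the `resolution` passed to the Attention stage matches the true feature-map size at that stage would be the place to start.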