FMA-Net model parameter count differs from the paper

I found that when num_seq=10 and num_flow=9, the measured parameter count is 10.41M; when num_seq=3 and num_flow=2, the measured parameter count is 9.37M. But the parameter count mentioned in the paper is 9.6M. Under what input conditions was this tested? Does the parameter count of FMA-Net change with the number of input frame sequences? If the number of input frames changes during training and testing, can the model still function correctly?

The parameter computation snippet is as follows.

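import torch
from thop import profile  # assumed source of profile(); the import was not shown in the original snippet

# FMANet comes from the FMA-Net repository; its import is omitted here.

class Config: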
    def __init__(self, config_dict):
        for key, value in config_dict.items():
            setattr(self, key, value)

def test_fmanet():
    config_dict = {
        'stage': 2,
        'scale': 4,
        'num_seq': 10,
        'ds_kernel_size': 20,
        'in_channels': 3,
        'dim': 90,
        'us_kernel_size': 5,
        'num_RDB': 12,
        'growth_rate': 18,
        'num_dense_layer': 4,
        'num_flow': 9,
        'num_FRMA': 4,
        'num_transformer_block': 2,
        'num_heads': 6,
        'LayerNorm_type': 'WithBias',
        'ffn_expansion_factor': 2.66,
        'bias': False,
    }
    config = Config(config_dict)

    net = FMANet(config).cuda()
    net.eval()

    t = 10  # number of input frames; matches num_seq
    frames = torch.rand(1, 3, t, 180, 320).cuda()

    macs, _ = profile(model=net, inputs=(frames,), verbose=False)
    params = sum(p.numel() for p in net.parameters())
    print(f"params: {params / 1e6:.2f}M")

The snippet above reports 10.41M parameters.

Jul 04 '24 12:07 DachunKai

FMA-Net is a sliding-window-based method, and the number of input frames is one of its hyperparameters, so it is fixed during both training and testing. The resulting performance and complexity variations are discussed in Sec. A.3 and Table 6 of the supplementary material. Additionally, the parameter count reported in the paper is based on the hyperparameter settings we provide. Specifically, it excludes parameters such as a_conv that are not used during inference.
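To illustrate how excluding training-only modules changes the count, here is a minimal sketch. It assumes the excluded parameters can be identified by a substring of their name (such as a_conv); the parameter names and sizes in the demo are hypothetical and are not taken from the FMA-Net code. A small stand-in class replaces real tensors so the sketch runs without PyTorch.

```python
def count_params(named_params, exclude_substrings=()):
    """Sum numel() over (name, param) pairs, skipping any parameter
    whose name contains one of the excluded substrings."""
    total = 0
    for name, param in named_params:
        if any(s in name for s in exclude_substrings):
            continue
        total += param.numel()
    return total


class _FakeParam:
    """Stand-in for a torch.nn.Parameter (only numel() is needed)."""

    def __init__(self, n):
        self._n = n

    def numel(self):
        return self._n


# Hypothetical names and sizes, for illustration only.
demo = [
    ("net.conv.weight", _FakeParam(1000)),
    ("net.a_conv.weight", _FakeParam(200)),
    ("net.a_conv.bias", _FakeParam(10)),
]

total_all = count_params(demo)
total_inference = count_params(demo, exclude_substrings=("a_conv",))
```

With a real model, the same function could be called as count_params(net.named_parameters(), exclude_substrings=("a_conv",)). Substring matching is a blunt filter, so it is worth checking which names are actually skipped.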

Jul 05 '24 05:07 GeunhyukYouk

Thanks. Is there an error in the experiment.cfg file? As I understand it, num_seq should equal num_flow + 1: with a three-frame sliding window, for example, only two optical flows can be computed between adjacent frames.

Jul 05 '24 06:07 DachunKai

The hyperparameters num_flow and num_seq are independent. Specifically, we obtain multiple flows (which are optimized through warping loss) and divide the channels (dim) into num_flow parts to perform different warping operations on each part (therefore, dim should be divisible by num_flow). The implementation can be found here.
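The divisibility constraint can be sketched as follows. This is only an illustration of how dim channels could be split into num_flow equal groups, not the repository's implementation; the function name channel_groups is hypothetical.

```python
def channel_groups(dim, num_flow):
    """Return (start, end) channel ranges, one per flow.

    Raises ValueError when dim is not divisible by num_flow,
    mirroring the constraint described above.
    """
    if num_flow <= 0:
        raise ValueError("num_flow must be positive")
    if dim % num_flow != 0:
        raise ValueError(
            f"dim ({dim}) must be divisible by num_flow ({num_flow})"
        )
    size = dim // num_flow
    return [(i * size, (i + 1) * size) for i in range(num_flow)]


# Settings from the snippet in the question: dim=90, num_flow=9.
groups = channel_groups(90, 9)
```

With dim=90 and num_flow=9, each flow warps a group of 10 channels; with num_flow=2, each group would have 45 channels. Neither setting depends on num_seq.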

Jul 05 '24 07:07 GeunhyukYouk

@GeunhyukYouk Does num_seq mean the number of frames in each sliding window? Are there any limitations on the test video frame length, such as the relationship between the total number of frames and num_seq, num_flow?

Jul 05 '24 08:07 DachunKai

Yes, num_seq refers to the window size of the sliding window (i.e., the number of input frames). The test video is independent of num_flow and must consist of at least num_seq frames.
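As a rough sketch of the minimum-length requirement, the following enumerates the frame indices of each window. It assumes a stride of 1 and no boundary padding, which may differ from how the FMA-Net test code actually handles the first and last frames; sliding_windows is a hypothetical helper.

```python
def sliding_windows(num_frames, num_seq):
    """List the frame indices of every window of length num_seq.

    Assumes stride 1 and no padding at the video boundaries.
    """
    if num_seq <= 0:
        raise ValueError("num_seq must be positive")
    if num_frames < num_seq:
        raise ValueError(
            f"video has {num_frames} frames but needs at least {num_seq}"
        )
    return [
        list(range(start, start + num_seq))
        for start in range(num_frames - num_seq + 1)
    ]


windows = sliding_windows(num_frames=5, num_seq=3)
```

Under these assumptions, a video with N frames yields N - num_seq + 1 windows, and num_flow plays no role in the count.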

Jul 05 '24 11:07 GeunhyukYouk