
Count of Conformer parameters mismatch with that in the paper

Open maxwellzh opened this issue 4 years ago • 4 comments

In the original Conformer paper, the number of parameters is: [screenshot, 2021-10-18 3:22:54 PM]

However, with the implementation in this repo, the parameter counts are slightly different:

Conformer  small: 10.16 M
Conformer medium: 31.86 M
Conformer  large: 120.11 M

I got these sizes with this script:

from conformer import Conformer


def count_parameters(model) -> int:
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


models = {
    'small': Conformer(
        num_classes=1000,
        input_dim=80,
        encoder_dim=144,
        decoder_dim=320,
        num_encoder_layers=16,
        num_decoder_layers=1,
        num_attention_heads=4,
        conv_kernel_size=31
    ),
    'medium': Conformer(
        num_classes=1000,
        input_dim=80,
        encoder_dim=256,
        decoder_dim=640,
        num_encoder_layers=16,
        num_decoder_layers=1,
        num_attention_heads=4,
        conv_kernel_size=31
    ),
    'large': Conformer(
        num_classes=1000,
        input_dim=80,
        encoder_dim=512,
        decoder_dim=640,
        num_encoder_layers=17,
        num_decoder_layers=1,
        num_attention_heads=8,
        conv_kernel_size=31
    )
}

for size, m in models.items():
    print("Conformer {:>6}: {:.2f} M".format(size, count_parameters(m)/1e6))

Since the convolution kernel size couldn't be set to 32, I set it to 31. But this shouldn't make that much of a difference in the number of params.
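A rough check of that claim, assuming the kernel size only affects the depthwise convolution in each Conformer block (one 1-D filter of length `kernel_size` per channel, and no other layer depends on it). The helper name `depthwise_kernel_delta` is illustrative and not part of the repo.

```python
def depthwise_kernel_delta(encoder_dim: int, num_layers: int,
                           k_paper: int = 32, k_used: int = 31) -> int:
    """Weights lost when every layer's depthwise conv uses k_used taps
    instead of k_paper. Assumes one filter per channel (groups == channels)."""
    return encoder_dim * (k_paper - k_used) * num_layers


# (encoder_dim, num_encoder_layers) taken from the script above
configs = {"small": (144, 16), "medium": (256, 16), "large": (512, 17)}

for name, (dim, layers) in configs.items():
    print("{:>6}: {} params".format(name, depthwise_kernel_delta(dim, layers)))
```

Under this assumption the kernel-size change accounts for under 0.01 M parameters in every configuration.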

maxwellzh avatar Oct 18 '21 07:10 maxwellzh

This is not an official implementation, so there is a slight difference in the number of parameters.
Of course, I tried to keep the implementation as close as possible to the paper. :)

sooftware avatar Oct 18 '21 11:10 sooftware

Also, num_classes affects the parameter count.
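As a rough illustration of how `num_classes` enters the count, assume the decoder holds an embedding table and a biased linear output layer, both sized by the vocabulary. This layout is an assumption for the sake of the arithmetic; the actual layers in this repo may differ, and `vocab_dependent_params` is a hypothetical helper.

```python
def vocab_dependent_params(num_classes: int, decoder_dim: int) -> int:
    """Parameters that scale with the vocabulary size, under the assumed layout."""
    embedding = num_classes * decoder_dim                    # one row per class
    output_layer = decoder_dim * num_classes + num_classes   # weight + bias
    return embedding + output_layer


for num_classes in (1000, 4000):
    n = vocab_dependent_params(num_classes, decoder_dim=320)
    print("num_classes={}: {} params".format(num_classes, n))
```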

sooftware avatar Oct 18 '21 11:10 sooftware

This is kind of weird. I tested several open-source Conformer implementations (I also implemented it myself), but none of them strictly matches the reported number of parameters. Do you have any idea where the difference might come from? Btw, num_classes is set to 1k according to the paper.
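One way to narrow down where implementations diverge is to break the total down by top-level submodule and compare the breakdowns side by side. A minimal sketch: `count_by_submodule` is a hypothetical helper that relies only on `named_parameters()`, which every `torch.nn.Module` provides.

```python
from collections import defaultdict


def count_by_submodule(model) -> dict:
    """Sum trainable parameters per top-level submodule name."""
    totals = defaultdict(int)
    for name, p in model.named_parameters():
        if p.requires_grad:
            # "encoder.layers.0.weight" is grouped under "encoder"
            totals[name.split(".", 1)[0]] += p.numel()
    return dict(totals)


# Usage, with `models` from the script above:
# for size, m in models.items():
#     print(size, count_by_submodule(m))
```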

maxwellzh avatar Oct 19 '21 06:10 maxwellzh

I'm curious, too. I am only speculating that there may be details not mentioned in the paper.

sooftware avatar Oct 19 '21 06:10 sooftware