Vladislav Sovrasov
Most modern hardware architectures use FMA instructions for operations with tensors. FMA computes a*x+b as a single operation. Roughly, GMACs = 0.5 * GFLOPs.
MAC = Multiply–accumulate operation
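For a quick sanity check, here is the arithmetic for a single fully connected layer (the sizes below are just illustrative):

```python
# A Linear layer computing y = W @ x + b does one multiply and one add per
# weight element, i.e. one MAC, which an FMA unit executes as a single instruction.
in_features, out_features = 1024, 512

macs = in_features * out_features   # multiply-accumulate operations
flops = 2 * macs                    # multiplies and adds counted separately

print(f"MACs:  {macs / 1e9:.6f} G")   # 0.000524 G
print(f"FLOPs: {flops / 1e9:.6f} G")  # 0.001049 G -> GMACs = 0.5 * GFLOPs
```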
@chnadell yes, you're right! @snownus also figured it out. I'll edit the first post to avoid any future confusion.
@cassie101 makes sense, I'll change it
@code-by-jin yes, exactly, we should multiply GMACs by 2 to get FLOPs
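A minimal sketch of that conversion with `get_model_complexity_info`, assuming `as_strings=False` so raw numbers come back instead of formatted strings:

```python
import torchvision.models as models
from ptflops import get_model_complexity_info

model = models.resnet18()

macs, params = get_model_complexity_info(
    model, (3, 224, 224), as_strings=False, print_per_layer_stat=False
)

flops = 2 * macs  # one MAC = one multiply + one add = 2 FLOPs
print(f"{macs / 1e9:.2f} GMACs -> {flops / 1e9:.2f} GFLOPs")
```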
In the original ResNet paper the authors mixed up MACs and FLOPs. As far as I remember, they provided a definition of FLOPs that counts one multiply & add as a single FLOP...
ptflops can take into account only ops that are derived from nn.Module. Support of torch.bmm and similar operations requires a complete redesign of the FLOPs counting mechanism. I've started thinking of...
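To illustrate the limitation, here is a toy module whose main cost is a functional `torch.bmm` call (the module name is made up for this example):

```python
import torch
import torch.nn as nn
from ptflops import get_model_complexity_info

class BmmBlock(nn.Module):
    def forward(self, x):
        # (B, N, C) @ (B, C, N) -> (B, N, N); torch.bmm is not an nn.Module,
        # so ptflops has no hook to attach to it and its MACs go uncounted.
        return torch.bmm(x, x.transpose(1, 2))

macs, params = get_model_complexity_info(
    BmmBlock(), (8, 64), as_strings=False, print_per_layer_stat=False
)
print(macs, params)  # the bmm cost is missing from the reported total
```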
Hi! Currently nn.MultiheadAttention is fully supported. Other manually written transformer attention heads are not supported
`requires_grad` prevents pytorch from computing gradients for particular parameters during training. This flag doesn't affect forward pass complexity, which is measured by ptflops. See the pytorch docs for details.
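A quick way to check this yourself (a sketch using the standard entry point):

```python
import torchvision.models as models
from ptflops import get_model_complexity_info

def count_macs(model):
    macs, _ = get_model_complexity_info(
        model, (3, 224, 224), as_strings=False, print_per_layer_stat=False
    )
    return macs

frozen = models.resnet18()
for p in frozen.parameters():
    p.requires_grad = False  # affects training only, not the forward pass

print(count_macs(models.resnet18()) == count_macs(frozen))  # True
```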
There is no standard module in torch.nn representing graph convolutions, and ptflops can only account for pytorch's standard modules. You can also write a custom hook for your GCN implementation and pass...
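Here is a rough sketch of such a hook for a toy dense GCN layer, assuming the `custom_modules_hooks` argument and the `module.__flops__` accumulator used by recent ptflops versions (please double-check against the version you have installed):

```python
import torch
import torch.nn as nn
from ptflops import get_model_complexity_info

class SimpleGCNLayer(nn.Module):
    """Toy dense GCN layer: X' = A @ X @ W, with a fixed adjacency buffer."""
    def __init__(self, num_nodes, in_feats, out_feats):
        super().__init__()
        self.register_buffer('adj', torch.eye(num_nodes))
        self.weight = nn.Parameter(torch.randn(in_feats, out_feats))

    def forward(self, x):  # x: (batch, num_nodes, in_feats)
        return self.adj @ x @ self.weight

def gcn_flops_hook(module, inputs, output):
    batch, n, in_feats = inputs[0].shape
    out_feats = module.weight.shape[1]
    # MACs of A @ X plus (A @ X) @ W, accumulated where ptflops collects them
    module.__flops__ += batch * (n * n * in_feats + n * in_feats * out_feats)

macs, params = get_model_complexity_info(
    SimpleGCNLayer(10, 16, 32), (10, 16), as_strings=False,
    print_per_layer_stat=False,
    custom_modules_hooks={SimpleGCNLayer: gcn_flops_hook},
)
print(macs, params)
```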