yangcf10
Hi, I am facing similar issues. When dealing with deepfake video, I first crop the detected face region (a typical size is 110x110), and then use this transform: trans =...
> Hi @Dong-Huo,
>
> As shown in the paper, since the MetaFormer block already has a residual connection, subtraction of the input itself is added in Equation (4)....
> Hi @yangcf10,
>
> It is not elegant to remove the residual connection in the block just for the pooling token mixer. It is better to retain the...
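
For anyone following the quoted replies, here is a minimal sketch of why the input is subtracted inside the pooling token mixer. It is not the repository's code: it uses plain Python lists instead of tensors, the helper names (`avg_pool`, `pooling_token_mixer`, `block_with_residual`) are made up for illustration, and it leaves out the normalization layer that sits in front of the token mixer, so the cancellation shown here is exact only under that simplification.

```python
def avg_pool(x, k=3):
    """Stride-1 average pooling over a 2D grid.

    Cells outside the grid are ignored when averaging (assumption:
    padding is not counted toward the mean).
    """
    h, w = len(x), len(x[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [
                x[a][b]
                for a in range(max(0, i - r), min(h, i + r + 1))
                for b in range(max(0, j - r), min(w, j + r + 1))
            ]
            out[i][j] = sum(vals) / len(vals)
    return out


def pooling_token_mixer(x, k=3):
    """Token mixer with the input subtracted: pool(x) - x."""
    p = avg_pool(x, k)
    return [[p[i][j] - x[i][j] for j in range(len(x[0]))] for i in range(len(x))]


def block_with_residual(x, k=3):
    """Residual connection around the mixer: x + (pool(x) - x).

    Normalization is omitted here, which is a simplification.
    """
    m = pooling_token_mixer(x, k)
    return [[x[i][j] + m[i][j] for j in range(len(x[0]))] for i in range(len(x))]


x = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]

out = block_with_residual(x)
pooled = avg_pool(x)

# With the subtraction in place, the residual branch cancels the "- x"
# term and the block output matches plain average pooling.
same = all(
    abs(out[i][j] - pooled[i][j]) < 1e-9
    for i in range(len(x))
    for j in range(len(x[0]))
)
print(same)
```

In this simplified setting, keeping the residual connection in the block and subtracting the input in the mixer gives the same result as pooling with no residual connection, which is the trade-off the two replies are discussing.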