ESRT
What does `common.Scale(1)` mean?
```python
import torch
import torch.nn as nn

class Scale(nn.Module):
    def __init__(self, init_value=1e-3):
        super().__init__()
        # Learnable scalar multiplier, initialized to init_value
        self.scale = nn.Parameter(torch.FloatTensor([init_value]))

    def forward(self, input):
        return input * self.scale
```
If `self.scale` is 1, does this layer do nothing? Why do we need this layer?
Is `self.scale` the learnable parameter λ_x in the paper?
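To make the question concrete, here is a minimal sketch (the `Scale` class is copied from above; tensor shapes are arbitrary). With `init_value=1` the forward pass is an identity at initialization, but `self.scale` is still an `nn.Parameter`, so it receives gradients and can move away from 1 during training:

```python
import torch
import torch.nn as nn

class Scale(nn.Module):
    def __init__(self, init_value=1e-3):
        super().__init__()
        self.scale = nn.Parameter(torch.FloatTensor([init_value]))

    def forward(self, input):
        return input * self.scale

# With init_value=1 the layer is an identity at initialization
layer = Scale(init_value=1.0)
x = torch.ones(2, 3)
out = layer(x)
print(torch.allclose(out, x))  # output equals input while scale == 1

# But the scale is trainable: it accumulates gradients like any other parameter,
# so the optimizer can adjust it away from 1 during training.
out.sum().backward()
print(layer.scale.grad is not None)
```

So `Scale(1)` is a no-op only at the moment of initialization; its purpose is to let the network learn how strongly to weight that branch's output.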