Confusion about NetVLAD and PointNetVLAD

Open where2go947 opened this issue 3 years ago • 0 comments

Hi, thanks for your PyTorch implementation of PointNetVLAD. I'm a new learner, and while reading the code I got confused by PointNetVLAD:

https://github.com/cattaneod/PointNetVlad-Pytorch/blob/ff00ff07b17f35db9ddaea15d04c94b491051d4d/models/PointNetVlad.py#L45-L81

As I understand it:

activation at line 59 is the soft assignment of each point n to each cluster k (the weight).
a at line 62 represents the learned cluster centers.
But what does vlad at line 68 mean? Where are the residuals, and where is their sum over the points n?
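
One way to look for the residuals is to expand the weighted residual sum. The symbols here are assumptions for illustration, not names from the repository: a_{nk} is the soft assignment of point n to cluster k, x_n is the feature of point n, and c_k is column k of cluster_weights2. The sketch also assumes that line 68 subtracts a from the product of the transposed activation and x; the linked file is not reproduced here, so that reading may be wrong.

```latex
V_k \;=\; \sum_{n=1}^{N} a_{nk}\,\bigl(x_n - c_k\bigr)
    \;=\; \sum_{n=1}^{N} a_{nk}\,x_n \;-\; \Bigl(\sum_{n=1}^{N} a_{nk}\Bigr)\,c_k
```

Under that assumption, the first term would correspond to the matrix product and the second term to a, since a is a_sum * cluster_weights2. The per-point residuals would then never be materialized, only their sum.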

Why not compute it explicitly, as follows?

def forward(self, x):
        x = x.transpose(1, 3).contiguous()
        x = x.view((-1, self.max_samples, self.feature_size)) # [B,N,C]
        activation = torch.matmul(x, self.cluster_weights)
        if self.add_batch_norm:
            # activation = activation.transpose(1,2).contiguous()
            activation = activation.view(-1, self.cluster_size)
            activation = self.bn1(activation)
            activation = activation.view(-1,
                                         self.max_samples, self.cluster_size)
            # activation = activation.transpose(1,2).contiguous()
        else:
            activation = activation + self.cluster_biases
        activation = self.softmax(activation)
        activation = activation.view((-1, self.max_samples, self.cluster_size))
        
        a_sum = activation.sum(-2, keepdim=True)
        a = a_sum * self.cluster_weights2 # [B,C,k]
        
        ### ----------------- different -------------------
        N = x.shape[1]
        residual = x.unsqueeze(-1).repeat(1,1,1,self.cluster_size) - \
                          a.unsqueeze(1).repeat(1,N,1,1)  # [B,N,C,k]
        vlad = activation.unsqueeze(2) * residual  # [B,N,1,k] * [B,N,C,k]
        vlad = torch.sum(vlad, dim=1) # [B,C,k]
        ### ----------------- different -------------------

        # intra-normalization and L2 normalize
        vlad = F.normalize(vlad, dim=1, p=2)    
        vlad = vlad.reshape((-1, self.cluster_size*self.feature_size))
        vlad = F.normalize(vlad, dim=1, p=2)

        # compress into a compact output
        vlad = torch.matmul(vlad, self.hidden1_weights)
        vlad = self.bn2(vlad)

        if self.gating:
            vlad = self.context_gating(vlad)

        return vlad

Is there anything wrong with my understanding? Thanks in advance!
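
To make the comparison concrete, here is a minimal sketch that checks three formulas on toy data. It uses plain Python lists instead of tensors so that it runs without PyTorch, and every name in it (x, act, cen, a_sum and the three functions) is hypothetical. It compares the explicit weighted residual sum, the factored form "sum of act * x minus a", and the variant from the snippet above, where a (which already contains a_sum) is subtracted inside the per-point residual.

```python
import random

random.seed(0)
N, C, K = 5, 3, 2  # toy sizes: points, feature channels, clusters

# Toy point features x[n][c].
x = [[random.uniform(-1.0, 1.0) for _ in range(C)] for _ in range(N)]

# Soft assignments act[n][k]; each row sums to 1, as a softmax output would.
raw = [[random.random() + 0.1 for _ in range(K)] for _ in range(N)]
act = [[v / sum(row) for v in row] for row in raw]

# Fixed, nonzero stand-ins for the cluster centers cen[c][k].
cen = [[0.5 * (c + 1) * (-1) ** k for k in range(K)] for c in range(C)]

# Total assignment mass per cluster, summed over points.
a_sum = [sum(act[n][k] for n in range(N)) for k in range(K)]


def explicit_residual_sum():
    """sum_n act[n][k] * (x[n][c] - cen[c][k]), the textbook VLAD form."""
    return [[sum(act[n][k] * (x[n][c] - cen[c][k]) for n in range(N))
             for k in range(K)] for c in range(C)]


def factored_form():
    """sum_n act[n][k] * x[n][c] minus a[c][k], with a = a_sum * cen."""
    return [[sum(act[n][k] * x[n][c] for n in range(N)) - a_sum[k] * cen[c][k]
             for k in range(K)] for c in range(C)]


def proposed_variant():
    """Subtracts a = a_sum * cen inside each per-point residual."""
    return [[sum(act[n][k] * (x[n][c] - a_sum[k] * cen[c][k]) for n in range(N))
             for k in range(K)] for c in range(C)]


def max_abs_diff(p, q):
    """Largest absolute element-wise difference between two [C][K] lists."""
    return max(abs(p[c][k] - q[c][k]) for c in range(C) for k in range(K))


print("explicit vs factored:", max_abs_diff(explicit_residual_sum(), factored_form()))
print("explicit vs proposed:", max_abs_diff(explicit_residual_sum(), proposed_variant()))
```

In this sketch the first difference is at floating-point noise level and the second is not, because the variant effectively applies a_sum twice. If the explicit form is wanted, subtracting the centers themselves (cluster_weights2 rather than a) inside the residual would match the textbook sum.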

Oct 22 '22 08:10 where2go947