DeepCore

Confusion about the gradient matrix used

Open · MohammadHossein-Bahari opened this issue 2 years ago · 1 comment

Hello,

Thanks for the great work. As asked before (here), I do not see why, in several methods such as GraNd and the submodular functions, you use the concatenation of the loss gradient with respect to the outputs and its product with the last feature embedding, as shown here:

    # Gradient of the loss with respect to the outputs (logits), size: (batch_size, num_classes).
    bias_parameters_grads = torch.autograd.grad(loss, outputs)[0]
    # Per-sample outer product of the recorded embedding with the output gradient.
    weight_parameters_grads = self.model.embedding_recorder.embedding.view(
        batch_num, 1, self.embedding_dim).repeat(1, self.args.num_classes, 1) * \
        bias_parameters_grads.view(batch_num, self.args.num_classes, 1).repeat(
            1, 1, self.embedding_dim)
    # Concatenate both parts and store the flattened per-sample vector.
    gradients.append(torch.cat([bias_parameters_grads, weight_parameters_grads.flatten(1)],
                               dim=1).cpu().numpy())

You are basically using the last-layer features scaled by the gradient. Do you have any reason for choosing this instead of what is common in the literature, such as the gradient with respect to the last-layer parameters?
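For reference, here is a minimal, self-contained sketch of the construction in the snippet above, assuming a toy model whose classifier head is a single `nn.Linear` layer; the model, data, and variable names other than those quoted from DeepCore are hypothetical:

    import torch
    import torch.nn as nn

    batch_num, embedding_dim, num_classes = 4, 16, 3

    # Toy stand-ins: a recorded penultimate embedding and a linear classifier head.
    embedding = torch.randn(batch_num, embedding_dim)   # plays the role of embedding_recorder.embedding
    classifier = nn.Linear(embedding_dim, num_classes)
    outputs = classifier(embedding)
    targets = torch.randint(num_classes, (batch_num,))
    loss = nn.CrossEntropyLoss(reduction="sum")(outputs, targets)

    # Gradient of the loss with respect to the logits, one row per sample.
    bias_parameters_grads = torch.autograd.grad(loss, outputs)[0]   # (batch_num, num_classes)

    # Per-sample product of the embedding with the logit gradient, mirroring the quoted code.
    weight_parameters_grads = embedding.view(batch_num, 1, embedding_dim).repeat(1, num_classes, 1) * \
        bias_parameters_grads.view(batch_num, num_classes, 1).repeat(1, 1, embedding_dim)

    # Concatenated per-sample vector as used in the snippet.
    per_sample = torch.cat([bias_parameters_grads, weight_parameters_grads.flatten(1)], dim=1)
    print(per_sample.shape)  # (batch_num, num_classes + num_classes * embedding_dim)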

Thanks!

MohammadHossein-Bahari · Jun 22 '23 09:06

@PatrickZH @Chengcheng-Guo Hello, we are also curious why you did not simply use the last-layer gradients but chose this form instead. Could you share some thoughts with us?

XianzheMa · Jun 23 '24 09:06