Marc Masana

17 comments by Marc Masana

@btwardow I forgot to add a test for GDumb; could you add it? I'll check the comment you left and see if I can improve it.

Hi Mengya, **Q1**: I'm not sure I fully understood your question. The `model_old` should always have one head fewer than the current model, since it has no information about...
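As a toy sketch of that bookkeeping (plain lists standing in for the actual torch heads, names hypothetical): the frozen snapshot is taken before the new task's head is appended, so it always trails the current model by one head.

```python
import copy

# Toy sketch (plain tuples stand in for torch classification heads):
# model_old is frozen before the new task's head is appended, so it
# always has one head fewer than the current model.
heads = [("head_task0", 5)]        # heads after finishing task 0
model_old = copy.deepcopy(heads)   # frozen snapshot kept as reference
heads.append(("head_task1", 5))    # current model grows a head for task 1
print(len(model_old), len(heads))  # 1 2
```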

I would need more context to solve this issue. Maybe you have a fixed memory size that is smaller than the total number of classes? That could be the cause.
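A minimal sketch of why that scenario can break things, assuming a fixed exemplar budget split evenly per class (hypothetical helper, not the repository's code): integer division drops the per-class share to zero once classes outnumber memory slots.

```python
# Hypothetical sketch: a fixed exemplar memory split evenly per class yields
# zero exemplars per class once the number of classes exceeds the budget.
def exemplars_per_class(memory_size, num_classes):
    return memory_size // num_classes

print(exemplars_per_class(2000, 100))  # 20 exemplars per class
print(exemplars_per_class(200, 500))   # 0: memory smaller than number of classes
```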

Hi! Could you share the arguments that you used for your setting? That would make it easier to figure out the discrepancy. Maybe you didn't run the experiment with the...

I'm not sure I understand the question, but the accuracy when learning each task is calculated only over samples of the classes belonging to that task. During training, the...
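A minimal sketch of that task-aware evaluation (hypothetical function, pure Python for clarity): the argmax is restricted to the classes owned by the task, so logits from other tasks' heads are ignored.

```python
# Hypothetical sketch: accuracy for a task is computed only over the classes
# belonging to that task, by restricting predictions to the task's own head.
def task_accuracy(logits, targets, task_classes):
    correct = 0
    for row, target in zip(logits, targets):
        # argmax restricted to this task's classes only
        pred = max(task_classes, key=lambda c: row[c])
        correct += int(pred == target)
    return correct / len(targets)

# Two samples, four total classes; the current task owns classes 2 and 3.
logits = [[0.1, 0.9, 0.2, 0.8], [0.0, 0.5, 0.7, 0.1]]
print(task_accuracy(logits, [3, 2], [2, 3]))  # 1.0: class 1's high logit is ignored
```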

Hi @fszatkowski, I remember discussing this approach and the gradients/loss before, so maybe we missed something. A first change that has not been pushed yet into main is the...

I see, the way `.detach()` is called could indeed block the gradients from updating. I'll first try to reproduce what you propose with the `--gamma` parameter to check it out.

You are correct, that loss indeed has no effect: no gradients are updated, so changing the parameter does nothing and brings the...
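A minimal illustration of the mechanism (toy tensors, not the repository's actual code): gradients never flow through a detached tensor, so a loss term built on it contributes to the loss value but not to the update, and scaling it by a weight such as `gamma` changes nothing.

```python
import torch

# Toy sketch: a .detach() call can silence a loss term, because gradients
# do not flow through detached tensors.
w = torch.ones(3, requires_grad=True)
active = (w * 2).sum()            # contributes gradients
blocked = (w.detach() * 2).sum()  # same value, but cut off from the graph

gamma = 10.0
loss = active + gamma * blocked
loss.backward()

# Only the non-detached path produced gradients; changing gamma has no effect.
print(w.grad)  # tensor([2., 2., 2.])
```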

Hi @jmin0530, the joint training is incremental, meaning the network goes through a training session at each task, with access to all data from previous tasks....
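A toy sketch of that schedule (hypothetical helper names): one training session per task, each over the union of all data seen so far.

```python
# Toy sketch of the incremental joint baseline: one training session per
# task, each with access to all data from previous tasks as well.
def joint_incremental(tasks, train_fn):
    seen = []
    for task_data in tasks:
        seen.extend(task_data)  # accumulate previous tasks' data
        train_fn(list(seen))    # retrain over everything seen so far
    return seen

sessions = []
joint_incremental([[1, 2], [3], [4, 5]], sessions.append)
print(sessions)  # [[1, 2], [1, 2, 3], [1, 2, 3, 4, 5]]
```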