Reproducing results in the paper
Hi,
I ran the script provided in PytorchRouting/Examples/run_experiments.py and was unable to reproduce the CIFAR-MTL results reported in the paper (I'm getting ~53%, while the paper reports 70%). I noticed there's a comment in run_experiments.py saying:
WPL_routed_all_fc(3, 512, 5, dataset.num_tasks, dataset.num_tasks)
    Training averages: Model loss: 0.427, Routing loss: 8.864, Accuracy: 0.711
    Testing averages:  Model loss: 0.459, Routing loss: 9.446, Accuracy: 0.674
I wonder if you used a different set of hyperparameters for the paper, and whether you would be willing to share them with me? Thanks!
Hi. I am aware of this problem. It is a consequence of a major rewrite I did to speed up the code (switching from loop-based routing within a batch to mask-based routing), which slightly changed the exact effect of the hyperparameters. I have verified the correctness of the new code for our new paper (https://nlp.stanford.edu/pubs/cases2019recursiverouting.pdf), so I can guarantee that the code works. I will try to run a new hyperparameter sweep as soon as I can.
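For anyone curious what the loop-based vs. mask-based distinction refers to, here is a minimal sketch of the two dispatch strategies. This is not the PytorchRouting implementation; the `TwoExpertLayer` class and its methods are hypothetical and only illustrate the general idea, assuming each sample in a batch has already been assigned a route index.

```python
import torch
import torch.nn as nn

class TwoExpertLayer(nn.Module):
    """Hypothetical layer with two routed sub-modules ("experts")."""
    def __init__(self, dim):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(2)])

    def forward_loop(self, x, routes):
        # Loop-based routing: dispatch every sample individually.
        # Simple, but one forward call per sample is slow.
        out = torch.empty_like(x)
        for i in range(x.size(0)):
            out[i] = self.experts[routes[i].item()](x[i])
        return out

    def forward_mask(self, x, routes):
        # Mask-based routing: gather all samples that share a route and
        # process them in one batched call per expert. Same result, faster.
        out = torch.empty_like(x)
        for j, expert in enumerate(self.experts):
            mask = routes == j
            if mask.any():
                out[mask] = expert(x[mask])
        return out

layer = TwoExpertLayer(dim=8)
x = torch.randn(4, 8)
routes = torch.tensor([0, 1, 1, 0])
# Both strategies compute the same outputs.
assert torch.allclose(layer.forward_loop(x, routes), layer.forward_mask(x, routes))
```

The two versions are numerically equivalent, but because the batched variant changes how samples are grouped during training (and thus how efficiently you can run larger batches), hyperparameters tuned for the old loop-based code may need to be re-tuned, which is consistent with the accuracy gap described above.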