
Why simply use the first constrained layer as pruning template for all constrained layers?

Open yumath opened this issue 3 years ago • 1 comments

From the training results, I observe that the hard masks of the constrained layers are not exactly aligned with each other. https://github.com/MingSun-Tse/ASSL/blob/a564556c8b578c2ee86d135044f088bfeaafc707/src/pruner/utils.py#L71

yumath avatar Sep 01 '22 11:09 yumath

Hi @yumath , thanks for your interest in our work!

Yes, the hard masks are not exactly aligned. SSA is a regularization term: it only encourages aligned masks (as shown by the decreasing SSA loss) but cannot guarantee that the masks become fully aligned (i.e., SSA loss = 0). We tried increasing the penalty strength of SSA to align them further, but this came at the price of a performance drop. So the current scheme (a not-so-beautiful solution, the way I see it) simply uses the masks derived from the first constrained Conv layer after applying the SSA penalty. You may use masks derived from other constrained layers instead; I expect there would be no obvious difference.
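To make the "first constrained layer as template" idea concrete, here is a minimal sketch in plain Python. It is not the code in `src/pruner/utils.py`: the function names `kept_filters` and `share_first_layer_mask`, the layer names, and the choice of per-filter L1 norm as the ranking criterion are all assumptions made for illustration.

```python
from typing import Dict, List


def kept_filters(l1_norms: List[float], num_keep: int) -> List[int]:
    """Return the indices of the num_keep filters with the largest L1 norms.

    Hypothetical helper: assumes filters are ranked by L1 norm.
    """
    ranked = sorted(range(len(l1_norms)), key=lambda i: l1_norms[i], reverse=True)
    return sorted(ranked[:num_keep])


def share_first_layer_mask(
    norms_by_layer: Dict[str, List[float]], num_keep: int
) -> Dict[str, List[int]]:
    """Derive kept indices from the first constrained layer and reuse them everywhere.

    Hypothetical helper: "first" means first in insertion order of the dict.
    """
    first = next(iter(norms_by_layer))  # dicts preserve insertion order
    template = kept_filters(norms_by_layer[first], num_keep)
    return {name: list(template) for name in norms_by_layer}


# Toy L1 norms for two constrained layers (made-up numbers).
norms = {
    "conv1": [0.9, 0.1, 0.8, 0.2],
    "conv2": [0.7, 0.3, 0.1, 0.6],  # on its own, conv2 would keep a different set
}
shared = share_first_layer_mask(norms, num_keep=2)
```

In this toy case the two layers disagree on which filters to keep when ranked independently, and the shared template resolves the disagreement in favor of `conv1`. Passing the layers in a different order would make another constrained layer the template.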

More thoughts: even if the masks are not fully aligned, reducing the misalignment, per se, is a good thing for pruning and the later finetuning because the gradient flow of the remaining weights would be less distorted and trainability would be better.
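One way to check how far the masks are from being aligned is to compare the sets of kept filter indices between two constrained layers. The sketch below uses the Jaccard overlap for this; it is a hypothetical diagnostic, not the SSA loss, and the name `mask_overlap` is an assumption.

```python
from typing import Iterable


def mask_overlap(kept_a: Iterable[int], kept_b: Iterable[int]) -> float:
    """Jaccard overlap of two kept-index sets: 1.0 means fully aligned masks."""
    a, b = set(kept_a), set(kept_b)
    if not a and not b:
        return 1.0  # two empty masks are trivially aligned
    return len(a & b) / len(a | b)


# Made-up kept indices for two constrained layers.
partly_aligned = mask_overlap([0, 2], [0, 3])  # one shared index out of three
fully_aligned = mask_overlap([0, 2], [0, 2])
```

Tracking a value like this during training would show whether the regularization is pushing the masks closer together, even when they never become identical.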

Best,

MingSun-Tse avatar Sep 01 '22 16:09 MingSun-Tse

I'll close this issue since there are no further questions. Feel free to reopen it if you find it necessary.

MingSun-Tse avatar Dec 18 '22 03:12 MingSun-Tse