alanspike
The 79.2 result for L1 is from training with distillation for 300 epochs. We haven't trained the model without distillation, but can try it later.
Hi @youjinChung, thanks again for your interest. Just checking in to see whether you were able to run the code successfully.
Hi @edwardyehuang, with the latest iOS and Xcode, you can get the latency directly on CPU or CPU & GPU by following this [link](https://developer.apple.com/videos/play/wwdc2022/10027/).
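If you'd like a quick sanity check from Python on a Mac before profiling in Xcode, here is a rough sketch using coremltools; the model path, input name, and input shape are placeholders for whatever your converted model actually uses:

```python
import time
import numpy as np
import coremltools as ct

# Placeholders -- replace with your converted model's path and input name.
MODEL_PATH = "model.mlpackage"
INPUT_NAME = "input"

def measure_latency(compute_units, runs=50):
    # Load the same model restricted to the requested compute units.
    model = ct.models.MLModel(MODEL_PATH, compute_units=compute_units)
    x = {INPUT_NAME: np.random.rand(1, 3, 224, 224).astype(np.float32)}
    model.predict(x)  # warm-up
    start = time.perf_counter()
    for _ in range(runs):
        model.predict(x)
    return (time.perf_counter() - start) / runs * 1000.0  # ms per prediction

print("CPU only :", measure_latency(ct.ComputeUnit.CPU_ONLY), "ms")
print("CPU & GPU:", measure_latency(ct.ComputeUnit.CPU_AND_GPU), "ms")
```

The numbers from Xcode's performance report are the more reliable reference; this is just a quick way to compare compute units from Python.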
Hi @yeyan00, did you train the model on your own dataset, or did you apply the ADE20K pre-trained model to your dataset?
Hi @yeyan00, I'll close the issue for now. Please feel free to reopen it if you'd like to discuss the model or dataset training further. Thanks.
Maybe you could try increasing the weight of the reconstruction loss for student training. For example, set `--lambda_recon` to `100` and give it a try.
Thanks for sharing the results. It's a bit strange, since `lambda_recon` is used to weight the reconstruction loss between the student model and the teacher model, as shown [here](https://github.com/snap-research/CAT/blob/7a3e8b1d36f392577b84c515e789b28dc4e70d6f/distillers/inception_distiller.py#L166-L167).
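For reference, a minimal sketch of how a weight like `lambda_recon` typically enters the distillation objective; the function and variable names are illustrative, not the exact ones in `inception_distiller.py`:

```python
import torch.nn.functional as F

# Illustrative only: lambda_recon scales the student-vs-teacher reconstruction
# term, so a larger value pushes the student's output closer to the teacher's.
def distill_loss(student_out, teacher_out, adv_loss, lambda_recon=100.0):
    recon = F.l1_loss(student_out, teacher_out)  # reconstruction term (L1 as an example)
    return adv_loss + lambda_recon * recon       # weighted total generator loss
```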
Could you try setting the weight of the adversarial loss to zero and see whether the reconstruction loss decreases?
Could you maybe set this [loss](https://github.com/snap-research/CAT/blob/7a3e8b1d36f392577b84c515e789b28dc4e70d6f/distillers/inception_distiller.py#L170) as zero, and comment the training of discriminator [here](https://github.com/snap-research/CAT/blob/7a3e8b1d36f392577b84c515e789b28dc4e70d6f/distillers/inception_distiller.py#L182-L185)? I'm not sure about the reason so I wonder maybe we could try to remove...
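Roughly, the idea is to let only the reconstruction term drive the student and skip the discriminator update entirely. A self-contained sketch of that shape of training step (not the literal repo code; the hinge-style GAN term is just one example):

```python
import torch
import torch.nn.functional as F

def generator_step(student, teacher, x, optimizer_G, lambda_recon=10.0,
                   use_gan=False, discriminator=None):
    # With use_gan=False, only the reconstruction term updates the student,
    # which is the isolation suggested above. The discriminator update is
    # simply not performed in that mode.
    fake = student(x)
    with torch.no_grad():
        target = teacher(x)
    loss = lambda_recon * F.l1_loss(fake, target)
    if use_gan and discriminator is not None:
        loss = loss - discriminator(fake).mean()  # hinge-style G loss (illustrative)
    optimizer_G.zero_grad()
    loss.backward()
    optimizer_G.step()
    return loss.item()
```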
Maybe the obtained student network is too small when using the default target FLOPs at the larger resolution. Could you try compressing with a larger target FLOPs?
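As a rough rule of thumb, convolutional FLOPs grow about linearly with the number of input pixels, so one option is to scale the target budget by the pixel ratio. A back-of-the-envelope sketch, where the base budget and resolutions are just example numbers, not the repo's defaults:

```python
def scale_flops_target(base_flops, base_res=(256, 256), new_res=(512, 512)):
    # Conv FLOPs scale roughly with H*W, so scale the compression target by
    # the pixel ratio to keep the student from becoming too small.
    ratio = (new_res[0] * new_res[1]) / (base_res[0] * base_res[1])
    return base_flops * ratio

# e.g. a hypothetical 5.6e9-FLOP target at 256x256 becomes ~2.24e10 at 512x512
print(scale_flops_target(5.6e9))
```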