alanspike
The 79.2 result for L1 is from training with distillation for 300 epochs. We haven't trained the model without distillation, but can try it later.
Hi @youjinChung, thanks again for your interest. Just checking in to see whether you were able to run the code successfully.
Hi @edwardyehuang, with the latest iOS and Xcode, you can get the latency directly on CPU or CPU & GPU by following this [link](https://developer.apple.com/videos/play/wwdc2022/10027/).
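If you'd like a quick sanity check from Python on a Mac before profiling in Xcode, here is a rough sketch using coremltools; the model path, input name, and input shape are placeholders for whatever your converted model actually uses:

```python
import time
import numpy as np
import coremltools as ct

# Placeholders -- replace with your converted model's path and input name.
MODEL_PATH = "model.mlpackage"
INPUT_NAME = "input"

def measure_latency(compute_units, runs=50):
    # Load the same model restricted to the requested compute units.
    model = ct.models.MLModel(MODEL_PATH, compute_units=compute_units)
    x = {INPUT_NAME: np.random.rand(1, 3, 224, 224).astype(np.float32)}
    model.predict(x)  # warm-up
    start = time.perf_counter()
    for _ in range(runs):
        model.predict(x)
    return (time.perf_counter() - start) / runs * 1000.0  # ms per prediction

print("CPU only :", measure_latency(ct.ComputeUnit.CPU_ONLY), "ms")
print("CPU & GPU:", measure_latency(ct.ComputeUnit.CPU_AND_GPU), "ms")
```

The numbers from Xcode's performance report are the more reliable reference; this is just a quick way to compare compute units from Python.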
Hi @yeyan00, did you train the model on your own dataset, or did you apply the ADE20K pre-trained model to your dataset?
Hi @yeyan00, I'll close the issue for now. Please feel free to reopen it if you'd like to discuss the model or dataset training further. Thanks.
Maybe you could try increasing the weight of the reconstruction loss for student training. For example, set `--lambda_recon` to `100` and give it a try.
Thanks for sharing the results. It's a bit strange, since `lambda_recon` is used to weight the reconstruction loss between the student model and the teacher model, as shown [here](https://github.com/snap-research/CAT/blob/7a3e8b1d36f392577b84c515e789b28dc4e70d6f/distillers/inception_distiller.py#L166-L167).
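For reference, a minimal sketch of how a weight like `lambda_recon` typically enters the distillation objective; the function and variable names are illustrative, not the exact ones in `inception_distiller.py`:

```python
import torch.nn.functional as F

# Illustrative only: lambda_recon scales the student-vs-teacher reconstruction
# term, so a larger value pushes the student's output closer to the teacher's.
def distill_loss(student_out, teacher_out, adv_loss, lambda_recon=100.0):
    recon = F.l1_loss(student_out, teacher_out)  # reconstruction term (L1 as an example)
    return adv_loss + lambda_recon * recon       # weighted total generator loss
```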
Could you try setting the weight of the adversarial loss to zero and see whether the reconstruction loss decreases?
Could you maybe set this [loss](https://github.com/snap-research/CAT/blob/7a3e8b1d36f392577b84c515e789b28dc4e70d6f/distillers/inception_distiller.py#L170) as zero, and comment the training of discriminator [here](https://github.com/snap-research/CAT/blob/7a3e8b1d36f392577b84c515e789b28dc4e70d6f/distillers/inception_distiller.py#L182-L185)? I'm not sure about the reason so I wonder maybe we could try to remove...
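Roughly, the idea is to let only the reconstruction term drive the student and skip the discriminator update entirely. A self-contained sketch of that shape of training step (not the literal repo code; the hinge-style GAN term is just one example):

```python
import torch
import torch.nn.functional as F

def generator_step(student, teacher, x, optimizer_G, lambda_recon=10.0,
                   use_gan=False, discriminator=None):
    # With use_gan=False, only the reconstruction term updates the student,
    # which is the isolation suggested above. The discriminator update is
    # simply not performed in that mode.
    fake = student(x)
    with torch.no_grad():
        target = teacher(x)
    loss = lambda_recon * F.l1_loss(fake, target)
    if use_gan and discriminator is not None:
        loss = loss - discriminator(fake).mean()  # hinge-style G loss (illustrative)
    optimizer_G.zero_grad()
    loss.backward()
    optimizer_G.step()
    return loss.item()
```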
Maybe the obtained student network is too small when using the default target FLOPs at the larger resolution. Could you try compressing with a larger target FLOPs?
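As a rough rule of thumb, convolutional FLOPs grow about linearly with the number of input pixels, so one option is to scale the target budget by the pixel ratio. A back-of-the-envelope sketch, where the base budget and resolutions are just example numbers, not the repo's defaults:

```python
def scale_flops_target(base_flops, base_res=(256, 256), new_res=(512, 512)):
    # Conv FLOPs scale roughly with H*W, so scale the compression target by
    # the pixel ratio to keep the student from becoming too small.
    ratio = (new_res[0] * new_res[1]) / (base_res[0] * base_res[1])
    return base_flops * ratio

# e.g. a hypothetical 5.6e9-FLOP target at 256x256 becomes ~2.24e10 at 512x512
print(scale_flops_target(5.6e9))
```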