Why KL-divergence?
Hi,
In the paper, the UDA consistency loss is described as minimizing cross-entropy, but in the code it is implemented as KL divergence. Could you explain this further? Many thanks!
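My understanding so far is that the two differ only by the entropy of the target distribution, which is constant when gradients are stopped on the target, so the gradients should be identical. A minimal numpy sketch of the identity CE(p, q) = H(p) + KL(p || q), with arbitrary example values (not the repo's actual code):

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

# p: target distribution (prediction on the original example, treated as fixed)
# q: prediction on the augmented example (the one that receives gradients)
p = softmax(np.array([2.0, 0.5, -1.0]))
q = softmax(np.array([1.0, 1.0, 0.0]))

cross_entropy = -(p * np.log(q)).sum()                 # CE(p, q)
kl_divergence = (p * (np.log(p) - np.log(q))).sum()    # KL(p || q)
entropy_p = -(p * np.log(p)).sum()                     # H(p)

# CE(p, q) = H(p) + KL(p || q); H(p) is constant w.r.t. q,
# so minimizing either gives the same gradients when p is not back-propagated through.
assert np.isclose(cross_entropy, kl_divergence + entropy_p)
```

Is that the intended reasoning, or is there another motivation for using KL in the implementation?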