Mert Kilickaya
Probably due to memory issues (a browser is only allowed to use 500MB of memory).
Yes, you can. You can create your own vocabulary and then extract the corresponding word2vec representations using any pretrained embedding, such as Google News via Gensim. Then, at training/testing time,...
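A minimal sketch of this extraction step, assuming Gensim's `KeyedVectors` and the Google News binary file; the file path and the example vocabulary are placeholders:

```python
import numpy as np
from gensim.models import KeyedVectors

# Load the pretrained Google News embedding (300-dimensional vectors).
kv = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

# Your own vocabulary; out-of-vocabulary words fall back to a zero vector here.
vocab = ["cat", "dog", "skateboard"]
embedding_matrix = np.stack([
    kv[word] if word in kv else np.zeros(kv.vector_size, dtype=np.float32)
    for word in vocab
])

print(embedding_matrix.shape)  # (len(vocab), 300)
```

The resulting matrix can then be used to initialize an embedding layer at training/testing time.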
Adding to pumpikano's answer, you can implement the gradient reversal layer as 'maximizing' the objective of interest rather than minimizing it. This can be implemented as minimizing the negative of the cost...
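A minimal sketch of the negated-cost variant, assuming PyTorch (the modules, optimizers, and `lam` weight below are illustrative, not taken from the original answer): the domain classifier is trained to minimize the ordinary cost on detached features, while the feature extractor minimizes its negative.

```python
import torch
import torch.nn as nn

feature_extractor = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
domain_classifier = nn.Linear(64, 2)
criterion = nn.CrossEntropyLoss()

opt_feat = torch.optim.SGD(feature_extractor.parameters(), lr=0.01)
opt_dom = torch.optim.SGD(domain_classifier.parameters(), lr=0.01)
lam = 1.0  # adversarial trade-off weight (hyper-parameter)

x = torch.randn(8, 32)
y_domain = torch.randint(0, 2, (8,))

# 1) Update the domain classifier on detached features (ordinary minimization).
features = feature_extractor(x)
dom_loss = criterion(domain_classifier(features.detach()), y_domain)
opt_dom.zero_grad()
dom_loss.backward()
opt_dom.step()

# 2) Update the feature extractor by minimizing the *negative* domain loss,
#    i.e. maximizing the domain classifier's error.
adv_loss = -lam * criterion(domain_classifier(features), y_domain)
opt_feat.zero_grad()
adv_loss.backward()
opt_feat.step()
```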
In the paper, the Gradient Reversal Layer is used to reverse the direction of the gradients. This can be achieved in two ways. You can either reverse the sign of...
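A minimal sketch of the sign-flipping approach, assuming PyTorch (the original discussion may well be in another framework; `GradReverse` and `lam` are illustrative names): the layer is the identity in the forward pass and multiplies the incoming gradient by `-lam` in the backward pass.

```python
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)  # identity in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse the sign of the incoming gradient, scaled by lambda.
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Usage: insert between the feature extractor and the domain classifier.
features = torch.randn(8, 64, requires_grad=True)
reversed_features = grad_reverse(features, lam=0.5)
reversed_features.sum().backward()
print(features.grad[0, :3])  # gradients are -0.5 instead of +1.0
```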
Hi Enrico, many thanks for the swift response! Please see the wandb output for val_acc1 on ImageNet-100 for all 5 checkpoints. As is evident, the last model (task4,...
Thanks for your response. - **Hyper-param tuning:** I have not performed any hyper-param tuning for linear probing. I directly used yours for a fair comparison. But since you mention it, I will...
Thanks for the model, args and the info. I will have a look at these. Thanks!
Cool. I was using it, actually. Will try without it and report any difference.
**Update-1:** I've spent the day performing hyper-param tuning for offline linear eval. I'm updating here in case someone else wants to see the end result as well. **Tl;dr:** I...
Generally 4-5% below the offline counterpart.