Modify how losses are computed in a multi-replica hyperopt
Update several aspects of the hyperopt runcard.
@Cmurilochem, @goord, I fear that we have to re-run the hyperopt again. The previous models fail the positivity selection badly, and the changes in this PR "might" fix that.
⚠️ Please run the hyperopt from this branch.
@Cmurilochem, are you actually including penalties in the losses?
https://github.com/NNPDF/nnpdf/blob/e37aa4e58491eae3b8822af3bccd53fa78e0da18/n3fit/runcards/hyperopt_studies/restricted_search_space_renew_hyperopt.yml#L131
At some point in the draft, you mentioned that this is not the case.
Hi @Radonirinaunimi, I'm afraid most of our GPU budget is burnt, and I'm not sure the folks at SURF will be willing to give us more again...
About the penalties in the hyperloss: I'm pretty sure they are excluded. Also, I received an invite for the Slack channel, but it won't let me in!

> Hi @Radonirinaunimi, I'm afraid most of our GPU budget is burnt, and I'm not sure the folks at SURF will be willing to give us more again...

Hi @goord, I hope it was not our runs that burnt the GPUs 😬
For the paper, I think it would be realistic to perform only 250 trials, since from the PDF point of view this is a proof of concept and we use a restricted parameter space anyway. Do you think even this would not be possible? At some point, @Cmurilochem was planning to run 3 more sets of 250.

> About the penalties in the hyperloss: I'm pretty sure they are excluded.

Ok, good! As it should be. I will modify the runcard in this branch.

> Also, I received an invite for the Slack channel, but it won't let me in!

Hmm, what message did you receive? Maybe @juanrojochacon knows how to solve this?
Well, we can at least run a 5-day job; there is enough budget for that. In the meantime, we can explore our options for more compute (Leonardo, or a new pilot project on Snellius?). @Cmurilochem, maybe you can find the time to start a job?
Regarding Slack: I tried again and now it works.
Yes @goord @Cmurilochem, Slack has been a mess over the last few weeks, but it is sorted out now: we are back on our Pro plan, so all communication can proceed there as usual. Thanks!
Hi @Radonirinaunimi and @goord. Yes, I excluded penalties in all runs, so if that is the problem, there is nothing to worry about.
Also, @goord is right. We have a limited budget, and I suspect we only have enough left for a 3-4 day job. I will be back home tomorrow and will submit it again. But we currently have more than 250 trials, for sure.

> Also, @goord is right. We have a limited budget, and I suspect we only have enough left for a 3-4 day job. I will be back home tomorrow and will submit it again.

Perfect, thanks! Please use this branch for the runs.

> But we currently have more than 250 trials, for sure.

Unfortunately, I fear that we cannot use those. But we should always keep backups of them, just in case.
Hi @Radonirinaunimi and @goord. I just submitted the new hyperopt from this branch; we currently have only ~3.5 days of budget left. I am on holiday with my family at the moment, but I will find some time to report back on the progress of the run.
Unrelated: it looks like the polarized theories have been updated (cc @giacomomagni, @scarlehoff)? The C-factors are no longer present, which is why the tests are failing.
Maybe I forgot them during my last update... I'll check.
EDIT: Something went wrong when removing the eko.tar; it should be okay now.
The changes in 00a9839 look good to me! I think that once the tests are fixed (which I can take care of), we can merge this.
No probs, I can do that. I will also add the proportion key to the docs.
Great! Thanks 🙏🏼