nnpdf Modify how losses are computed in a multi-replicas hyperopt

Update various things for the hyperopt run card.

@Cmurilochem, @goord, I fear that we have to re-run the hyperopt again. The previous models badly fail the positivity selection, and the following "might" fix that.

⚠️ please run the hyperopt from this branch.

Aug 16 '24 09:08 Radonirinaunimi

@Cmurilochem, are you actually including penalties in losses?

https://github.com/NNPDF/nnpdf/blob/e37aa4e58491eae3b8822af3bccd53fa78e0da18/n3fit/runcards/hyperopt_studies/restricted_search_space_renew_hyperopt.yml#L131

At some point in the draft you mentioned that this is not the case.

Aug 16 '24 09:08 Radonirinaunimi

Hi @Radonirinaunimi I'm afraid most of our GPU budget is burnt and I'm not sure the folks at SURF are willing to give us more again...

About the penalties in the hyperloss: I'm pretty sure they are excluded. Also, I received an invite for the slack channel but it won't allow me in!

Aug 18 '24 20:08 goord

Hi @Radonirinaunimi I'm afraid most of our GPU budget is burnt and I'm not sure the folks at SURF are willing to give us more again...

Hi @goord, I hope it was not our runs that burnt the GPUs 😬

For the paper, I think it would be realistic to only perform 250 trials. This is because from the PDF point of view we are doing a proof of concept, and we use a restricted parameter space anyway. Do you think even this would not be possible? At some point, @Cmurilochem was planning to run 3 more sets of 250.

About the penalties in the hyperloss: I'm pretty sure they are excluded.

Ok, good! As it should be. I will modify the card here.

Also, I received an invite for the slack channel but it won't allow me in!

Hmhm, what is the message that you received? Maybe @juanrojochacon knows how to solve this?

Aug 18 '24 20:08 Radonirinaunimi

Well we can at least run a 5-day job, there is enough budget for that. In the meantime we can explore our options for more compute (Leonardo or new pilot project on Snellius?). @Cmurilochem maybe you can find the time to start a job?

Regarding the slack: tried again and now it works

Aug 18 '24 21:08 goord

yes @goord @Cmurilochem Slack has been a mess in the last few weeks but it is sorted out now, back to our Pro plan so all communication can proceed via there as usual now. Thanks!

Aug 19 '24 06:08 juanrojochacon

Hi @Radonirinaunimi and @goord. Yes. I excluded penalties in all runs. So, if this is the problem, nothing to be worried about.

Also, @goord is right. We have a limited budget and I suspect that we have enough left for a 3-4 day job. Tomorrow I am back home and will submit it again. But we currently have more than 250 trials for sure.

Aug 19 '24 07:08 Cmurilochem

Also, @goord is right. We have a limited budget and I suspect that we have enough left for a 3-4 day job. Tomorrow I am back home and will submit it again.

Perfect, thanks! Please use this branch for the runs.

But we currently have more than 250 trials for sure.

I fear that we cannot use those, unfortunately. But we should always make backups of them, just in case.

Aug 19 '24 07:08 Radonirinaunimi

Hi @Radonirinaunimi and @goord. I just submitted the new hyperopt from this branch; we currently have just ~3.5 days of budget. I am currently on holiday with family, but will find some time to give you feedback on the progress of the calculation.

Aug 21 '24 05:08 Cmurilochem

Unrelated: it looks like the polarized theories have been updated (cc, @giacomomagni, @scarlehoff)? Now the C-factors are no longer present. This is the reason why the tests are failing.

Sep 13 '24 10:09 Radonirinaunimi

Maybe I forgot them during my last update... I'll check it.

EDIT: Something went wrong when removing the eko.tar, now it should be okay.

Sep 13 '24 19:09 giacomomagni

The changes in 00a9839 look good to me! I think that upon fixing the tests (which I can take care of), we can merge this.

Apr 16 '25 12:04 Radonirinaunimi

no probs, I can do that. I will also add the proportion key to the docs.

Apr 16 '25 13:04 scarlehoff

Great! Thanks 🙏🏼

Apr 16 '25 13:04 Radonirinaunimi