
Normalisation factors in HERA data

Open · enocera opened this issue 4 years ago · 3 comments

@AleCandido and @felixhekhorn report that there are multiple versions of the HERA data, which may differ in their normalisation factor. This should be investigated, and any redundant information that could lead to ambiguities must be removed. This is related to #1275.
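One way to check whether two versions of a data file differ only by an overall normalisation factor is to compare their central values point by point: a constant ratio signals a pure rescaling. The sketch below assumes the values have already been read into two lists with the same point ordering. The function name constant_ratio and the default tolerance are illustrative choices, not part of buildmaster.

```python
import math
from typing import Optional, Sequence


def constant_ratio(
    reference: Sequence[float], other: Sequence[float], rel_tol: float = 1e-6
) -> Optional[float]:
    """Return k if other[i] == k * reference[i] for every point (within rel_tol), else None.

    Also returns None when every reference value is zero, since no factor can be fixed.
    """
    if len(reference) != len(other) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    ratios = []
    for ref, val in zip(reference, other):
        if ref == 0.0:
            # A zero reference value is only compatible with a zero in the other version.
            if val != 0.0:
                return None
            continue
        ratios.append(val / ref)
    if not ratios:
        return None
    k = ratios[0]
    # A single global factor must reproduce every point, up to the relative tolerance.
    if all(math.isclose(r, k, rel_tol=rel_tol) for r in ratios):
        return k
    return None
```

For instance, constant_ratio([1.0, 2.0, 4.0], [0.5, 1.0, 2.0]) returns 0.5, while constant_ratio([1.0, 2.0], [1.0, 3.0]) returns None because no single factor relates the two lists.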

enocera · Jul 16 '21 12:07

@AleCandido @felixhekhorn (cc @juanrojochacon): I've looked into the HERACOMB issue that you raised during the phone call. Let me try to restate your question: you are asking why there are two versions of the CC rawdata files in the nnpdf/buildmaster/rawdata/HERACOMB folder, labelled w/ and w/o the suffix _orig. Is my understanding correct?

If so, before answering this question, let me say that only the un-suffixed version of the rawdata is used in the filter, and that only one version of the DATA files is produced: DATA_HERACOMBCCEM.dat and DATA_HERACOMBCCEP.dat. The format of the observable in these DATA files is consistent with what APFEL calls DIS_CCE and DIS_CCP, respectively. I confess that I originally interpreted your question as if we had two different versions of the DATA files, which is not the case.
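Pairs of rawdata files that differ only by the _orig suffix can be listed with a short script, which makes leftovers like the ones discussed here easy to spot. This is a minimal sketch assuming the suffix sits immediately before the file extension (e.g. a hypothetical foo_orig.dat next to foo.dat); the helper name orig_pairs is also hypothetical.

```python
import os
from typing import Iterable, List, Optional, Tuple

SUFFIX = "_orig"  # suffix discussed in this thread


def orig_pairs(names: Iterable[str]) -> List[Tuple[str, Optional[str]]]:
    """Pair each '<base>_orig<ext>' name with '<base><ext>' if that name is also present."""
    present = set(names)
    pairs = []
    for name in sorted(present):
        stem, ext = os.path.splitext(name)
        if stem.endswith(SUFFIX):
            # Strip the suffix and re-attach the original extension.
            base = stem[: -len(SUFFIX)] + ext
            pairs.append((name, base if base in present else None))
    return pairs
```

For example, orig_pairs(os.listdir("nnpdf/buildmaster/rawdata/HERACOMB")) would list each _orig file in that folder together with its un-suffixed counterpart, or None if there is none; the path assumes the script runs from the directory containing the nnpdf checkout. Taking plain file names rather than reading the folder directly keeps the helper easy to test.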

Now, coming to your question: I don't know why there are two rawdata versions, w/ and w/o the suffix _orig. I suspect that the files with the _orig suffix are a remnant of a previous implementation, or an outdated version of the un-suffixed files. However, all this seems quite immaterial: the files with the _orig suffix are not used anywhere. Also, if you look at the HEPData entry (in particular at Tables 6 and 7), you'll see that the central values of the reduced cross section are consistent with the un-suffixed version of the rawdata files (the one that we currently use). Therefore I think that we should just remove the _orig-suffixed version of the rawdata files, as it is not used anywhere and we don't seem to be able to understand its origin.

I think that this also partly addresses your other remark about data set names for observable variations: at least in this case, we do not need name variations, given that we don't have observable variations. In principle, if you want to make things elegant and understandable, you could use the same naming convention that we introduced for new data sets in NNPDF4.0 (<exp_name>_<additional_info>.dat), which I've used, e.g., in the newly implemented HERACOMB_SIGMARED_C.dat file. A separate question is how to name observables (and, if I understood correctly, how to name them in YADISM w.r.t. APFEL): here, of course, you decide.
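The NNPDF4.0 naming convention mentioned above, <exp_name>_<additional_info>.dat, can be sketched as a pair of helpers that build and split such names. This sketch assumes that exp_name itself contains no underscore (true for HERACOMB, but an assumption in general), so that the first underscore separates the two parts; the names DatasetName, build_name and parse_name are hypothetical and not part of buildmaster.

```python
from typing import NamedTuple


class DatasetName(NamedTuple):
    exp_name: str
    additional_info: str


def build_name(exp_name: str, additional_info: str) -> str:
    """Compose '<exp_name>_<additional_info>.dat'."""
    if "_" in exp_name:
        # Assumption: an underscore-free exp_name keeps the split in parse_name unambiguous.
        raise ValueError("exp_name must not contain '_'")
    return f"{exp_name}_{additional_info}.dat"


def parse_name(filename: str) -> DatasetName:
    """Split '<exp_name>_<additional_info>.dat' at the first underscore."""
    if not filename.endswith(".dat"):
        raise ValueError(f"not a .dat file: {filename}")
    exp_name, sep, info = filename[: -len(".dat")].partition("_")
    if not sep or not exp_name or not info:
        raise ValueError(f"does not follow <exp_name>_<additional_info>.dat: {filename}")
    return DatasetName(exp_name, info)
```

With the file named in the thread, parse_name("HERACOMB_SIGMARED_C.dat") returns DatasetName(exp_name='HERACOMB', additional_info='SIGMARED_C'), and build_name reverses the split.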

enocera · Jul 16 '21 13:07

Thanks for checking, @enocera. I agree with your analysis. And yes, let's remove all useless and obsolete files from the repo to avoid further confusion.

juanrojochacon · Jul 16 '21 13:07

@enocera thanks! Okay, good to know. We'll keep thinking about our naming convention...

felixhekhorn · Jul 16 '21 14:07