Adding new datasets to the NNPDF4.1 runcard
Hi @enocera, before I forget, let me note here what we agreed on regarding adding new datasets to the NNPDF4.1 runcard.
Once pull request https://github.com/NNPDF/nnpdf/pull/2370 is merged, in parallel with the positivity checks suggested in https://github.com/NNPDF/nnpdf/issues/2376, we also want to add new datasets one by one, to make sure things stay under control.
It is not urgent, but whenever you have time, could you add here a list of the datasets that are missing from https://github.com/NNPDF/nnpdf/pull/2370 and with which we would like to run fits? This includes:
- Datasets that we already considered in NNPDF4.0, but which were discarded by our dataset selection procedure.
- Datasets that are considered in the pheno paper and for which FK tables should be ready.
- Datasets that are not in the pheno paper and for which grids / FK tables are now being computed.
We will start from the first category.
Once you add the list, we can coordinate among ourselves to add these new datasets step by step. Thanks.
PDFs-nnpdf40-vs-newbase-vs-nnpdf41prel-errors.pdf
When adding new experiments, one of the things we want to look at is the change in the relative PDF uncertainties: a very constraining experiment will lead to a large change in the PDF uncertainties, and vice versa.
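To make this comparison concrete, here is a minimal Python sketch, assuming the PDF values of the baseline fit and of the fit with the new dataset are available as lists of Monte Carlo replicas at a common set of (x, Q) points. The function names are hypothetical (not validphys API), and the relative uncertainty is taken here as the replica standard deviation divided by the absolute central value.

```python
from statistics import mean, stdev


def relative_uncertainty(replicas):
    """Relative 1-sigma uncertainty (std / |mean|) over MC replicas at one (x, Q) point."""
    return stdev(replicas) / abs(mean(replicas))


def uncertainty_ratio(baseline, new_fit):
    """Point-by-point ratio of relative uncertainties, new fit over baseline.

    baseline and new_fit are lists of replica lists, one entry per (x, Q) point
    (an assumed layout for this sketch).
    """
    return [
        relative_uncertainty(new) / relative_uncertainty(base)
        for base, new in zip(baseline, new_fit)
    ]


if __name__ == "__main__":
    # Toy replicas at two points: the new data halve the uncertainty at the first point only.
    base = [[0.9, 1.0, 1.1], [1.8, 2.0, 2.2]]
    new = [[0.95, 1.0, 1.05], [1.8, 2.0, 2.2]]
    print([round(r, 3) for r in uncertainty_ratio(base, new)])  # → [0.5, 1.0]
```

A ratio well below 1 flags a region where the added experiment is constraining; a ratio close to 1 indicates little impact there.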
@jekoorn @kamillaurent As promised, here is the order in which I would add data sets to new fits (cc @juanrojochacon).
- Top pair data sets: total cross sections. Each data set consists of a single data point. I don't expect any major problems, and likewise I don't expect any impact on PDFs. You can include all of them at once; there is no need to perform a separate fit for each data set.
  - ATLAS_TTBAR_13P6TEV_TOT
  - ATLAS_TTBAR_5TEV_TOT
  - ATLAS_TTBAR_13TEV_2L_TOT
  - CMS_TTBAR_13TEV_35P9FB-1_TOT
  - CMS_TTBAR_13TEV_35P9FB-1_TAU_TOT
  - CMS_TTBAR_13P6TEV_TOT
- DIS+jets. You can include these one at a time in multiple sequential fits, or all together in a single fit: correlations are included by default, so there is no double counting.
  - H1_1JET_319GEV_290PB-1_DIF
  - H1_1JET_319GEV_351PB-1_DIF
  - H1_2JET_319GEV_290PB-1_DIF
  - H1_2JET_319GEV_351PB-1_DIF
  - ZEUS_1JET_300GEV_38P6PB-1_DIF
  - ZEUS_1JET_319GEV_82PB-1_DIF
  - ZEUS_2JET_319GEV_374PB-1_DIF
- Uncontroversial DY data. You can include these one at a time in multiple sequential fits, or all together in a single fit: the data sets are largely independent and small in size. They have already been tested in the pheno paper, so no surprises are expected.
  - ATLAS_WPWM_13P6TEV_TOT
  - ATLAS_Z0_13P6TEV_TOT
  - CMS_WPWM_13TEV_ETA
  - LHCB_Z0_13TEV_DIMUON_2022
- Top pair data sets: differential distributions. Here you MUST be cautious. Each data set comes with multiple differential distributions, which cannot be included together because their correlations are not known and you want to avoid double counting. My suggestion is to include one distribution at a time in the fit. Tanishq has already studied them, although not with the default NNPDF4.1 theoretical and methodological settings (see here).
  - ATLAS_TTBAR_13TEV_HADR_DIF
  - ATLAS_TTBAR_13TEV_LJ_DIF
  - CMS_TTBAR_13TEV_LJ_DIF
  - CMS_TTBAR_13TEV_2L_138FB-1_DIF
- Jet data sets. Here you MUST be EXTREMELY cautious: there are non-negligible EWK K-factors, and possibly additional uncertainties from power corrections. I think these data sets are best handled by the Edinburgh contingent.
  - ATLAS_1JET_13TEV_DIF
  - ATLAS_2JET_13TEV_DIF
  - CMS_1JET_13TEV_DIF
  - CMS_2JET_13TEV_DIF
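The inclusion rules above (all at once, one at a time, or one distribution per fit with no combination) can be sketched as a small fit-planning helper. This is a minimal Python sketch under stated assumptions: `DatasetGroup`, `plan_fits`, `dataset_inputs_block` and the mode labels are hypothetical; "sequential" is interpreted as cumulative fits; and the emitted `dataset_inputs` lines only mimic runcard syntax (no cuts, variants or other keys), so they should be checked against an actual NNPDF4.1 runcard.

```python
from dataclasses import dataclass


@dataclass
class DatasetGroup:
    """A group of datasets and how they may be added (hypothetical structure)."""
    name: str
    datasets: list[str]
    mode: str  # "together", "sequential" or "exclusive" (labels chosen for this sketch)


def plan_fits(baseline: list[str], groups: list[DatasetGroup]) -> list[tuple[str, list[str]]]:
    """Return (label, dataset list) pairs, one per fit to run, in order.

    together:   one fit adding the whole group at once.
    sequential: cumulative fits, each adding one more dataset of the group.
    exclusive:  one fit per entry on top of the current baseline; entries are
                never combined (unknown correlations would lead to double
                counting) and are not carried over to later groups.
    """
    fits = []
    current = list(baseline)
    for group in groups:
        if group.mode == "together":
            current = current + group.datasets
            fits.append((group.name, list(current)))
        elif group.mode == "sequential":
            for ds in group.datasets:
                current = current + [ds]
                fits.append((f"{group.name}+{ds}", list(current)))
        elif group.mode == "exclusive":
            for ds in group.datasets:
                fits.append((f"{group.name}:{ds}", current + [ds]))
        else:
            raise ValueError(f"unknown mode: {group.mode}")
    return fits


def dataset_inputs_block(datasets: list[str]) -> str:
    """Render a runcard-like dataset_inputs block (syntax assumed, extra keys omitted)."""
    return "\n".join(["dataset_inputs:"] + [f"- {{dataset: {ds}}}" for ds in datasets])


if __name__ == "__main__":
    baseline = ["BASELINE_DS"]  # placeholder for the NNPDF4.1 baseline dataset list
    groups = [
        DatasetGroup("ttbar_tot", ["ATLAS_TTBAR_13P6TEV_TOT", "CMS_TTBAR_13P6TEV_TOT"], "together"),
        DatasetGroup("dy", ["ATLAS_Z0_13P6TEV_TOT", "LHCB_Z0_13TEV_DIMUON_2022"], "sequential"),
        # placeholder names for single distributions of one top-pair differential data set
        DatasetGroup("ttbar_dif", ["DIST_A", "DIST_B"], "exclusive"),
    ]
    for label, datasets in plan_fits(baseline, groups):
        print(f"# fit {label}")
        print(dataset_inputs_block(datasets))
```

The "exclusive" mode encodes the caveat on top-pair differential data: each distribution is tested against the same baseline, and none of them is promoted into the baseline for later fits.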
As of now, you can proceed with the first item of the list. The second and third items require the production of FK tables to be finalised (see here), which we will discuss at the next code meeting. The fourth and fifth items lean more towards data characterisation, for which we are not completely ready yet.
Thanks @enocera this is perfect.
Now that we have a baseline, we can start adding these datasets, taking your caveats above into account.
I think @jekoorn and @kamillaurent can go ahead and add these datasets to the fits, following the instructions above, and add the resulting fits one by one to
https://www.wiki.ed.ac.uk/spaces/nnpdfwiki/pages/735674207/Tests+fits+with+NNPDF4.1+settings
Then we can discuss the results at the phone conf. I agree that we can start by running the fits to the uncontroversial datasets!