Jet Substructure
Context
This branch is dedicated to implementing substructure tools to the MadAnalysis framework.
Description of the change and benefits
Interface updates
- MultiJet clustering :
madanalysis/jet_clustering/.
jet_algorithm: This module is separate from the original jet clustering interface within Ma5.
This module can be activated using the following command
ma5> define jet_algorithm my_jet antikt radius=0.5
where my_jet is a user-defined jet identifier, antikt is the algorithm to be used which
can be chosen from antikt, cambridge, genkt, kt, gridjet, cdfjetclu, cdfmidpoint,
and siscone. The rest of the arguments are optional if the user won't define the radius, ptmin etc.
default parameters will be chosen. Each algorithm has its own unique set of parameters i.e.
| Algorithm | Parameters & Default values |
|---|---|
antikt, cambridge |
radius=0.4, ptmin=5. |
genkt |
radius=0.4, ptmin=5., exclusive=False, p=-1 |
kt |
radius=0.4, ptmin=5., exclusive=False |
gridjet |
ymax=3., ptmin=5. |
cdfjetclu |
radius=0.4, ptmin=5., overlap=0.5, seed=1., iratch=0. |
cdfmidpoint |
radius=0.4, ptmin=5., overlap=0.5, seed=1., iratch=0., areafraction=1. |
siscone |
radius=0.4, ptmin=5., overlap=0.5, input_ptmin=5., npassmax=1. |
VariableR |
rho=2000., minR=0., maxR=2., ptmin=20. exclusive=False clustertype=CALIKE strategy=Best |
It is also possible to modify the entry after defining it
ma5> define jet_algorithm my_jet cambridge
ma5> set my_jet.ptmin = 200.
ma5> set my_jet.radius = 0.8
Note that when a jet_algorithm is defined MadAnalysis interface will automatically switch to constituent smearing mode. set my_jet.+<tab> will show the dedicated options available for that particular algorithm.
It is possible to display all the jets available in the current session by using display jet_algorithm command:
$ ./bin/ma5 -R
ma5>set main.fastsim.package = fastjet
ma5>define jet_algorithm my_jet cdfmidpoint
ma5>display jet_algorithm
MA5: * Primary Jet Definition :
MA5: fast-simulation package : fastjet
MA5: clustering algorithm : antikt
MA5: + Jet ID : Ma5Jet
MA5: + cone radius = 0.4
MA5: + PT min (GeV) for produced jets = 5.0
MA5: + exclusive identification = true
MA5: + b-jet identification:
MA5: + DeltaR matching = 0.5
MA5: + exclusive algo = true
MA5: + id efficiency = 1.0
MA5: + mis-id efficiency (c-quark) = 0.0
MA5: + mis-id efficiency (light quarks) = 0.0
MA5: + hadronic-tau identification:
MA5: + id efficiency = 1.0
MA5: + mis-id efficiency (light quarks) = 0.0
MA5: --------------------
MA5: * Other Jet Definitions:
MA5: 1. Jet ID = my_jet
MA5: - algorithm : cdfmidpoint
MA5: - radius : 0.4
MA5: - ptmin : 5.0
MA5: - overlap : 0.5
MA5: - seed : 1.0
MA5: - iratch : 0.0
MA5: - areafraction : 1.0
Here primary jet is defined with the original jet definition syntax of MadAnalysis 5 where since we did not specify anything, it uses default antikt configuration. For more info on how to define primary jet see arXiv:2006.09387. Other jet definitions show all the jets which are defined via jet_algorithm keyword.
To remove a jet_algorithm definition one can use remove my_jet command. Note that one can also change the name of the primary jet which is Ma5Jet by default.
ma5>set main.fastsim.JetID = my_primary_jet
Link to SFS: There can only be one jet smearing/tagging definition where in case of existing multi-jet definitions smearing will be applied in constituent level which is used by all jets defined in the framework. Jet tagging is only available for the primary jet.
Updates in expert mode structure
Expert mode is designed to automatically realize and construct the MadAnalysis framework according to the given .ma5 command script. This command script can include all the SFS construction mentioned in arXiv:2006.09387 and it can also include multijet definitions
sfs_card_cms_exo_xx_yy.ma5:
set main.fastsim.package = fastjet
# Define multijet
define jet_algorithm AK08 antikt
set AK08.radius = 0.8
set AK08.ptmin = 200
define jet_algorithm CA15 cambridge radius=1.5 ptmin=200.0
# Define Jet reconstruction efficiencies
define reco_efficiency j 0.925 [abseta <= 1.5]
define reco_efficiency j 0.875 [abseta > 1.5 and abseta <= 2.5]
define reco_efficiency j 0.80 [abseta > 2.5]
# Define Jet smearer
define smearer j with PT sqrt(0.06^2 + pt^2*1.3e-3^2) [abseta <= 0.5 and pt > 0.1]
define smearer j with PT sqrt(0.10^2 + pt^2*1.7e-3^2) [abseta > 0.5 and abseta <= 1.5 and pt > 0.1]
define smearer j with PT sqrt(0.25^2 + pt^2*3.1e-3^2) [abseta > 1.5 and abseta <= 2.5 and pt > 0.1]
# Define B-tagging
define tagger j as b 0.01+0.000038*pt
define tagger c as b 0.25*tanh(0.018*pt)/(1.0+ 0.0013*pt) [abseta < 2.5]
define tagger c as b 0.0 [abseta >=2.5]
define tagger b as b 0.85*tanh(0.0025*pt)*(25.0/(1+0.063*pt)) [abseta < 2.5]
define tagger b as b 0.0 [abseta >= 2.5]
sfs_card_cms_exo_xx_yy.ma5 shows a simple example of multi-jet clustering and detector simulation implementation. First, it chooses the FastJet package as fastsim interpreter, then defines multijet and in the following defines a simple detector simulation. This file can be executed as follows;
$ ./bin/ma5 -Re cms_exo_xx_yy cms_exo_xx_yy sfs_card_cms_exo_xx_yy.ma5
here cms_exo_xx_yy is a given analysis and sfs_card_cms_exo_xx_yy.ma5 holds the information for the detector simulation (PAD requires sfs_card_cms_exo_xx_yy.ma5 to setup detector simulation for the analysis code, it automatically writes the detector simulation according to the given sfs_card file. Note that if there is a card with the same name for multiple analysis files, those analyses can be executed at the same time, hence allowing more efficient execution.). This command will write a folder called cms_exo_xx_yy including all MadAnalysis 5 framework within.
- Multijet definitions: Multijets are automatically defined in
cms_exo_xx_yy/Build/Main/main.cpp(do not change) as follows;
//Adding new jet with ID AK08
std::map<std::string, std::string> JetConfiguration1;
JetConfiguration1["JetID" ] = "AK08";
JetConfiguration1["algorithm" ] = "antikt";
JetConfiguration1["cluster.R" ] = "0.8";
JetConfiguration1["cluster.PTmin" ] = "200.0";
cluster1->LoadJetConfiguration(JetConfiguration1);
//Adding new jet with ID CA15
std::map<std::string, std::string> JetConfiguration2;
JetConfiguration2["JetID" ] = "CA15";
JetConfiguration2["algorithm" ] = "cambridge";
JetConfiguration2["cluster.R" ] = "1.5";
JetConfiguration2["cluster.PTmin" ] = "200.0";
cluster1->LoadJetConfiguration(JetConfiguration2);
all these inputs are interpreted by JetClusterer machinery within MadAnalysis 5. Additionally Makefile has been modified via CXXFLAGS += -DMA5_FASTJET_MODE flag to enable FastJet use within MadAnalysis data structure, without this flag fastjet dependent accessors will not work.
- Analysis folder:
cms_exo_xx_yy/Build/SampleAnalyzer/User/AnalyzerThis folder includes all the definitions written for the detector simulation;
$ ls
analysisList.h cms_exo_xx_yy.h new_smearer_reco.cpp new_tagger.cpp reco.h
cms_exo_xx_yy.cpp efficiencies.h new_smearer_reco.h new_tagger.h sigmas.h
cms_exo_xx_yy.* are the analysis files and the rest are detector simulation modules (Please do not change those files when the analysis submitted in PAD only cms_exo_xx_yy.*, cms_exo_xx_yy.info and sfs_card_cms_exo_xx_yy.ma5 files are allowed. If you need to modify detector simulation, please modify sfs_card_cms_exo_xx_yy.ma5 and re-execute the workspace or include your personal modifications in cms_exo_xx_yy.cpp).
- Multijet accessor: Each multijet is accessible within
c++interface through their unique identifiers. Primary jet is a special case where one can useevent.rec()->jet()accessor to see all primary jets. Rest of the jets can be found usingevent.rec()->jet("AK08")orevent.rec()->jet("CA15")respectively.
Update on expert mode compilation
In order to separate the execution method for Delphes and SFS-FastJet based analyses, a command line option has been added to the setup.sh. In order to enable SFS-FastJet mode use following commands;
source setup.sh --with-fastjet
make clean all
or simply type source setup.sh -h for details.
New shortcuts
RecJet := MA5::RecJetFormat *;
RecJets := std::vector<const Jet>;
RecTau := MA5::RecTauFormat *;
RecTaus := std::vector<const Tau>;
RecLepton := MA5::RecLeptonFormat *;
RecLeptons := std::vector<const Lepton>;
RecPhoton := MA5::RecPhotonFormat *;
RecPhotons := std::vector<const Photon>;
FastJet Contrib toolset
SoftDrop
MAfloat32 z_cut = 0.10;
MAfloat32 beta = 2.0;
Substructure::SoftDrop softDrop(beta, z_cut);
RecJets filtered_jets = filter(event.rec()->jets(), 20., 2.5);
// Only leading jet:
if (filtered_jets.size() > 0)
RecJet softdrop_jet = softDrop.Execute(Jets[0]);
// All jets:
RecJets softdrop_jets = softDrop.Execute(Jets);
Cluster
Execute with reconstructed event sample
Substructure::Cluster cluster(Substructure::antikt, 0.4, 20);
cluster.Execute(event, "AK4");
if (event.rec()->jets("AK4").size() > 0)
INFO << "AK4 Jet PT = " << event.rec()->jets("AK4")[0].pt() << endmsg;
Other execution modes:
MAfloat32 R = 0.5;
MAfloat32 pitmin = 20.; // optional
Substructure::Cluster cluster(Substructure::cambridge, R, ptmin);
// algorithm options antikt, cambridge, kt
RecJets Jets = filter(event.rec()->jets(), 20., 2.5);
// Only leading jet:
if (Jets.size() > 0)
RecJets jet = cluster.Execute(Jets[0]);
// All jets:
RecJets jets = cluster.Execute(Jets);
Filter clustered object
Substructure::Cluster cluster(Substructure::cambridge, 0.2);
RecJets Jets = filter(event.rec()->jets("AK8"), 20., 2.5);
if (Jets.size() > 0) {
RecJets FilteredSubjets = cluster.Execute(
Jets[0], [](const RecJet jet, const RecJet subjet)
{ return subject->pt() > jet->pt() * 0.05 ; }
);
for (auto &jet: FilteredSubjets) INFO << Jets[0]->pt()*0.05 << " < " << jet->pt() << endmsg;
}
Recluster
RecJets Jets = filter(event.rec()->jets(), 20., 2.5);
Substructure::Recluster recluster(Substructure::cambridge, 0.5);
// Only leading jet:
if (Jets.size() > 0)
RecJet jet = recluster.Execute(Jets[0]); // only gives the leading jet
// All jets:
RecJets jets = recluster.Execute(Jets); // only gives the leading jet for each reclustered jet
Nsubjettiness
RecJets Jets = filter(event.rec()->jets(), 20., 2.5);
Substructure::Nsubjettiness nsubjettiness(
1, // order
Substructure::Nsubjettiness::KT_Axes, // axes definition
Substructure::Nsubjettiness::UnnormalizedMeasure, // measure definition
1., // beta
-1., // R0
std::numeric_limits<double>::max() // Rcutoff
);
if (Jets.size() > 0)
MAdouble64 tau1 = nsubjettiness.Execute(Jets[0]);
Available axes definitions:
KT_Axes,
CA_Axes,
AntiKT_Axes,
WTA_KT_Axes,
WTA_CA_Axes,
Manual_Axes,
OnePass_KT_Axes,
OnePass_CA_Axes,
OnePass_AntiKT_Axes,
OnePass_WTA_KT_Axes,
OnePass_WTA_CA_Axes,
Available measure definitions
NormalizedMeasure, // (beta,R0)
UnnormalizedMeasure, // (beta)
NormalizedCutoffMeasure, // (beta,R0,Rcutoff)
UnnormalizedCutoffMeasure, // (beta,Rcutoff)
VariableR Plugin
Initialize through normal mode
define jet_algorithm my_varR VariableR rho=2000 minR=0 maxR=2 ptmin=15 exclusive=False clustertype=CALIKE strategy=Best
Initialisation through analysis:
MAfloat32 rho = 2000.;
MAfloat32 minR = 0.;
MAfloat32 maxR = 2.;
Substructure::ClusterType clusterType = Substructure::VariableR::AKTLIKE; // CALIKE, KTLIKE, AKTLIKE
Substructure::Strategy strategy = Substructure::VariableR::Best; // Best, N2Tiled, N2Plain, NNH, Native
MAfloat32 ptmin = 0.;
MAbool isExclusive = false;
Substructure::VariableR variableR(
rho, minR, maxR, clusterType, strategy, ptmin, isExclusive
);
OR
Substructure::VariableR variableR;
variableR.Initialize(rho, minR, maxR, clusterType, strategy, ptmin, isExclusive);
Execution for the reco event:
std::string JetID = "VarR"
variableR.Execute(event, JetID);
if (event.rec()->jets("VarR").size() > 0)
INFO << "VarR Jet PT = " << event.rec()->jets("VarR")[0].pt() << endmsg;
Execution with reconstructed jets:
RecJets Jets = filter(event.rec()->jets(), 20.);
RecJets VarRJets = variableR.Execute(Jets);
if (VarRJets.size() > 0)
INFO << "VarR Jet PT = " << VarRJets[0]->pt() << endmsg;
Execution with a single jet:
RecJets Jets = filter(event.rec()->jets(), 20.);
RecJets VarRJets = variableR.Execute(Jets[0]);
if (VarRJets.size() > 0)
INFO << "VarR Jet PT = " << VarRJets[0]->pt() << endmsg;
Filter clustered object
RecJets Jets = filter(event.rec()->jets("AK8"), 20., 2.5);
if (Jets.size() > 0) {
RecJets FilteredSubjets = variableR.Execute(
Jets[0], [](const RecJet jet, const RecJet subjet){ return subject->pt() > jet->pt() * 0.05 ; }
);
for (auto &jet: FilteredSubjets) INFO << Jets[0]->pt()*0.05 << " < " << jet->pt() << endmsg;
}
Pruner
execute with a single jet
Substructure::Pruner pruner(Substructure::antikt, 0.1, 1.);
vector<const RecJetFormat*> Jets = filter(event.rec()->jets(), 20.);
const RecJetFormat* PrunedJet = pruner.Execute(Jets[0]);
INFO << "prunned Jet PT = " << PrunedJet->pt() << endmsg;
vector execution
Substructure::Pruner pruner(Substructure::antikt, 0.1, 1.);
vector<const RecJetFormat*> Jets = filter(event.rec()->jets(), 20.);
vector<const RecJetFormat*> PrunedJets = pruner.Execute(Jets);
for (auto &jet: PrunedJets)
INFO << "prunned Jet PT = " << jet->pt() << endmsg;
Jet Filtering
std::vector<const RecJetFormat*> Jets = filter(event.rec()->jets(), 20., 2.5);
Substructure::Selector selector = Substructure::SelectorPtFractionMin(0.03);
MAfloat32 Rfilt = 0.2;
Substructure::Filter jetFilter(Rfilt, selector);
// Execute for a single jet
if (Jets.size() > 0) {
const RecJetFormat *filtjet = jetFilter.Execute(Jets[0]); // only gives the leading jet
INFO << filtjet->pt() << " " << Jets[0]->pt() << endmsg;
}
// execute as a vector
std::vector<const RecJetFormat *> filtjets = jetFilter.Execute(Jets);
Energy Correlator
std::vector<const RecJetFormat*> Jets = filter(event.rec()->jets(), 20., 2.5);
MAuint32 N = 1;
MAfloat32 beta = 0.1;
EnergyCorrelator::Measure measure = EnergyCorrelator::Measure::pt_R;
EnergyCorrelator::Strategy strategy = EnergyCorrelator::Strategy::storage_array;
Substructure::EnergyCorrelator EC(N,beta,measure,strategy);
INFO << EC.Execute(Jets[0]) << endmsg;
Tests were done for backwards compatibility
Not tested.
TODO
- [ ] Remove accesibility of FastJet through analysis.
- [x] Add fj-contrib wrappers.
Drawbacks: Drawbacks have been elevated by splitting delphes and sfs based execution modes. This hasn't been extensively tested yet but should solve the problem.
Note: Object isolation module is different from the main branch. This has been detached from the clustering module completely and moved to the event reconstruction module. https://github.com/MadAnalysis/madanalysis5/blob/c2c0410d95034f00e76b77b902530ab1936bd758/tools/SampleAnalyzer/Process/JetClustering/JetClusterer.cpp#L598-L620
You enforce the jet collection and the imported samples to have different names. However, is this really a necessary requirement (except to avoid things to become a mess)?
Note that I have nothing about this restriction (and I am even if favour of it), but I naively raise the question.
If user uses the same name for jet and a dataset then it would not be possible to modify a defined jet or dataset. For instance
import smp.hepmc as smp
set smp.xsection = 123
define jet_algorithm smp antikt
set smp.ptmin = 20
Hence after the jet definition it will basically overwrite the name so user wont be able to set anything for the sample or jet depending the order of definition.
In cmd_define.py, I do not understand the following lines
62 # Multi-cluster protection 63 logging.getLogger('MA5').warning("Constituent-based smearing will be applied.") 64 self.main.superfastsim.jetrecomode = 'constituents'``` Why do we have to set the `jetrecomode` option to `constituents` as soon as an extra jet definition is required? Can't we do this independently of what is done for the "standard" collection (the one defined through `main.fastim`)?
If there are multiple jet definition in use and user wants to do detector simulation this has to be done in constituent mode. Which means the hadron shower will be modified and fed into each clustered object. There is no one-functional parameterization fit all scheme a parametrization tuned for antikt R=0.4 jet wont fit to a cambridge R=0.4 jet these are completely different objects; hence one needs to apply smearing to the hadrons. If SFS is not in use this is not effecting anything ofc.
in makefile_writer.py, we have:
169 # @JACK: cant use FASTJET_USE tag, it clashes with ROOT 170 self.ma5_fastjet_mode = FalseI assume this will be fixed due to our discussion of earlier today. Is this correct?
Yes I believe it can all be controlled through setup.sh. I was setting the fastjet flag through python but this wont be necessary anymore
In jet_configuration.py, I am wondering whether we should forbid the usage of any non-IR-safe jet algorithm. Who use such an algorithm today anyways? Any thoughts? If we agree on this, this change will have to be pushed everywhere...
I believe that should be up to the user, we have the options and flexibility. It might be useful for an analysis where IR safety is implemented in a later stage (have absolutely no example and never saw them being used).
in makefile_writer.py, we have:
169 # @JACK: cant use FASTJET_USE tag, it clashes with ROOT 170 self.ma5_fastjet_mode = FalseI assume this will be fixed due to our discussion of earlier today. Is this correct?
Yes I believe it can all be controlled through
setup.sh. I was setting the fastjet flag through python but this wont be necessary anymore
Perfect. Then we will update the python and make it clearer later on (added on the to-do list)
Yes I do.
By reviewing the code, I think we should mention somewhere some information about the HT ET and Meff computations that are inclusive quantities that do not care about pTmin requirements set on jet. I leave this here as a side note for later (see tools/SampleAnalyzer/Interfaces/fastjet/ClusterAlgoFastJet.cpp).
Please double check MAbool ClusterAlgoFastJet::Cluster(EventFormat& myEvent, std::string JetID). I have added a missing return statement.
The new softdrop method will have to be heavily documented.
In job_writer.py, you wrote somewhere
@JACK: Why are we using C-shell this is not necessary anymore.Do you mind telling in more details what you have in mind? Thanks in advance.
just removing it. it's an old syntax, isn't it? I'm not even sure if it's used anymore
Hello. You may have forgotten to update the changelog!
Please edit doc/releases/changelog-dev.md with:
- A one-to-two sentence description of the change. You may include a small working example for new features.
- A link back to this PR.
- Your name (or GitHub username) in the contributors section.
I think that you are correct. Let’s put this on the next list of pushes.
Cheers,
Benj
On 22 Apr 2022, at 09:10, Jack Y. Araz @.***> wrote:
In job_writer.py, you wrote somewhere @JACK: Why are we using C-shell this is not necessary anymore. Do you mind telling in more details what you have in mind? Thanks in advance.
just removing it. it's an old syntax, isn't it? I'm not even sure if it's used anymore
— Reply to this email directly, view it on GitHub https://github.com/MadAnalysis/madanalysis5/pull/13#issuecomment-1106499265, or unsubscribe https://github.com/notifications/unsubscribe-auth/AIWJNJ33MMVSXN3DKLBVSJLVGKQLXANCNFSM5MJKRPDQ. You are receiving this because you were mentioned.