pisa
pisa copied to clipboard
Bug when applying cuts in `events_pi.py` for certain variable names
This chunk of code does not work if one variable_name is a subset of another variable_name:
https://github.com/icecube/pisa/blob/master/pisa/core/events_pi.py#L553-L556
for variable_name in variables:
crit_str = crit_str.replace(
variable_name, 'self["%s"]["%s"]' % (key, variable_name)
)
For example: when trying to load both reco_z and reco_zenith, I end up with expressions like self["nue_cc"]["reco_z"]enith.
Code to verify behavior:
key = "nue_cc"
crit_str = "(reco_zenith >= 0) & (reco_z >= 500)"
for variable_name in ['reco_z','reco_zenith']:
crit_str = crit_str.replace(
variable_name, 'self["%s"]["%s"]' % (key, variable_name)
)
print(crit_str)
currently gives:
(self["nue_cc"]["reco_z"]enith >= 0) & (self["nue_cc"]["reco_z"] >= 500)
but the desired behavior is:
(self["nue_cc"]["reco_zenith"] >= 0) & (self["nue_cc"]["reco_z"] >= 500)
import re
key = "nue_cc"
crit_str = "(reco_zenith >= 0) & (reco_z >= 500)"
for variable_name in ['reco_z','reco_zenith']:
# Using word boundary \b to replace whole words only
crit_str = re.sub(r'\b{}\b'.format(variable_name), 'self["%s"]["%s"]' % (key, variable_name), crit_str)
print(crit_str)
A re module with a word boundary expression will solve the issue. If it looks ok, I can make a pull request.
Seems like a reasonable low-effort fix to make.