pisa: Bug when applying cuts in `events_pi.py` for certain variable names

This chunk of code does not work if one variable name is a substring of another variable name:

https://github.com/icecube/pisa/blob/master/pisa/core/events_pi.py#L553-L556

```python
for variable_name in variables:
    crit_str = crit_str.replace(
        variable_name, 'self["%s"]["%s"]' % (key, variable_name)
    )
```

For example, when trying to load both `reco_z` and `reco_zenith`, I end up with expressions like `self["nue_cc"]["reco_z"]enith`, because `str.replace` also matches `reco_z` inside `reco_zenith`.
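Reordering the loop (for example, replacing the longer name first) does not avoid the problem, because each `str.replace` call rescans the whole string, including text produced by earlier replacements. A minimal sketch reusing the names from this issue; `naive_substitute` is a hypothetical helper that mirrors the loop above:

```python
def naive_substitute(crit_str, key, variables):
    """Mirror the str.replace loop from events_pi.py (hypothetical helper name)."""
    for variable_name in variables:
        crit_str = crit_str.replace(
            variable_name, 'self["%s"]["%s"]' % (key, variable_name)
        )
    return crit_str


crit = "(reco_zenith >= 0) & (reco_z >= 500)"

# Short name first: "reco_z" is also replaced inside "reco_zenith"
print(naive_substitute(crit, "nue_cc", ["reco_z", "reco_zenith"]))
# (self["nue_cc"]["reco_z"]enith >= 0) & (self["nue_cc"]["reco_z"] >= 500)

# Long name first: "reco_z" is then replaced inside the already-substituted
# self["nue_cc"]["reco_zenith"], producing nested garbage
print(naive_substitute(crit, "nue_cc", ["reco_zenith", "reco_z"]))
# (self["nue_cc"]["self["nue_cc"]["reco_z"]enith"] >= 0) & (self["nue_cc"]["reco_z"] >= 500)
```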

May 20 '22 01:05 kayla-leonard

Code to verify behavior:

```python
key = "nue_cc"

crit_str = "(reco_zenith >= 0) & (reco_z >= 500)"
for variable_name in ['reco_z', 'reco_zenith']:
    crit_str = crit_str.replace(
        variable_name, 'self["%s"]["%s"]' % (key, variable_name)
    )

print(crit_str)
```

Currently this gives:

```
(self["nue_cc"]["reco_z"]enith >= 0) & (self["nue_cc"]["reco_z"] >= 500)
```

but the desired output is:

```
(self["nue_cc"]["reco_zenith"] >= 0) & (self["nue_cc"]["reco_z"] >= 500)
```

Mar 06 '24 18:03 kayla-leonard

```python
import re

key = "nue_cc"
crit_str = "(reco_zenith >= 0) & (reco_z >= 500)"
for variable_name in ['reco_z', 'reco_zenith']:
    # Use word boundaries (\b) so that only whole words are replaced;
    # re.escape guards against regex metacharacters in variable names
    crit_str = re.sub(
        r'\b{}\b'.format(re.escape(variable_name)),
        'self["%s"]["%s"]' % (key, variable_name),
        crit_str,
    )

print(crit_str)
```

Using the `re` module with word-boundary anchors (`\b`) solves the issue. If this looks OK, I can make a pull request.
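A possible refinement, sketched under the assumption that variable names are plain identifiers: build one regular expression that matches any of the names as a whole word and substitute them all in a single `re.sub` pass with a replacement function. Since `re.sub` never rescans text it has already substituted, a later name cannot match inside an earlier replacement (which the per-variable loop can still do, e.g. if a variable name coincides with the key, since `\b` treats quotes as boundaries). The helper name `substitute_variables` is hypothetical and not part of PISA.

```python
import re


def substitute_variables(crit_str, key, variables):
    """Replace whole-word variable names with self[key][name] in one pass.

    Hypothetical helper; assumes `variables` are plain identifiers.
    """
    if not variables:
        # An empty alternation would match the empty string everywhere
        return crit_str
    # Longest names first; with \b on both sides this is not strictly
    # required, but it makes the intended precedence explicit
    names = sorted(variables, key=len, reverse=True)
    pattern = re.compile(r"\b(?:%s)\b" % "|".join(re.escape(n) for n in names))
    # A callable replacement's return value is inserted literally,
    # so no backslash escaping of the replacement text is needed
    return pattern.sub(lambda m: 'self["%s"]["%s"]' % (key, m.group(0)), crit_str)


crit = "(reco_zenith >= 0) & (reco_z >= 500)"
print(substitute_variables(crit, "nue_cc", ["reco_z", "reco_zenith"]))
# (self["nue_cc"]["reco_zenith"] >= 0) & (self["nue_cc"]["reco_z"] >= 500)
```

For the example in this issue, this gives the same result as the loop with `\b`; the difference only shows up in edge cases like the key collision mentioned above.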

May 17 '24 05:05 JKrishnamoorthi

Seems like a reasonable low-effort fix to make.

Jul 31 '24 08:07 thehrh