ARTEMIS-4553 Support partial word matches for address settings
At this point I think we should stick with the currently documented behavior, i.e. wildcards match (whole) words separated by a delimiter. The design of the matching is to be hierarchical which is relatively easy to understand and configure with words separated by a delimiter and wildcards that represent one (i.e. * by default) or more (i.e. # by default) words. The fact that partial matches work now (for whatever reason) is not a sufficient reason to change the documented functionality. I chalk this up to an implementation detail and not something that users should rely upon as it may change in the future. It is an undocumented, incidental behavior.
I'm curious about others' thoughts.
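To make the documented behavior concrete, here is a minimal sketch of whole-word matching, assuming the default syntax (`.` as the delimiter, `*` for exactly one word, `#` for zero or more words). The class and method names are hypothetical, and this is not the broker's actual matcher; it only illustrates the semantics as documented.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the documented, word-based semantics.
// Not Artemis code: names and structure are assumptions for this sketch.
final class WordWildcardSketch {

    static boolean matches(String pattern, String address) {
        List<String> p = Arrays.asList(pattern.split("\\."));
        List<String> a = Arrays.asList(address.split("\\."));
        return matches(p, 0, a, 0);
    }

    private static boolean matches(List<String> p, int i, List<String> a, int j) {
        if (i == p.size()) {
            return j == a.size();          // pattern exhausted: address must be too
        }
        String word = p.get(i);
        if (word.equals("#")) {
            // '#' may consume zero or more whole words
            for (int k = j; k <= a.size(); k++) {
                if (matches(p, i + 1, a, k)) {
                    return true;
                }
            }
            return false;
        }
        if (j == a.size()) {
            return false;                  // pattern still needs a word, none left
        }
        // '*' consumes exactly one whole word; anything else is compared literally
        if (word.equals("*") || word.equals(a.get(j))) {
            return matches(p, i + 1, a, j + 1);
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matches("news.*", "news.europe"));
        System.out.println(matches("news.#", "news.europe.sport"));
        System.out.println(matches("new*", "news"));
    }
}
```

Under these rules a pattern word such as `new*` is just a literal word, so it does not match `news`. That is the documented model; the partial matches observed today fall outside it.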
Thinking about this more...I'm more against it than before. I believe that opening this door is going to be bad for usability - both for users and developers.
Right now, * means a single word. If we start accepting "partial words" then what does * become? Is it a single word when used alone and then something else when used with a partial word? If the latter, is it any single character? Is it 0 or more of any character (e.g. as it might be in a regular expression)? Also, how do partial words compare to each other in a hierarchy where matches are ordered from general to specific? Would ab* be more specific than a* when matching abc? What about a*c? Is that even supported? Where do we draw the line?
Furthermore, what do we do with #? Should we support partial matches with it? If so, what does that mean? If not, why not?
The potential configurations start to expand very quickly and will no doubt add complication to the code, the test-suite, and the documentation.
The currently documented functionality is simple & powerful, and we should keep it that way.
If there's a bug here it's that undocumented behavior is allowed and somewhat functional leading folks to assume it's intentional. I'm not saying we should fix that necessarily, but we should at least consider it so we don't keep letting folks get confused.
The design of the matching is to be hierarchical which is relatively easy to understand and configure with words separated by a delimiter and wildcards that represent one (i.e. * by default) or more (i.e. # by default) words.
Partial words would also be hierarchical: * would never match a delimiter.
Right now, * means a single word. If we start accepting "partial words" then what does * become? Is it a single word when used alone and then something else when used with a partial word? If the latter, is it any single character? Is it 0 or more of any character (e.g. as it might be in a regular expression)?
My attempt was to implement behavior similar to * in shells: matching zero or more characters, but never the delimiter, so that the hierarchy is respected.
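A rough sketch of that shell-like idea, assuming `.` as the delimiter: a `*` inside a word becomes "zero or more non-delimiter characters", while a `*` standing alone keeps its current meaning of one whole word. The class name and the regex translation are assumptions made for illustration, not the code in this PR, and `#` is deliberately left out.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of partial-word matching; not the PR's implementation.
final class PartialWordSketch {

    static Pattern compile(String pattern) {
        StringBuilder regex = new StringBuilder();
        String[] words = pattern.split("\\.", -1);
        for (int w = 0; w < words.length; w++) {
            if (w > 0) {
                regex.append("\\.");               // literal delimiter between words
            }
            if (words[w].equals("*")) {
                regex.append("[^.]+");             // lone '*': exactly one non-empty word
                continue;
            }
            // '*' inside a word: zero or more characters, never the delimiter
            String[] literals = words[w].split("\\*", -1);
            for (int i = 0; i < literals.length; i++) {
                if (i > 0) {
                    regex.append("[^.]*");
                }
                regex.append(Pattern.quote(literals[i]));
            }
        }
        return Pattern.compile(regex.toString());
    }

    static boolean matches(String pattern, String address) {
        return compile(pattern).matcher(address).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("queue.a*", "queue.abc"));
        System.out.println(matches("a*", "a.b"));
    }
}
```

With this translation, `a*c*` would match `ac`, `abc`, and `abcd`, and `a*` would not match `a.b` because the delimiter is excluded. The sketch says nothing about how such patterns should be ordered by specificity.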
Also, how do partial words compare to each other in a hierarchy where matches are ordered from general to specific? Would ab* be more specific than a* when matching abc? What about a*c? Is that even supported? Where do we draw the line?
Good catch, I hadn't thought of this use case, but if it were supported then a*c would be more specific than a*.
Furthermore, what do we do with #? Should we support partial matches with it? If so, what does that mean? If not, why not?
Theoretically, * should be enough for any partial-match use case because # matches zero or more words.
The potential configurations start to expand very quickly and will no doubt add complication to the code, the test-suite, and the documentation.
This is an important point from a development point of view. Are you thinking of any specific cases?
If there's a bug here it's that undocumented behavior is allowed and somewhat functional leading folks to assume it's intentional. I'm not saying we should fix that necessarily, but we should at least consider it so we don't keep letting folks get confused.
My attempt was to clarify this gray area without causing issues for users who already rely on this officially unsupported behavior.
Good catch, I hadn't thought of this use case, but if it were supported then a*c would be more specific than a*.
What about multiple * characters? For example, would a*c* match abcd and ac and abc?
This is an important point from a development point of view. Are you thinking of any specific cases?
I'm not thinking about any specific case. I'm mainly thinking that the possible combinations that need to be tested will increase substantially with this change, especially if multiple * characters are supported.
This change will mean that while * by itself still means a single word, when * is combined with other characters its meaning changes completely to zero or more characters. I think this will ultimately hurt usability.
@gemmellr @jbertram thanks for your feedback, I converted this PR to draft because I need more time to think.