file.py _get_flags needs to be reworked
What does this PR do?
It causes the _get_flags function to return a bitwise integer as it was designed to do.
What issues does this PR fix or reference?
Fixes: https://github.com/saltstack/salt/issues/62676
Previous Behavior
The function _get_flags was designed to return an integer, but since python3.6 it returns the names of re flags instead
ie
python 3.6 to 3.10
MULTILINE
# or
IGNORECASE|MULTILINE
and with python3.11, it returns this
re.MULTILINE
# or
re.IGNORECASE|re.MULTILINE
this proceeding re. causes an error with python3.11
New Behavior
Rework this function so that in lieu of attempting to convert flag names to integers we instead convert (integers, strings, list of strings) to re flags. ie converting integer to re flags
elif isinstance(flags, int):
return re.RegexFlag(flags)
get re flag name ie re.MULTILINE from flag name ie MULTILINE
_flag = getattr(re.RegexFlag, str(flag).upper(), None)
Merge requirements satisfied?
[NOTICE] Bug fixes or features added to Salt require tests.
- [ ] Docs
- [x] Changelog - https://docs.saltproject.io/en/master/topics/development/changelog.html
- [ ] Tests written/updated
Commits signed with GPG?
Yes
Also, I didn't see any reason to use try/except:
try:
_flag = int(getattr(re, str(flag).upper()))
except AttributeError as exc:
raise CommandExecutionError("Exception: {}".format(exc))
As the exception will be dumped to the screen anyway.
The flags stopped being integers back in Python 3.6, so why is it only an issue now?
I didn't see any reason to use try/except:
To correctly wrap it so it can be handled as a Salt exception.
This needs tests, and I'm pretty sure it doesn't actually fix the linked issue.
That's not a very friendly or collaborative attitude.
The linked issue appears to be because the passed flags are already flags, not strings. The code is going to need to be something like this:
_flags_acc = [0] # reducing an empty list will raise TypeError
for flag in flags:
if isinstance(flag, re.RegexFlag):
_flags_acc.append(int(flag))
else:
_flag = getattr(re.RegexFlag, str(flag).upper(), None)
if not isinstance(_flag, re.RegexFlag):
raise SaltInvocationError(f"Invalid re flag given: {flag}")
_flags_acc.append(int(_flag))
return reduce(operator.__or__, _flags_acc)
It can be simplified further by replacing the list with a single int, and append with |=.
Converting the flags to int shouldn't actually be necessary either.
A better idea may be to always return a RegexFlag object. They're easily constructed:
>>> re.RegexFlag(24)
re.MULTILINE|re.DOTALL
That looks promising. I'm traveling ATM, but I'll play around with that logic later. Thanks.
I know you're trying to help. But please let the devs handle this. Thanks.
@OrangeDog is right, this needs tests. And it also needs a changelog file. Also please be respectful. @OrangeDog helps out around here a lot. We cannot do everything. and cannot be in everywhere.
_flags_acc = [0] # reducing an empty list will raise TypeError for flag in flags: if isinstance(flag, re.RegexFlag): _flags_acc.append(int(flag)) else: _flag = getattr(re.RegexFlag, str(flag).upper(), None) if not isinstance(_flag, re.RegexFlag): raise SaltInvocationError(f"Invalid re flag given: {flag}") _flags_acc.append(int(_flag)) return reduce(operator.__or__, _flags_acc)
I really like this logic! I'm just trying to understand one piece of it.
If this function is able to accept a string, a list of strings, or an integer then under what circumstances would this get hit on?
if isinstance(flag, re.RegexFlag):
_flags_acc.append(int(flag))
Oh wait I see...we 'd have to also allow the function to accept (and by extension return) a RegexFlag object as well.
However, if this function accepted a RegexFlag why wouldn't we just return it (vs converting it to an integer and then converting back to a RegexFlag object?
I mean that's what was done with an integer value before
elif isinstance(flags, int):
return flags
UPDATE: nevermind, I see your proposal returns an integer.
Ok, I'll try and rework things to have the function accept and return a RegexFlag object. We'll see what that looks like...
This was written back when regex flags were integers. In Python 3.6 they were changed into special enums, but this (and related) functions have not been updated before. Presumably there are some callers of this function that are passing flags, which now cause this problem when you try to look them up by their __str__.
As support for Python <3.6 is no longer needed, passing ints internally could be dropped, but the public interface still needs to handle them. That was the suggestion of my last comment:
if instanceof(flags, re.RegexFlag):
return flags
elif instanceof(flags, int):
return re.RegexFlag(flags)
elif instanceof(flags, str):
flags = [flags]
_flags = re.RegexFlag(0)
for flag in flags:
_flag = getattr(re.RegexFlag, str(flag).upper(), None)
if not isinstance(_flag, re.RegexFlag):
raise SaltInvocationError(f"Invalid re flag given: {flag}")
_flags |= _flag
return _flags
But that probably also means changing (and thoroughly testing) the rest of the module that deals with regex flags.
I think this is the proper why to do it. I mean there's only a couple instances of where _add_flags and _get_flags are called.
Just wondering whether we should edit this comment and allow replace to accept a re.RegexFlag data type.
flags (list or int)
A list of flags defined in the ``re`` module documentation from the
Python standard library. Each list item should be a string that will
correlate to the human-friendly flag name. E.g., ``['IGNORECASE',
'MULTILINE']``. Optionally, ``flags`` may be an int, with a value
corresponding to the XOR (``|``) of all the desired flags. Defaults to
8 (which supports 'MULTILINE').
There's no way for the common use-case to give it a re.RegexFlag object. It has to be something deserializable from YAML.
If you use the py renderer for your .sls it would work, but I think adding it to the documentation would just confuse most people.
There's something off with raise SaltInvocationError
I mean you can literally do this
def _get_flags(flags):
raise SaltInvocationError('xxxx')
And nothing will happen.
Maybe there's a different/better salt exception we should be using in this function?
Going to use CommandExecutionError for now...
Uggg....trying to understand this function
def search(path, pattern, flags=8, bufsize=1, ignore_if_missing=False, multiline=False):
...
if multiline:
flags = _add_flags(flags, "MULTILINE")
bufsize = "file"
I don't get why we're allowing flags and multiline to both be specified.
One could easily cancel out the other. It's funny that the default value for flags is 8 (which is MULTILINE), but the default value for multiline is false
Maybe I'll just have it evaluate both variables for now....
Also I don't think bufsize=1 is doing anything. I'm getting this error when enabling it.
/usr/lib/python3.11/site-packages/salt/utils/files.py:385: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
f_handle = open(*args, **kwargs) # pylint: disable=resource-leakage
```
bufsize is an even bigger job to sort out: #57584
@OrangeDog Ok I reworked file.py based on your suggestions. It could definitely be fined-tuned a bit especially with the leading comments in _get_flags and you've got a better python vocabulary then I do for sure. I like the idea of using the variable name re_flags (as opposed to just flags) to capture the output of _get_flags and _add_flags and there's no mistaking what sort of data type that variable holds.
bufsizeis an even bigger job to sort out: #57584
Damn...
I'm not sure that this needs a test case for this as the module will continue to function as before. Only the data type that _get_flags returns has changed. I can't seem to find a single test case that verifies the data type returned by an internal function.
The existing tests didn't catch the original issue, so there are more to be written there.
You can also write tests for the internal function. And/or ensure sufficient coverage of everything that calls it.
This logic works
def test__get_flags():
"""
Test to ensure _get_flags returns a regex flag
"""
flags = 10
ret = filemod._get_flags(flags)
assert isinstance(ret, re.RegexFlag)
flags = "MULTILINE"
ret = filemod._get_flags(flags)
assert isinstance(ret, re.RegexFlag)
flags = ["IGNORECASE", "MULTILINE"]
ret = filemod._get_flags(flags)
assert isinstance(ret, re.RegexFlag)
flags = re.IGNORECASE | re.MULTILINE
ret = filemod._get_flags(flags)
assert isinstance(ret, re.RegexFlag)
Is everyone happy with just checking the returned data type?
Screw it I'm going to change it to this so we can effectively verify the returned type and value
def test__get_flags():
"""
Test to ensure _get_flags returns a regex flag
"""
flags = 10
ret = filemod._get_flags(flags)
assert ret == re.IGNORECASE | re.MULTILINE
flags = "MULTILINE"
ret = filemod._get_flags(flags)
assert ret == re.MULTILINE
flags = ["IGNORECASE", "MULTILINE"]
ret = filemod._get_flags(flags)
assert ret == re.IGNORECASE | re.MULTILINE
flags = re.IGNORECASE | re.MULTILINE
ret = filemod._get_flags(flags)
assert ret == re.IGNORECASE | re.MULTILINE
@whytewolf Shouldn't this PR have made it into v3005.1-2 as it's a bugfix?
@leifliddy No, 3005.1-2 is package fixes only, things like requisites and python version for onedir and windows based installs. This is what that -2 means. it is still 3005.1 none of the code of salt changed.
Ahhhh... ok got it. Thanks for explaining that.