salt file.py _get_flags needs to be reworked

What does this PR do?

It causes the _get_flags function to return a bitwise integer as it was designed to do.

What issues does this PR fix or reference?

Fixes: https://github.com/saltstack/salt/issues/62676

Previous Behavior

The function _get_flags was designed to return an integer, but since python3.6 it returns the names of re flags instead ie python 3.6 to 3.10

MULTILINE
# or
IGNORECASE|MULTILINE

and with python3.11, it returns this

re.MULTILINE
# or
re.IGNORECASE|re.MULTILINE

this proceeding re. causes an error with python3.11

New Behavior

Rework this function so that in lieu of attempting to convert flag names to integers we instead convert (integers, strings, list of strings) to re flags. ie converting integer to re flags

elif isinstance(flags, int):
     return re.RegexFlag(flags)

get re flag name ie re.MULTILINE from flag name ie MULTILINE

_flag = getattr(re.RegexFlag, str(flag).upper(), None)

Merge requirements satisfied?

[NOTICE] Bug fixes or features added to Salt require tests.

[ ] Docs
[x] Changelog - https://docs.saltproject.io/en/master/topics/development/changelog.html
[ ] Tests written/updated

Commits signed with GPG?

Yes

Sep 13 '22 19:09 leifliddy

Also, I didn't see any reason to use try/except:

        try:
            _flag = int(getattr(re, str(flag).upper()))
        except AttributeError as exc:
            raise CommandExecutionError("Exception: {}".format(exc))

As the exception will be dumped to the screen anyway.

Sep 13 '22 19:09 leifliddy

The flags stopped being integers back in Python 3.6, so why is it only an issue now?

I didn't see any reason to use try/except:

To correctly wrap it so it can be handled as a Salt exception.

Sep 15 '22 12:09 OrangeDog

This needs tests, and I'm pretty sure it doesn't actually fix the linked issue.

Sep 15 '22 13:09 OrangeDog

That's not a very friendly or collaborative attitude.

The linked issue appears to be because the passed flags are already flags, not strings. The code is going to need to be something like this:

_flags_acc = [0]  # reducing an empty list will raise TypeError
for flag in flags:
    if isinstance(flag, re.RegexFlag):
        _flags_acc.append(int(flag))
    else:
        _flag = getattr(re.RegexFlag, str(flag).upper(), None)
        if not isinstance(_flag, re.RegexFlag):
            raise SaltInvocationError(f"Invalid re flag given: {flag}")
        _flags_acc.append(int(_flag))
return reduce(operator.__or__, _flags_acc)

It can be simplified further by replacing the list with a single int, and append with |=.

Sep 15 '22 13:09 OrangeDog

Converting the flags to int shouldn't actually be necessary either.

A better idea may be to always return a RegexFlag object. They're easily constructed:

>>> re.RegexFlag(24)
re.MULTILINE|re.DOTALL

Sep 15 '22 15:09 OrangeDog

That looks promising. I'm traveling ATM, but I'll play around with that logic later. Thanks.

Sep 15 '22 15:09 leifliddy

I know you're trying to help. But please let the devs handle this. Thanks.

@OrangeDog is right, this needs tests. And it also needs a changelog file. Also please be respectful. @OrangeDog helps out around here a lot. We cannot do everything. and cannot be in everywhere.

Sep 15 '22 16:09 whytewolf

_flags_acc = [0]  # reducing an empty list will raise TypeError
for flag in flags:
    if isinstance(flag, re.RegexFlag):
        _flags_acc.append(int(flag))
    else:
        _flag = getattr(re.RegexFlag, str(flag).upper(), None)
        if not isinstance(_flag, re.RegexFlag):
            raise SaltInvocationError(f"Invalid re flag given: {flag}")
        _flags_acc.append(int(_flag))
return reduce(operator.__or__, _flags_acc)

I really like this logic! I'm just trying to understand one piece of it. If this function is able to accept a string, a list of strings, or an integer then under what circumstances would this get hit on?

if isinstance(flag, re.RegexFlag):
    _flags_acc.append(int(flag))

Oh wait I see...we 'd have to also allow the function to accept (and by extension return) a RegexFlag object as well. However, if this function accepted a RegexFlag why wouldn't we just return it (vs converting it to an integer and then converting back to a RegexFlag object?

I mean that's what was done with an integer value before

elif isinstance(flags, int):
    return flags

UPDATE: nevermind, I see your proposal returns an integer.

Ok, I'll try and rework things to have the function accept and return a RegexFlag object. We'll see what that looks like...

Sep 20 '22 17:09 leifliddy

This was written back when regex flags were integers. In Python 3.6 they were changed into special enums, but this (and related) functions have not been updated before. Presumably there are some callers of this function that are passing flags, which now cause this problem when you try to look them up by their __str__.

As support for Python <3.6 is no longer needed, passing ints internally could be dropped, but the public interface still needs to handle them. That was the suggestion of my last comment:

if instanceof(flags, re.RegexFlag):
  return flags
elif instanceof(flags, int):
  return re.RegexFlag(flags)
elif instanceof(flags, str):
  flags = [flags]

_flags = re.RegexFlag(0)
for flag in flags:
        _flag = getattr(re.RegexFlag, str(flag).upper(), None)
        if not isinstance(_flag, re.RegexFlag):
            raise SaltInvocationError(f"Invalid re flag given: {flag}")
        _flags |= _flag
return _flags

But that probably also means changing (and thoroughly testing) the rest of the module that deals with regex flags.

Sep 20 '22 17:09 OrangeDog

I think this is the proper why to do it. I mean there's only a couple instances of where _add_flags and _get_flags are called.

Just wondering whether we should edit this comment and allow replace to accept a re.RegexFlag data type.

flags (list or int)
    A list of flags defined in the ``re`` module documentation from the
    Python standard library. Each list item should be a string that will
    correlate to the human-friendly flag name. E.g., ``['IGNORECASE',
    'MULTILINE']``. Optionally, ``flags`` may be an int, with a value
    corresponding to the XOR (``|``) of all the desired flags. Defaults to
    8 (which supports 'MULTILINE').

Sep 20 '22 18:09 leifliddy

There's no way for the common use-case to give it a re.RegexFlag object. It has to be something deserializable from YAML.

If you use the py renderer for your .sls it would work, but I think adding it to the documentation would just confuse most people.

Sep 20 '22 18:09 OrangeDog

There's something off with raise SaltInvocationError I mean you can literally do this

def _get_flags(flags):
    raise SaltInvocationError('xxxx')

And nothing will happen.
Maybe there's a different/better salt exception we should be using in this function? Going to use CommandExecutionError for now...

Sep 20 '22 19:09 leifliddy

Uggg....trying to understand this function

def search(path, pattern, flags=8, bufsize=1, ignore_if_missing=False, multiline=False):
 ...
    if multiline:
        flags = _add_flags(flags, "MULTILINE")
        bufsize = "file"

I don't get why we're allowing flags and multiline to both be specified. One could easily cancel out the other. It's funny that the default value for flags is 8 (which is MULTILINE), but the default value for multiline is false Maybe I'll just have it evaluate both variables for now....

Also I don't think bufsize=1 is doing anything. I'm getting this error when enabling it.

/usr/lib/python3.11/site-packages/salt/utils/files.py:385: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  f_handle = open(*args, **kwargs)  # pylint: disable=resource-leakage
```

Sep 20 '22 20:09 leifliddy

bufsize is an even bigger job to sort out: #57584

Sep 20 '22 21:09 OrangeDog

@OrangeDog Ok I reworked file.py based on your suggestions. It could definitely be fined-tuned a bit especially with the leading comments in _get_flags and you've got a better python vocabulary then I do for sure. I like the idea of using the variable name re_flags (as opposed to just flags) to capture the output of _get_flags and _add_flags and there's no mistaking what sort of data type that variable holds.

Sep 21 '22 00:09 leifliddy

bufsize is an even bigger job to sort out: #57584

Damn...

Sep 21 '22 00:09 leifliddy

I'm not sure that this needs a test case for this as the module will continue to function as before. Only the data type that _get_flags returns has changed. I can't seem to find a single test case that verifies the data type returned by an internal function.

Sep 21 '22 19:09 leifliddy

The existing tests didn't catch the original issue, so there are more to be written there.

You can also write tests for the internal function. And/or ensure sufficient coverage of everything that calls it.

Sep 21 '22 20:09 OrangeDog

This logic works

def test__get_flags():
    """
    Test to ensure _get_flags returns a regex flag
    """
    flags = 10
    ret = filemod._get_flags(flags)
    assert isinstance(ret, re.RegexFlag)

    flags = "MULTILINE"
    ret = filemod._get_flags(flags)
    assert isinstance(ret, re.RegexFlag)

    flags = ["IGNORECASE", "MULTILINE"]
    ret = filemod._get_flags(flags)
    assert isinstance(ret, re.RegexFlag)

    flags = re.IGNORECASE | re.MULTILINE
    ret = filemod._get_flags(flags)
    assert isinstance(ret, re.RegexFlag)

Is everyone happy with just checking the returned data type?

Sep 23 '22 02:09 leifliddy

Screw it I'm going to change it to this so we can effectively verify the returned type and value

def test__get_flags():
    """
    Test to ensure _get_flags returns a regex flag
    """
    flags = 10
    ret = filemod._get_flags(flags)
    assert ret == re.IGNORECASE | re.MULTILINE

    flags = "MULTILINE"
    ret = filemod._get_flags(flags)
    assert ret == re.MULTILINE

    flags = ["IGNORECASE", "MULTILINE"]
    ret = filemod._get_flags(flags)
    assert ret == re.IGNORECASE | re.MULTILINE

    flags = re.IGNORECASE | re.MULTILINE
    ret = filemod._get_flags(flags)
    assert ret == re.IGNORECASE | re.MULTILINE

Sep 24 '22 00:09 leifliddy

@whytewolf Shouldn't this PR have made it into v3005.1-2 as it's a bugfix?

Nov 12 '22 22:11 leifliddy

@leifliddy No, 3005.1-2 is package fixes only, things like requisites and python version for onedir and windows based installs. This is what that -2 means. it is still 3005.1 none of the code of salt changed.

Nov 12 '22 23:11 whytewolf

Ahhhh... ok got it. Thanks for explaining that.

Nov 12 '22 23:11 leifliddy