Add global_state_conditions handling

nicholasmhughes opened this pull request.

What does this PR do?

This PR introduces a new minion configuration option that allows global conditions to be introduced into the state system logic. This should help reduce the repetitive templating wrappers used to enable or disable state blocks across multiple formulas.

What issues does this PR fix or reference?

Fixes: #62446

Previous Behavior

If we want to check a standard condition in grains, we have to wrap every state block that needs the check in a template conditional:

{% if grains.get("virtual_subtype") != "chroot" -%}
manage_service:
  service.running:
    - name: service_name

{% endif -%}

New Behavior

Now we only need to set a minion configuration option on the host:

global_state_conditions:
  service: ["not G@virtual_subtype:chroot"]

...and then a standard state block such as:

manage_service:
  service.running:
    - name: service_name

...will not be run on a host that doesn't meet the conditions:

----------
          ID: manage_service
    Function: service.running
        Name: service_name
      Result: None
     Comment: Failed to meet global state conditions. State not called.
     Started: 13:30:47.823249
    Duration: 3.012 ms
     Changes:
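For readers unfamiliar with the mechanism, here is a minimal Python sketch of how such a check could be evaluated. It is not Salt's implementation: `simple_match` is a hypothetical stand-in that understands only `G@key:value` (exact string match, no globbing) with an optional `not ` prefix, whereas Salt delegates to its compound matcher. Treating a `"*"` key as applying to every state module is also an assumption of this sketch. The returned dictionary mirrors the `Result: None` output shown above.

```python
from typing import Callable, Dict, List, Optional, Union


def simple_match(condition: str, grains: Dict[str, str]) -> bool:
    """Hypothetical matcher: only 'G@key:value' (exact), optionally prefixed by 'not '."""
    negate = condition.startswith("not ")
    expr = condition[len("not "):] if negate else condition
    if not expr.startswith("G@"):
        raise ValueError(f"unsupported matcher in sketch: {expr!r}")
    key, _, value = expr[2:].partition(":")
    matched = str(grains.get(key)) == value
    return not matched if negate else matched


def match_global_state_conditions(
    conditions: Dict[str, Union[str, List[str]]],
    state_module: str,
    name: str,
    matcher: Callable[[str], bool],
) -> Optional[dict]:
    """Return None if the state may run, else a 'state not called' return dict."""
    checks: List[bool] = []
    for state_type, exprs in conditions.items():
        # Assumption: "*" applies to every state module; other keys to one module.
        if state_type not in ("*", state_module):
            continue
        if isinstance(exprs, str):
            exprs = [exprs]
        checks.extend(matcher(expr) for expr in exprs)
    if all(checks):
        return None  # every applicable condition matched (or none applied)
    return {
        "name": name,
        "result": None,
        "changes": {},
        "comment": "Failed to meet global state conditions. State not called.",
    }


# Example: a chroot minion with the configuration from the PR description.
grains = {"virtual_subtype": "chroot"}
conditions = {"service": ["not G@virtual_subtype:chroot"]}
ret = match_global_state_conditions(
    conditions, "service", "service_name", lambda c: simple_match(c, grains)
)
print(ret["comment"])  # Failed to meet global state conditions. State not called.
```

Passing the matcher in as a callable keeps the condition logic separate from how targets are actually evaluated, so the toy matcher could be swapped for a real one without touching the loop.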

Merge requirements satisfied?

[NOTICE] Bug fixes or features added to Salt require tests.

  • [x] Docs
  • [x] Changelog - https://docs.saltproject.io/en/master/topics/development/changelog.html
  • [x] Tests written/updated

Commits signed with GPG?

Yes

Please review Salt's Contributing Guide for best practices.

See GitHub's page on GPG signing for more information about signing commits with GPG.

nicholasmhughes, Sep 20 '22

re-run pr-centosstream-9-x86_64-py3-pytest

nicholasmhughes, Sep 20 '22

re-run pr-freebsd-131-amd64-py3-pytest

nicholasmhughes, Sep 20 '22

re-run pr-amazon-2-x86_64-py3-pytest

nicholasmhughes, Sep 20 '22

re-run pr-windows-2019-x64-py3-pytest

nicholasmhughes, Sep 20 '22

Just one question: does this impact performance dramatically when this feature is enabled?

Ch3LL, Sep 20 '22

Once #62295 is merged, this logic will add only a very small amount of processing time to each state block evaluation. It should be negligible even at scale (~10 sec extra per 100,000 state blocks)... unless someone decides to abuse the heck out of it with a ridiculous number of conditions to evaluate.

nicholasmhughes, Sep 20 '22
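As a quick sanity check on the estimate above, ~10 s spread over 100,000 state blocks is 10 / 100,000 = 0.0001 s, or about 100 µs per block. The Python helpers below are hypothetical (not part of Salt): one does that conversion, the other times an arbitrary zero-argument callable with the standard-library `timeit` module, so a condition check could be compared against the estimate on a given machine.

```python
import timeit
from typing import Callable


def per_block_overhead_us(total_seconds: float, blocks: int) -> float:
    """Convert an aggregate overhead estimate into microseconds per state block."""
    return total_seconds / blocks * 1_000_000


def measure_us_per_call(check: Callable[[], object], number: int = 10_000) -> float:
    """Mean wall-clock microseconds per call of a zero-argument callable."""
    seconds = timeit.timeit(check, number=number)
    return seconds / number * 1_000_000


# The estimate from the thread: ~10 s of extra processing per 100,000 state blocks.
print(f"{per_block_overhead_us(10, 100_000):.0f} us per state block")  # 100 us per state block

# Hypothetical: time a trivial stand-in for a grains condition on this machine.
grains = {"virtual_subtype": "chroot"}
print(f"{measure_us_per_call(lambda: grains.get('virtual_subtype') != 'chroot'):.3f} us per check")
```

A plain dictionary lookup like this will be far cheaper than a real compound-matcher call, so the second number is a lower bound rather than a prediction.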

re-run pr-centosstream-9-x86_64-py3-pytest

nicholasmhughes, Sep 21 '22

re-run pr-macosx-catalina-x86_64-py3-pytest

nicholasmhughes, Sep 23 '22

re-run pr-macosx-catalina-x86_64-py3-pytest

nicholasmhughes, Sep 23 '22

re-run pr-centos-7-x86_64-py3-tcp-pytest

nicholasmhughes, Sep 26 '22

re-run pr-alma-8-x86_64-py3-pytest

nicholasmhughes, Sep 26 '22

re-run pr-macosx-catalina-x86_64-py3-pytest

nicholasmhughes, Sep 27 '22

I'm starting to think that Mac test failure is related, because it's failed on every test run on this PR.

Ch3LL, Sep 27 '22

Well... only the test runs since I moved to config.option. Still not sure what's going on here, though. Every other platform passes this test in under 30 secs. MacOS already has a 15 sec addition to that limit in the test code, and it keeps taking 300+ sec in recent runs.

nicholasmhughes, Sep 27 '22

re-run pr-windows-2019-x64-py3-pytest

nicholasmhughes, Sep 28 '22

@MKLeb @garethgreenaway, should #62901 have resolved the pr-macosx-catalina-x86_64-py3-pytest failure I'm still seeing here?

nicholasmhughes, Oct 24 '22

@nicholasmhughes It doesn't look like the test that failed was one of the ones addressed in that referenced PR.

garethgreenaway, Oct 24 '22

@nicholasmhughes The Mac test is not one of the ones fixed by my PR from last week. I have seen that specific test fail a couple of times before, but I can't find an example of it. However, it does seem to be failing in a rather profound fashion (almost 8 times the expected duration), which has me inclined to agree with @Ch3LL's comment. Going to re-run it here and see what happens.

MKLeb, Oct 24 '22

I just don't get what's so different about MacOS that it's taking that long while every other OS takes under 30 seconds...

nicholasmhughes, Oct 25 '22

@nicholasmhughes Are you able to test this on a Mac? I'm wondering whether debug statements in your code changes would show this test touching them unexpectedly. If you don't have a Mac, I can try to test it out.

MKLeb, Oct 31 '22

@MKLeb nope. I don't have a Mac.

nicholasmhughes, Oct 31 '22

Mac tests should be fixed now that https://github.com/saltstack/salt/issues/62829 is closed. Will update the branch.

Ch3LL, Oct 31 '22

Can you resolve the merge conflict?

Ch3LL, Nov 02 '22

@nicholasmhughes Okay, so I've found what is causing the drastic failures. The test in question actually touches your changes once (and seemingly only once). On average, on my testing machine, the method you added takes ~170 ms, and the self.functions["config.option"] call accounts for more than 95% of that time. If I take your change out, the test passes again. The rest of the state call seems to take around 20 ms (i.e., under the test's threshold). Ideally, we'd like to find a way for your changes to coexist without exceeding the remaining 25 seconds of the threshold.

MKLeb, Nov 04 '22
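A breakdown like the one above (a single call accounting for most of a method's runtime) is the kind of thing a small timing harness can confirm. Below is a minimal Python sketch using `time.perf_counter`. The labels are hypothetical and `time.sleep` stands in for the slow lookup; this is not how the measurement in this thread was taken, just one way to reproduce that kind of split.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed(label: str, timings: Dict[str, float]) -> Iterator[None]:
    """Add the wall-clock seconds spent inside the block to timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = timings.get(label, 0.0) + (time.perf_counter() - start)


def share(timings: Dict[str, float], part: str, whole: str) -> float:
    """Fraction of the `whole` section's time spent in the `part` section."""
    total = timings[whole]
    return timings[part] / total if total else 0.0


# Hypothetical usage: time a method and the one expensive call inside it.
timings: Dict[str, float] = {}
with timed("method", timings):
    with timed("slow_lookup", timings):
        time.sleep(0.05)  # stand-in for the expensive call
    sum(range(1_000))  # stand-in for the rest of the method's cheap work
print(f"slow_lookup: {share(timings, 'slow_lookup', 'method'):.0%} of method time")
```

Because the inner timer starts after and stops before the outer one, the reported share can never exceed 100%.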

@MKLeb, we suspected config.option was the problem:

Well... only the test runs since I moved to config.option. Still not sure what's going on here, though. Every other platform passes this test in under 30 secs. MacOS already has a 15 sec addition to that limit in the test code, and it keeps taking 300+ sec in recent runs.

Not sure what your references to millisecond return times have to do with the "spawning platform" tests that are failing on MacOS, but I still don't understand what in particular happens inside config.option, only on Macs, that adds so much time (230-300 sec) that the test fails to return within the 45-second threshold, while every other platform returns in under 30 seconds.

Were you able to narrow the "expensive" operations within config.option so we might have an idea as to what's happening on Mac?

nicholasmhughes, Nov 07 '22

Ok... So I don't know precisely what is going on, but I do know one thing for sure: loading config.option, or any config.<method>, in the separate process spun up by call_parallel takes an aggressively long time. I pinpointed it right to the line that runs load_module. I wasn't sure how to solve that, so I went with a different approach: moving your call to _match_global_state_conditions out of _call_parallel_target and into the call method, as you did with the other call, allows the test to pass. I also ran your tests with my change, and they passed as well. I can make a suggestion here so it's clearer.

EDIT: While I'm not sure this is a complete fix, I would recommend writing some tests for the case where parallel: True is set and there are global state conditions to match.

EDIT 2: I realized I never answered your question about why this only affects MacOS. The short answer is that I don't know for sure. There is no logic in config.option that specifically targets spawning platforms or MacOS. As I said, it's the loading of the module functions that takes long, and I saw nothing down the stack or in the loader that targets MacOS in any way either.

MKLeb, Nov 09 '22
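The reordering discussed above can be sketched as follows. This is a simplified, hypothetical model, not Salt's actual `call` method or `_call_parallel_target`. The condition check runs once in the caller, before the parallel/non-parallel branch, so the worker (a thread here, standing in for the separate process Salt spawns) never has to load whatever modules the check needs. A state that fails its conditions returns immediately without spawning anything, which is also what a `parallel: True` test would want to assert.

```python
import threading
from typing import Callable, Dict, Optional

Ret = Dict[str, object]


def call_state(
    low: Dict[str, object],
    check_conditions: Callable[[Dict[str, object]], Optional[Ret]],
    run_state: Callable[[Dict[str, object]], Ret],
) -> Ret:
    """Evaluate global conditions in the caller, before any parallel dispatch."""
    skipped = check_conditions(low)  # hypothetical hook; None means "may run"
    if skipped is not None:
        return skipped  # state not called, no worker spawned
    if low.get("parallel"):
        # Stand-in for a separate process: the worker only receives approved work.
        # Note: exceptions raised in the worker are not propagated in this sketch.
        result: Ret = {}
        worker = threading.Thread(target=lambda: result.update(run_state(low)))
        worker.start()
        worker.join()
        return result
    return run_state(low)


# Hypothetical usage: a condition check that always fails for this state.
def deny(low: Dict[str, object]) -> Ret:
    return {"name": low["name"], "result": None,
            "comment": "Failed to meet global state conditions. State not called."}


def run(low: Dict[str, object]) -> Ret:
    return {"name": low["name"], "result": True}


print(call_state({"name": "manage_service", "parallel": True}, deny, run))
```

Keeping the check on the caller's side also means it is evaluated exactly once per state, whichever branch would have run it.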

Made a change that's close to what you proposed. I just moved the call to _match_global_state_conditions outside of the nested if, since the refactor would've put it in both branches. Hopefully the tests all run clean now...

nicholasmhughes, Nov 21 '22

Any reason why this was merged with no discussion, when all the comments on the feature request were against it?

OrangeDog, Nov 21 '22

The original implementation outlined in the feature request was not put in place. This PR is a broader feature that was discussed in a couple of Open Hours.

nicholasmhughes, Nov 21 '22

Ah, ok.

Open Hour discussions should be captured properly. I don't even recall any mention of this in the minutes.

OrangeDog, Nov 21 '22