Add global_state_conditions handling
### What does this PR do?
This PR introduces a new minion configuration option that allows global conditions to be injected into the state system logic. It should reduce the repetitive Jinja wrappers currently used to enable/disable state blocks across multiple formulas.
### What issues does this PR fix or reference?
Fixes: #62446
### Previous Behavior

If we want to check a standard condition in grains, we'd have to template around every state block in which we'd want to perform that check:

```jinja
{% if grains.get("virtual_subtype") != "chroot" -%}
manage_service:
  service.running:
    - name: service_name
{% endif -%}
```
### New Behavior

Now we only need to set a minion configuration option on the host:

```yaml
global_state_conditions:
  service: ["not G@virtual_subtype:chroot"]
```

...and then a standard state block such as:

```yaml
manage_service:
  service.running:
    - name: service_name
```

...will not be run on a host which doesn't meet the conditions:

```
----------
          ID: manage_service
    Function: service.running
        Name: service_name
      Result: None
     Comment: Failed to meet global state conditions. State not called.
     Started: 13:30:47.823249
    Duration: 3.012 ms
     Changes:
```
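For illustration, here is a minimal sketch of how a minion might evaluate these conditions per state call. This is not Salt's actual implementation: the helper names (`grain_matches`, `meets_global_conditions`), the simplified `G@key:value` matcher with an optional leading `not`, and the `"*"` wildcard key are all assumptions for the sketch; real Salt delegates condition strings to its compound matcher.

```python
# Hypothetical sketch, NOT Salt's real code: evaluate
# global_state_conditions against a state function name using a
# simplified "G@key:value" grain matcher with optional "not " prefix.

def grain_matches(expr, grains):
    """Match a simplified 'not G@key:value' expression against grains."""
    negate = expr.startswith("not ")
    if negate:
        expr = expr[len("not "):]
    key, _, value = expr.removeprefix("G@").partition(":")
    result = str(grains.get(key)) == value
    return not result if negate else result

def meets_global_conditions(state_fun, conditions, grains):
    """True if every condition for this state module (or the assumed
    '*' wildcard) matches; states with no conditions always run."""
    mod = state_fun.split(".", 1)[0]  # "service" from "service.running"
    relevant = conditions.get(mod, []) + conditions.get("*", [])
    return all(grain_matches(c, grains) for c in relevant)

conditions = {"service": ["not G@virtual_subtype:chroot"]}
print(meets_global_conditions("service.running", conditions,
                              {"virtual_subtype": "chroot"}))  # False
print(meets_global_conditions("service.running", conditions,
                              {"virtual_subtype": "kvm"}))     # True
```

When a state fails the check, the minion would report `Result: None` with the comment shown above instead of executing the state.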
### Merge requirements satisfied?

[NOTICE] Bug fixes or features added to Salt require tests.
- [x] Docs
- [x] Changelog - https://docs.saltproject.io/en/master/topics/development/changelog.html
- [x] Tests written/updated
### Commits signed with GPG?

Yes
Please review Salt's Contributing Guide for best practices.
See GitHub's page on GPG signing for more information about signing commits with GPG.
re-run pr-centosstream-9-x86_64-py3-pytest
re-run pr-freebsd-131-amd64-py3-pytest
re-run pr-amazon-2-x86_64-py3-pytest
re-run pr-windows-2019-x64-py3-pytest
Just one question: does enabling this feature dramatically impact performance?
Once #62295 is merged, this logic will add only a very small amount of processing time to each state block evaluation, and it should be negligible even at scale (roughly +10 sec per 100,000 state blocks)... unless someone decides to abuse the heck out of it with a ridiculous number of conditions to evaluate.
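As a quick back-of-envelope check on that figure (my arithmetic, not part of the PR), 10 seconds spread over 100,000 state blocks works out to about a tenth of a millisecond each:

```python
# ~10 s of added processing per 100,000 state blocks,
# per the estimate above.
per_block_ms = 10 / 100_000 * 1000
print(per_block_ms)  # 0.1 ms of overhead per state block
```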
re-run pr-centosstream-9-x86_64-py3-pytest
re-run pr-macosx-catalina-x86_64-py3-pytest
re-run pr-macosx-catalina-x86_64-py3-pytest
re-run pr-centos-7-x86_64-py3-tcp-pytest
re-run pr-alma-8-x86_64-py3-pytest
re-run pr-macosx-catalina-x86_64-py3-pytest
I'm starting to think that Mac test failure is related, because it's failed on every test run on this PR.
Well... only on test runs since I moved to config.option. Still not sure what's going on here, though... every other platform passes this test in under 30 secs. MacOS already has a 15 sec addition on top of that limit in the test code, and it keeps taking 300+ sec in recent runs.
re-run pr-windows-2019-x64-py3-pytest
@MKLeb @garethgreenaway , should #62901 have resolved the pr-macosx-catalina-x86_64-py3-pytest failure I'm still seeing here?
@nicholasmhughes Doesn't look like the test that failed was one of the ones addressed in that referenced PR.
@nicholasmhughes The mac test is not one of the ones fixed by my PR from last week. I have seen that specific test fail a couple times before, but I can't find an example of it. However, it does seem to be failing in a rather profound fashion (almost 8 times the expected duration), which has me inclined to agree with @Ch3LL 's comment. Going to re-run it here and see what happens.
I just don't get what's so different about MacOS that it's taking that long while every other OS takes under 30 seconds...
@nicholasmhughes Are you able to test this on a mac? I'm wondering if debug statements in your code changes will show this test touching your changes unexpectedly. If you don't have a mac, I can try to test it out.
@MKLeb nope. I don't have a Mac.
Mac tests should be fixed now that this is closed: https://github.com/saltstack/salt/issues/62829 Will update the branch.
Can you resolve the merge conflict?
@nicholasmhughes Okay, so I've found what is causing the drastic failures. The test in question is actually touching your changes once (and seemingly only once). On average, with my testing machine, the method you added is taking ~170ms. Specifically, the self.functions["config.option"] call is taking greater than 95% of that time. If I take your change out, it returns to passing. It seems the rest of the state call takes around 20ms (i.e. under the threshold of the test). Ideally, we'd like to find a way to have your changes coexist in a way which doesn't exceed the remaining 25 seconds of the threshold.
@MKLeb , we suspected config.option as the problem:

> well... only test runs since I moved to config.option. still not sure what's going on here though... every other platform passes this test in under 30 secs. MacOS already has a 15 sec addition onto that limit in the test code, and it keeps taking 300+ sec in recent runs.
I'm not sure what your references to millisecond return times have to do with the "spawning platform" tests that are failing for MacOS. I still don't understand what in particular happens inside config.option only on Macs that adds so much time (230-300 sec) that it fails to return within the 45 second threshold, while every other platform returns in under 30 seconds.
Were you able to narrow the "expensive" operations within config.option so we might have an idea as to what's happening on Mac?
Ok... So I don't know precisely what is going on, but I do know one thing for sure - Loading config.option, or any config.<method>, in the separate process spun up by call_parallel takes an aggressively long time. I pinpointed it right to the line that runs load_module. I wasn't sure how to solve that, so I went with a different approach. Factoring your call to _match_global_state_conditions in _call_parallel_target out to the call method like how you call the other one allows the test to pass. Also, I ran your tests with my change and they passed as well. I can make a suggestion here so it's more clear.
EDIT: While I'm not sure this is a complete fix, I would recommend writing some tests for when parallel: True and there are some global state conditions to be matched.
EDIT 2: I realized I never answered your question about why this is only affecting MacOS. The short answer is that I don't know for sure. There is no logic in config.option that targets specifically spawning platforms, nor MacOS. Like I said, it's taking long to load the module funcs, and I saw nothing down the stack or in the loading that targeted MacOS in any way either.
Made a change that was close to what you proposed. Just moved the call to _match_global_state_conditions outside of the nested if since the refactor would've put it in both conditions. Hopefully the tests all run clean now...
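To illustrate the shape of that refactor, here is a minimal stand-in sketch (not Salt's real `State` class; `_run_state`, the dict shapes, and the direct call standing in for process spawning are all invented for the sketch): performing the expensive condition check in `call()` before the parallel branch means the spawned child never has to re-load the loader just to reach `config.option`.

```python
# Hypothetical sketch, NOT Salt's real implementation. The point:
# check global state conditions once in the parent (call), so the
# child used for parallel states never pays the module-load cost.

def _expensive_condition_check(low):
    # Stand-in for _match_global_state_conditions, which consults
    # config.option; cheap in the parent process, very slow if the
    # whole loader must be re-initialized in a spawned process.
    return low.get("fun") != "blocked"

def _run_state(low):
    # Stand-in for actually executing the state function.
    return {"name": low["name"], "result": True}

def _call_parallel_target(low):
    # After the refactor, the child only runs the state itself;
    # a real implementation would run this in a separate process.
    return _run_state(low)

def call(low, parallel=False):
    # Condition check happens here, before any child process exists.
    if not _expensive_condition_check(low):
        return {
            "name": low["name"],
            "result": None,
            "comment": "Failed to meet global state conditions. "
                       "State not called.",
        }
    if parallel:
        return _call_parallel_target(low)
    return _run_state(low)
```

Under this structure the slow path identified in the MacOS runs (loading `config.<method>` inside the spawned process) is avoided for both the parallel and non-parallel branches.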
Any reasons why this was merged with no discussion, when all the comments on the feature request were against it?
The original implementation outlined in the feature request was not put in place. This PR is a broader feature that was discussed in a couple Open Hours.
Ah, ok.
Open Hour discussions should be captured properly. I don't even recall any mention of this in the minutes.