cgroup-v2 considerations
- prevent a possible lockup when the format of /proc content changes
- properly get and handle scheduler policy & prio
- recognize cgroup-v2 and try to handle it similarly
- if switching to SCHED_RR fails, push priority to the max within SCHED_OTHER
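A minimal sketch of the SCHED_RR-with-fallback behavior from the last bullet, written in Python for brevity rather than the C that sbd itself uses. The function name `raise_priority` and its return labels are hypothetical. Treating "the max within SCHED_OTHER" as nice -20 (the strongest nice value on Linux) is an assumption, not taken from the PR code.

```python
import os

def raise_priority():
    """Try SCHED_RR at its max priority, else nice -20 within SCHED_OTHER.

    Hypothetical helper; returns "rr", "nice" or "unchanged".
    """
    try:
        rr_max = os.sched_get_priority_max(os.SCHED_RR)
        os.sched_setscheduler(0, os.SCHED_RR, os.sched_param(rr_max))
        return "rr"
    except OSError:
        # e.g. EPERM without CAP_SYS_NICE / RLIMIT_RTPRIO, or when the
        # current cgroup has no RT budget (CONFIG_RT_GROUP_SCHED)
        pass
    try:
        # Stay in SCHED_OTHER but take the strongest nice value (assumption)
        os.setpriority(os.PRIO_PROCESS, 0, -20)
        return "nice"
    except OSError:
        return "unchanged"
```

Unprivileged, both calls typically fail (EPERM, then EACCES) and the result is `"unchanged"`; the point of the fallback is that a failing SCHED_RR no longer leaves the process at default priority when some privilege is available.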
Just a preview for now; it probably needs splitting. And the cgroup-v2 stuff is ugly:
- scanning /proc/sched_debug seems to be the only easy way to find out whether CONFIG_RT_GROUP_SCHED is enabled with cgroup-v2
- currently (as of 5.4.20) there is no hierarchical rt-budget, so we move to the root-slice in all cases, with all the consequences
- when moving to the root-slice journal stops working
- auto and yes for SBD_MOVE_TO_ROOT_CGROUP behave the same
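The /proc/sched_debug scan from the first bullet could look roughly like the sketch below. It assumes the kernel prints `rt_rq[<cpu>]:<group path>` when CONFIG_RT_GROUP_SCHED is set and a bare `rt_rq[<cpu>]:` otherwise; that format is not a stable interface, so the parser makes a single pass over a finite input and treats anything it does not recognize as "not enabled", which keeps a format change from turning into a hang. The function names are hypothetical.

```python
import re

# Assumed format: "rt_rq[<cpu>]:<group path>" with CONFIG_RT_GROUP_SCHED,
# plain "rt_rq[<cpu>]:" without it.
_RT_RQ = re.compile(r"^rt_rq\[\d+\]:(\S*)")

def rt_group_sched_enabled(lines):
    """Return True if any rt_rq line carries a task-group path."""
    for line in lines:  # one pass over finite input: no retry loop to get stuck in
        m = _RT_RQ.match(line)
        if m and m.group(1):
            return True
    return False  # unknown or changed format reads as "not enabled"

def rt_group_sched_from_proc(path="/proc/sched_debug"):
    """True/False from the scan, or None if the file can't be read."""
    try:
        with open(path) as f:
            return rt_group_sched_enabled(f)
    except OSError:
        return None  # e.g. kernel built without CONFIG_SCHED_DEBUG
```

Returning `None` for an unreadable file keeps "don't know" distinct from "not enabled", so a caller can decide separately how cautious to be.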
Code-wise it looks reasonable, though I'm not familiar with either cgroup implementation and didn't do any testing. Spelling: "budged" (for "budget") in a couple of places.
It's probably worthwhile to comment, either in the sysconfig file or the code, the conditions under which cgroup v2 will be effective. I.e. what kernel version made it available and what has to be done to switch to it, and how a user could tell what an existing system uses.
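One way a user or a script could tell which hierarchy an existing system uses is to look at the filesystem types mounted under /sys/fs/cgroup: a `cgroup2` mount directly at /sys/fs/cgroup means the unified (v2) setup, as on Fedora 31; `cgroup` mounts mean v1, and both together usually mean the hybrid layout with v2 mounted below /sys/fs/cgroup (e.g. /sys/fs/cgroup/unified). The sketch below classifies /proc/self/mounts lines that way; the function names and labels are hypothetical.

```python
def cgroup_mode(mount_lines, root="/sys/fs/cgroup"):
    """Classify the cgroup setup from /proc/self/mounts-style lines.

    Returns "v2" (unified), "hybrid" (v1 controllers plus a cgroup2 mount
    elsewhere), "v1" or "unknown". Labels are illustrative only.
    """
    has_v1 = False
    has_v2_elsewhere = False
    for line in mount_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip malformed lines
        mountpoint, fstype = fields[1], fields[2]
        if fstype == "cgroup2":
            if mountpoint == root:
                return "v2"
            has_v2_elsewhere = True
        elif fstype == "cgroup":
            has_v1 = True
    if has_v1:
        return "hybrid" if has_v2_elsewhere else "v1"
    return "unknown"

def cgroup_mode_from_proc():
    """Classify the running system's cgroup setup."""
    with open("/proc/self/mounts") as f:
        return cgroup_mode(f)
```

From a shell, `stat -fc %T /sys/fs/cgroup/` gives a quick answer: it prints `cgroup2fs` on a pure v2 system and `tmpfs` for v1 or hybrid.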
I tried to be a bit more descriptive in the comment before the code that actually does the check. As cgroup-v2 has been in the kernel for a while and both hierarchies can be configured, I guess going into which kernel versions provide some version of cgroup-v2 doesn't make much sense. Fedora 31 seems to be the first distribution using cgroup-v2 by default, and although it should be possible, I didn't play with switching back and forth; that is probably asking for trouble. The effort here is rather to live with cgroup-v2 if it is there. Even with cgroup-v2 enabled as in Fedora 31, the approaches used up to now shouldn't run into issues as long as CONFIG_RT_GROUP_SCHED isn't enabled, since moving to the root-slice is then not needed. Both sbd and corosync first check for /sys/fs/cgroup/cpu/cpu.rt_runtime_us, find that it doesn't exist, and are happy. To play with, an otherwise unmodified Fedora 31 kernel with CONFIG_RT_GROUP_SCHED enabled can be found at https://koji.fedoraproject.org/koji/taskinfo?taskID=41654832 (don't know when it will be cleaned up).
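The decision described in this thread (handle the v1 budget file if it exists; only when CONFIG_RT_GROUP_SCHED shows up under cgroup-v2 consider the root cgroup; `auto` and `yes` behaving the same) can be sketched as a pure function plus the actual move. Names, return labels and the `"auto"` default are hypothetical, and writing the PID into the root's `cgroup.procs` is the generic cgroup-v2 way to migrate a process, not necessarily the exact code in this PR. Since moving to the root-slice breaks journal logging, `"stay"` is the preferred outcome whenever it is safe.

```python
import os

RT_RUNTIME_V1 = "/sys/fs/cgroup/cpu/cpu.rt_runtime_us"

def rt_cgroup_action(rt_runtime_file_exists, rt_group_sched, move_setting="auto"):
    """Decide how to deal with RT group scheduling (hypothetical helper).

    rt_runtime_file_exists: whether RT_RUNTIME_V1 exists (cgroup-v1 RT budget)
    rt_group_sched: whether CONFIG_RT_GROUP_SCHED was detected (e.g. /proc/sched_debug)
    move_setting: SBD_MOVE_TO_ROOT_CGROUP value ("auto", "yes" or "no")
    """
    if rt_runtime_file_exists:
        return "v1"            # keep the existing cgroup-v1 handling
    if not rt_group_sched:
        return "stay"          # cgroup doesn't limit SCHED_RR: nothing to do
    if move_setting in ("auto", "yes"):
        return "move-to-root"  # no hierarchical RT budget with cgroup-v2 (as of 5.4)
    return "stay"              # moving disabled; SCHED_RR will likely fail

def move_to_root_cgroup(root="/sys/fs/cgroup"):
    """Migrate the calling process into the cgroup-v2 root (needs privileges)."""
    with open(os.path.join(root, "cgroup.procs"), "w") as f:
        f.write(str(os.getpid()))
```

Keeping the decision separate from the move makes the policy easy to test without privileges; only `move_to_root_cgroup` touches the system.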
Looks reasonable (a bit scary though), but I have a question. What do you mean by "when moving to the root-slice journal stops working"? Is it logging to journald or some sbd journal?
Logging (to journald) stops working, unfortunately. If it were something sbd-internal, I would have tried to make it work ;-) No idea whether that is all (bad enough, though we would still have logging to a file) or whether there are other issues. Anyway, stopping via the cgroup probably doesn't work either with all that root-slice switching, which is why I try to avoid it whenever possible.
Ok, thanks for the info.
Cherry-picked the travis-config changes needed for mock 2.0 (updated in Fedora 31), as they are not really related to the topic of this PR. Split off the scheduler-config changes that aren't actually cgroup-v2 related. I guess it should be OK to cherry-pick those into master as well: they should fix a possible hang when /proc content changes with some kernel versions, and they make the behavior more similar to what corosync does (fall back to raising the priority to the max within SCHED_OTHER if switching to SCHED_RR fails).