mgr: Add one finisher thread per module
Adds a finisher thread for each module. Each module's commands are executed via its corresponding finisher thread. There is also a generic finisher thread through which all other work, such as notify and config-change callbacks, is run.
Fixes: https://tracker.ceph.com/issues/51177
Signed-off-by: Kotresh HR [email protected]
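For illustration only, here is a minimal, self-contained sketch of the idea, assuming made-up names (ModuleFinisher, queue) rather than the actual ceph-mgr Finisher API: one finisher thread per module draining its own FIFO queue, plus a generic finisher for work not tied to a single module, so a slow command in one module no longer blocks the others.

```cpp
// Simplified illustration only; the names and types here are assumptions and
// do not mirror the real ceph-mgr implementation.
#include <condition_variable>
#include <deque>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <thread>

class ModuleFinisher {
  std::mutex m;
  std::condition_variable cv;
  std::deque<std::function<void()>> q;   // per-module FIFO of queued work
  bool stopping = false;
  std::thread worker;

  void run() {
    std::unique_lock<std::mutex> l(m);
    while (!stopping || !q.empty()) {
      if (q.empty()) { cv.wait(l); continue; }
      auto fn = std::move(q.front());
      q.pop_front();
      l.unlock();
      fn();        // a slow handler only blocks this module's queue
      l.lock();
    }
  }

public:
  ModuleFinisher() : worker([this] { run(); }) {}
  ~ModuleFinisher() {
    { std::lock_guard<std::mutex> l(m); stopping = true; }
    cv.notify_one();
    worker.join(); // drain remaining work, then stop
  }
  void queue(std::function<void()> fn) {
    { std::lock_guard<std::mutex> l(m); q.push_back(std::move(fn)); }
    cv.notify_one();
  }
};

int main() {
  // One finisher per module, plus a generic one for everything else.
  std::map<std::string, ModuleFinisher> per_module;
  ModuleFinisher generic;

  // A command for "volumes" is queued on its own finisher, so a stuck
  // "volumes" handler would no longer delay "snap_schedule" commands.
  per_module["volumes"].queue([] { /* handle a mgr/volumes command */ });
  per_module["snap_schedule"].queue([] { /* handle a snap_schedule command */ });
  generic.queue([] { /* work not owned by a single module */ });
  return 0;
}
```

Per-module ordering of commands is preserved because each queue is still drained by a single thread.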
Contribution Guidelines
- To sign and title your commits, please refer to Submitting Patches to Ceph.
- If you are submitting a fix for a stable branch (e.g. "pacific"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
Checklist
- Tracker (select at least one)
  - [x] References tracker ticket
  - [ ] Very recent bug; references commit where it was introduced
  - [ ] New feature (ticket optional)
  - [ ] Doc update (no ticket needed)
  - [ ] Code cleanup (no ticket needed)
- Component impact
  - [ ] Affects Dashboard, opened tracker ticket
  - [ ] Affects Orchestrator, opened tracker ticket
  - [x] No impact that needs to be tracked
- Documentation (select at least one)
  - [ ] Updates relevant documentation
  - [x] No doc update is appropriate
- Tests (select at least one)
  - [ ] Includes unit test(s)
  - [ ] Includes integration test(s)
  - [ ] Includes bug reproducer
  - [x] No tests
@kotreshhr @batrick I'm not against the approach taken; however, I would like to check if the point made here (*) is worth considering?
(*): https://tracker.ceph.com/issues/51177#note-4
I think the issue with that approach is you'd still need to somehow queue other events like mdsmap/osdmap notifications or even subsequent CLI commands. How many async commands should run at once? Leave it to the modules to decide? It gets messy quickly, especially with ordering guarantees.
This approach keeps the sequential nature of commands/events but prevents one module from blocking others from operating. I think it's a nice compromise.
I'm a tiny bit worried about bloating up ceph-mgr's memory usage with this approach. It solves the issue at hand; however, we are adding a fixed amount of memory (queue, locks, etc.) for each mgr module. IIRC, there was some discussion (somewhere) about trimming ceph-mgr's memory footprint.
@badone ^^
I don't recall. Are you sure it was me involved (did you mean Boris maybe)?
Maybe, I'm not sure. It was a while back...
Anyway, we need more eyes on this change.
@vshankar Do you see it adding significantly to the memory usage of the manager? It seems to me the additional overhead of having a finisher for each module plus a generic one shouldn't be massively different from having just a generic one, since the contents of the combined finisher queues should be about the same size as the single queue in the one-finisher approach, right?
There would be one thread per mgr plugin. That's different from having one thread, which is the case now. And then, there is a (finisher) thread even for modules that are not enabled (which can be worked around). But still...
How many plugins do we have now in ceph-mgr?
In the mid-to-high thirties if my quick count is correct, but they are not all enabled/active of course.
Yes, nothing has changed w.r.t. queue size. With this approach, the queue entries are distributed among multiple threads instead of the single thread used before. The added memory usage is the new Finisher object per thread. I am not yet sure how much that adds to resource consumption.
Ran fs and rados:mgr suite.
http://pulpito.front.sepia.ceph.com/khiremat-2022-09-02_09:35:13-fs-wip-khiremat-47893-test-mgr-finisher-thread-0-distro-default-smithi/
http://pulpito.front.sepia.ceph.com/khiremat-2022-09-06_11:22:19-rados:mgr-wip-khiremat-47893-test-mgr-finisher-thread-0-distro-default-smithi/
Entire rados suite scheduled: http://pulpito.front.sepia.ceph.com/khiremat-2022-09-12_10:34:08-rados-wip-khiremat-47893-test-mgr-finisher-thread-0-distro-default-smithi/
I guess this change is fine since no one has objected to the approach taken here.
@badone - @kotreshhr has additionally run this through rados:mgr. Would that be enough as a source of validation?
ping?
Have Patrick's concerns been addressed? It's still showing a requested change.
@badone I had missed noticing Patrick's comment. I am working on it. I will refresh the PR once done. Is there any additional testing that you think needs to be run?
@kotreshhr I think you can just run the same suites again on your updated PR and that should be sufficient.
jenkins test windows
Scheduled fs and rados suite:
http://pulpito.front.sepia.ceph.com/khiremat-2022-10-18_04:28:21-fs-wip-khiremat-47893-test-mgr-finisher-thread-1-distro-default-smithi/
http://pulpito.front.sepia.ceph.com/khiremat-2022-10-18_04:14:28-rados-wip-khiremat-47893-test-mgr-finisher-thread-1-distro-default-smithi/
Do we need to run the entire fs suite? Wouldn't mgr/{volumes,snap_schedule,mirroring,status,mds_autoscaler} tests suffice?
I wanted to run the full fs suite as the changes are in the ceph-mgr code and not in a plugin. The usage of ceph commands in the fs suite might give it some extra coverage?
ACK -- do share how the tests go.
- I think we also need to update ActivePyModules::notify_all and others to use the new per-module finisher.
@batrick This adds concurrency between module notifications and other commands. Previously, all the module notifications would get executed in series and no other commands would get executed in between until all the notifications were done. With this change, a few new commands for some modules might go through before all the module notifications are completed. Is this not a problem? The same is the case for config notify.
I think that's acceptable. We don't guarantee anywhere that all modules will process notifications in lockstep.
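For illustration, here is a rough, self-contained sketch (assumed names, not the actual ActivePyModules code) of how notify_all could fan out to the per-module finishers: each module still sees its own notifications in order, but a slow module only delays itself, and there is no cross-module lockstep.

```cpp
// Illustrative sketch only; FinisherLike and notify_all_sketch are made-up names.
#include <functional>
#include <map>
#include <string>

// Stand-in for a per-module finisher exposing a queue() hook.
struct FinisherLike {
  std::function<void(std::function<void()>)> queue;
};

void notify_all_sketch(std::map<std::string, FinisherLike>& finishers,
                       const std::string& notify_type,
                       const std::string& notify_id) {
  for (auto& entry : finishers) {
    // Queue the notification on this module's own finisher thread.
    entry.second.queue([notify_type, notify_id] {
      // module->notify(notify_type, notify_id) would run here.
    });
  }
}

int main() {
  std::map<std::string, FinisherLike> finishers;
  // For the sketch, queue() just runs the callback inline.
  finishers["volumes"].queue = [](std::function<void()> fn) { fn(); };
  finishers["status"].queue  = [](std::function<void()> fn) { fn(); };
  notify_all_sketch(finishers, "mds_map", "");
  return 0;
}
```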
Pending items
- [x] Add basic QA tests with qa/tasks/check_counter.py to look at the per-module finisher stats. Should be in the rados and fs suites to start.
jenkins test make check
jenkins test make check arm64
jenkins retest this please
jenkins retest this please
Fixed the test and triggered teuthology job:
http://pulpito.front.sepia.ceph.com/khiremat-2022-12-02_11:12:21-fs:volumes-wip-khiremat-47893-test-mgr-finisher-thread-5-distro-default-smithi/