mgr: Add one finisher thread per module
Adds a finisher thread for each module. Each module's commands are executed via its corresponding finisher thread. There is also a generic finisher thread through which all other work, such as notify and config-change callbacks, is run.
Fixes: https://tracker.ceph.com/issues/51177
Signed-off-by: Kotresh HR [email protected]
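For illustration only, here is a minimal, self-contained sketch of the idea, assuming made-up names (ModuleFinisher, queue) rather than the actual ceph-mgr Finisher API: one finisher thread per module draining its own FIFO queue, plus a generic finisher for work not tied to a single module, so a slow command in one module no longer blocks the others.

```cpp
// Simplified illustration only; the names and types here are assumptions and
// do not mirror the real ceph-mgr implementation.
#include <condition_variable>
#include <deque>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <thread>

class ModuleFinisher {
  std::mutex m;
  std::condition_variable cv;
  std::deque<std::function<void()>> q;   // per-module FIFO of queued work
  bool stopping = false;
  std::thread worker;

  void run() {
    std::unique_lock<std::mutex> l(m);
    while (!stopping || !q.empty()) {
      if (q.empty()) { cv.wait(l); continue; }
      auto fn = std::move(q.front());
      q.pop_front();
      l.unlock();
      fn();        // a slow handler only blocks this module's queue
      l.lock();
    }
  }

public:
  ModuleFinisher() : worker([this] { run(); }) {}
  ~ModuleFinisher() {
    { std::lock_guard<std::mutex> l(m); stopping = true; }
    cv.notify_one();
    worker.join(); // drain remaining work, then stop
  }
  void queue(std::function<void()> fn) {
    { std::lock_guard<std::mutex> l(m); q.push_back(std::move(fn)); }
    cv.notify_one();
  }
};

int main() {
  // One finisher per module, plus a generic one for everything else.
  std::map<std::string, ModuleFinisher> per_module;
  ModuleFinisher generic;

  // A command for "volumes" is queued on its own finisher, so a stuck
  // "volumes" handler would no longer delay "snap_schedule" commands.
  per_module["volumes"].queue([] { /* handle a mgr/volumes command */ });
  per_module["snap_schedule"].queue([] { /* handle a snap_schedule command */ });
  generic.queue([] { /* work not owned by a single module */ });
  return 0;
}
```

Per-module ordering of commands is preserved because each queue is still drained by a single thread.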
Contribution Guidelines
- To sign and title your commits, please refer to Submitting Patches to Ceph.
- If you are submitting a fix for a stable branch (e.g. "pacific"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
Checklist
- Tracker (select at least one)
  - [x] References tracker ticket
  - [ ] Very recent bug; references commit where it was introduced
  - [ ] New feature (ticket optional)
  - [ ] Doc update (no ticket needed)
  - [ ] Code cleanup (no ticket needed)
- Component impact
  - [ ] Affects Dashboard, opened tracker ticket
  - [ ] Affects Orchestrator, opened tracker ticket
  - [x] No impact that needs to be tracked
- Documentation (select at least one)
  - [ ] Updates relevant documentation
  - [x] No doc update is appropriate
- Tests (select at least one)
  - [ ] Includes unit test(s)
  - [ ] Includes integration test(s)
  - [ ] Includes bug reproducer
  - [x] No tests
@kotreshhr @batrick I'm not against the approach taken; however, I would like to check if the point made here (*) is worth considering?
(*): https://tracker.ceph.com/issues/51177#note-4
I think the issue with that approach is you'd still need to somehow queue other events like mdsmap/osdmap notifications or even subsequent CLI commands. How many async commands should run at once? Leave it to the modules to decide? It gets messy quickly, especially with ordering guarantees.
This approach keeps the sequential nature of commands/events but prevents one module from blocking others from operating. I think it's a nice compromise.
I'm a tiny bit worried about bloating up ceph-mgr's memory usage with this approach. It solves the issue at hand; however, we are adding a fixed amount of memory (queue, locks, etc.) for each mgr module. IIRC, there was some discussion (somewhere) about trimming ceph-mgr's memory footprint.
@badone ^^
I don't recall. Are you sure it was me involved (did you mean Boris maybe)?
Maybe, I'm not sure. It was a while back...
Anyway, we need more eyes on this change.
@vshankar Do you see it adding significantly to the memory usage of the manager? It seems to me the additional overhead of having a finisher for each module plus a generic one shouldn't be massively different from having just a generic one, since the contents of the combined finisher queues should be about the same size as the single queue in the one-finisher approach, right?
There would be one thread per mgr plugin. That's different from having one thread, which is the case now. And then, there is a (finisher) thread even for modules that are not enabled (which can be worked around). But still...
How many plugins do we have now in ceph-mgr?
In the mid-to-high thirties if my quick count is correct, but they are not all enabled/active of course.
Yes, nothing has changed w.r.t. queue size. With this approach, the queue entries are distributed among multiple threads instead of the single thread used before. The added memory usage is the new Finisher object per thread. I am not yet sure how much that adds to resource consumption.
Ran fs and rados:mgr suite.
http://pulpito.front.sepia.ceph.com/khiremat-2022-09-02_09:35:13-fs-wip-khiremat-47893-test-mgr-finisher-thread-0-distro-default-smithi/
http://pulpito.front.sepia.ceph.com/khiremat-2022-09-06_11:22:19-rados:mgr-wip-khiremat-47893-test-mgr-finisher-thread-0-distro-default-smithi/
Entire rados suite scheduled: http://pulpito.front.sepia.ceph.com/khiremat-2022-09-12_10:34:08-rados-wip-khiremat-47893-test-mgr-finisher-thread-0-distro-default-smithi/
I guess this change is fine since no one has objected to the approach taken here.
@badone - @kotreshhr has additionally run this through rados:mgr. Would that be enough as a source of validation?
ping?
Have Patrick's concerns been addressed? It's still showing a requested change.
@badone I had missed noticing Patrick's comment. I am working on it. I will refresh the PR once done. Is there any additional testing that you think needs to be run?
@kotreshhr I think you can just run the same suites again on your updated PR and that should be sufficient.
jenkins test windows
Scheduled fs and rados suite:
http://pulpito.front.sepia.ceph.com/khiremat-2022-10-18_04:28:21-fs-wip-khiremat-47893-test-mgr-finisher-thread-1-distro-default-smithi/
http://pulpito.front.sepia.ceph.com/khiremat-2022-10-18_04:14:28-rados-wip-khiremat-47893-test-mgr-finisher-thread-1-distro-default-smithi/
Do we need to run the entire fs suite? Wouldn't mgr/{volumes,snap_schedule,mirroring,status,mds_autoscaler} tests suffice?
I wanted to run the full fs suite as the changes are in the ceph-mgr code and not in a plugin. The usage of ceph commands in the fs suite might give it some extra coverage?
ACK -- do share how the tests go.
- I think we also need to update ActivePyModules::notify_all and others to use the new per-module finisher.
@batrick This adds concurrency between module notifications and other commands. Previously, all the module notifications would get executed in series and no other commands would get executed in between until all the notifications were done. With this change, a few new commands for some modules might go through before all the module notifications are completed. Is this not a problem? The same is the case for config notify.
I think that's acceptable. We don't guarantee anywhere that all modules will process notifications in lockstep.
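For illustration, here is a rough, self-contained sketch (assumed names, not the actual ActivePyModules code) of how notify_all could fan out to the per-module finishers: each module still sees its own notifications in order, but a slow module only delays itself, and there is no cross-module lockstep.

```cpp
// Illustrative sketch only; FinisherLike and notify_all_sketch are made-up names.
#include <functional>
#include <map>
#include <string>

// Stand-in for a per-module finisher exposing a queue() hook.
struct FinisherLike {
  std::function<void(std::function<void()>)> queue;
};

void notify_all_sketch(std::map<std::string, FinisherLike>& finishers,
                       const std::string& notify_type,
                       const std::string& notify_id) {
  for (auto& entry : finishers) {
    // Queue the notification on this module's own finisher thread.
    entry.second.queue([notify_type, notify_id] {
      // module->notify(notify_type, notify_id) would run here.
    });
  }
}

int main() {
  std::map<std::string, FinisherLike> finishers;
  // For the sketch, queue() just runs the callback inline.
  finishers["volumes"].queue = [](std::function<void()> fn) { fn(); };
  finishers["status"].queue  = [](std::function<void()> fn) { fn(); };
  notify_all_sketch(finishers, "mds_map", "");
  return 0;
}
```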
Pending items
- [x] Add basic QA tests with qa/tasks/check_counter.py to look at the per-module finisher stats. Should be in the rados and fs suites to start.
jenkins test make check
jenkins test make check arm64
jenkins retest this please
jenkins retest this please
Fixed the test and triggered teuthology job:
http://pulpito.front.sepia.ceph.com/khiremat-2022-12-02_11:12:21-fs:volumes-wip-khiremat-47893-test-mgr-finisher-thread-5-distro-default-smithi/