
mgr: Add one finisher thread per module

Open kotreshhr opened this issue 3 years ago • 21 comments

Adds a finisher thread for each module. Each module's commands are executed via its corresponding finisher thread. A generic finisher thread handles everything else, such as notify events and config changes.

Fixes: https://tracker.ceph.com/issues/51177
Signed-off-by: Kotresh HR [email protected]
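
Below is a minimal standalone sketch of the dispatch idea (not code from this PR; it stands in for Ceph's Finisher class with a plain std::thread plus a queue): each module gets its own work queue and thread for its commands, and a generic one handles everything else, so a slow module only blocks itself.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <iostream>
#include <map>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

// Illustrative only -- a stand-in for Ceph's Finisher class: one worker
// thread draining a FIFO queue of callbacks.
class MiniFinisher {
  std::mutex m;
  std::condition_variable cv;
  std::queue<std::function<void()>> q;
  bool stopping = false;
  std::thread worker;

public:
  MiniFinisher() : worker([this] { run(); }) {}
  ~MiniFinisher() {
    { std::lock_guard l(m); stopping = true; }
    cv.notify_one();
    worker.join();
  }
  void queue_work(std::function<void()> fn) {
    { std::lock_guard l(m); q.push(std::move(fn)); }
    cv.notify_one();
  }

private:
  void run() {
    std::unique_lock l(m);
    while (true) {
      cv.wait(l, [this] { return stopping || !q.empty(); });
      if (q.empty()) break;              // stopping and fully drained
      auto fn = std::move(q.front());
      q.pop();
      l.unlock();
      fn();                              // a slow callback only blocks this queue
      l.lock();
    }
  }
};

int main() {
  // One finisher per module for its commands, plus a generic one for
  // notify/config-change style events.
  std::map<std::string, MiniFinisher> module_finishers;
  module_finishers["volumes"];           // default-construct in place
  module_finishers["status"];
  MiniFinisher generic_finisher;

  module_finishers["volumes"].queue_work([] {
    std::this_thread::sleep_for(std::chrono::seconds(1));   // slow command
    std::cout << "volumes: command done\n";
  });
  module_finishers["status"].queue_work([] {
    std::cout << "status: command done (not stuck behind volumes)\n";
  });
  generic_finisher.queue_work([] {
    std::cout << "generic: osdmap notify dispatched\n";
  });
  return 0;                              // destructors drain each queue
}
```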

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • [x] References tracker ticket
    • [ ] Very recent bug; references commit where it was introduced
    • [ ] New feature (ticket optional)
    • [ ] Doc update (no ticket needed)
    • [ ] Code cleanup (no ticket needed)
  • Component impact
    • [ ] Affects Dashboard, opened tracker ticket
    • [ ] Affects Orchestrator, opened tracker ticket
    • [x] No impact that needs to be tracked
  • Documentation (select at least one)
    • [ ] Updates relevant documentation
    • [x] No doc update is appropriate
  • Tests (select at least one)

kotreshhr avatar Sep 01 '22 06:09 kotreshhr

@kotreshhr @batrick I'm not against the approach taken, however, I would like to check if the point made here(*) is worth considering?

(*): https://tracker.ceph.com/issues/51177#note-4

vshankar avatar Sep 07 '22 05:09 vshankar

@kotreshhr @batrick I'm not against the approach taken, however, I would like to check if the point made here(*) is worth considering?

(*): https://tracker.ceph.com/issues/51177#note-4

I think the issue with that approach is you'd still need to somehow queue other events like mdsmap/osdmap notifications or even subsequent CLI commands. How many async commands should run at once? Leave it to the modules to decide? It gets messy quickly, especially with ordering guarantees.

This approach keeps the sequential nature of commands/events but prevents one module from blocking others from operating. I think it's a nice compromise.

batrick avatar Sep 07 '22 15:09 batrick

@kotreshhr @batrick I'm not against the approach taken, however, I would like to check if the point made here(*) is worth considering? (*): https://tracker.ceph.com/issues/51177#note-4

I think the issue with that approach is you'd still need to somehow queue other events like mdsmap/osdmap notifications or even subsequent CLI commands. How many async commands should run at once? Leave it to the modules to decide? It gets messy quickly, especially with ordering guarantees.

This approach keeps the sequential nature of commands/events but prevents one module from blocking others from operating. I think it's a nice compromise.

I'm a tiny bit worried about bloating up ceph-mgr's memory usage with this approach. It solves the issue at hand; however, we are adding a fixed amount of memory (queue, locks, etc.) for each mgr module. IIRC, there was some discussion (somewhere) about trimming ceph-mgr's memory footprint.

@badone ^^

vshankar avatar Sep 08 '22 12:09 vshankar

I'm a tiny bit worried about bloating up ceph-mgr's memory usage with this approach. It solves the issue at hand; however, we are adding a fixed amount of memory (queue, locks, etc.) for each mgr module. IIRC, there was some discussion (somewhere) about trimming ceph-mgr's memory footprint.

@badone ^^

I don't recall. Are you sure it was me involved (did you mean Boris maybe)?

badone avatar Sep 08 '22 23:09 badone

I'm a tiny bit worried about bloating up ceph-mgr's memory usage with this approach. It solves the issue at hand; however, we are adding a fixed amount of memory (queue, locks, etc.) for each mgr module. IIRC, there was some discussion (somewhere) about trimming ceph-mgr's memory footprint. @badone ^^

I don't recall. Are you sure it was me involved (did you mean Boris maybe)?

Maybe, I'm not sure. It was a while back...

Anyway, we need more eyes on this change.

vshankar avatar Sep 09 '22 01:09 vshankar

@vshankar Do you see it adding significantly to the memory usage of the manager? Seems to me the additional overhead of having a finisher for each module plus a generic one shouldn't be massively different to having just a generic one since the content of the combined finisher queues should be about the same size as the queue of the finisher in the one finisher approach right?

badone avatar Sep 09 '22 02:09 badone

@vshankar Do you see it adding significantly to the memory usage of the manager? Seems to me the additional overhead of having a finisher for each module plus a generic one shouldn't be massively different to having just a generic one since the content of the combined finisher queues should be about the same size as the queue of the finisher in the one finisher approach right?

There would be one thread per mgr plugin. That's different than having one thread, which is the case now. And then, there is a (finisher) thread even for modules that are not enabled (which can be worked around). But still...

How many plugins do we have now in ceph-mgr?

vshankar avatar Sep 09 '22 04:09 vshankar

In the mid-to-high thirties, if my quick count is correct, but they are not all enabled/active of course.

badone avatar Sep 09 '22 04:09 badone

@vshankar Do you see it adding significantly to the memory usage of the manager? Seems to me the additional overhead of having a finisher for each module plus a generic one shouldn't be massively different to having just a generic one since the content of the combined finisher queues should be about the same size as the queue of the finisher in the one finisher approach right?

Yes, nothing changes w.r.t. queue size. With this approach, the queue entries are distributed among multiple threads instead of sitting in the single thread's queue as before. The added memory usage is the new Finisher object per new thread. I am not yet sure how much that adds to resource consumption.
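
For illustration only (made-up names, not code from this PR): one way to keep the added footprint small would be to create a per-module finisher lazily, only for modules that are actually enabled, since each finisher carries a thread, a mutex/condvar, and a queue of its own.

```cpp
#include <iostream>
#include <map>
#include <memory>
#include <set>
#include <string>

// Stand-in for Ceph's Finisher: in reality each one owns a thread, a mutex,
// a condition variable and a queue of contexts -- the per-module cost being
// discussed above.
struct ModuleFinisher {
  std::string module;
  explicit ModuleFinisher(std::string m) : module(std::move(m)) {
    std::cout << "started finisher for " << module << "\n";
  }
};

int main() {
  // Hypothetical inputs: which modules exist vs. which are enabled.
  std::set<std::string> all = {"volumes", "status", "dashboard",
                               "telemetry", "rbd_support"};
  std::set<std::string> enabled = {"volumes", "status"};

  std::map<std::string, std::unique_ptr<ModuleFinisher>> finishers;
  for (const auto& name : all) {
    if (enabled.count(name))             // skip disabled modules entirely
      finishers[name] = std::make_unique<ModuleFinisher>(name);
  }
  std::cout << finishers.size() << " finishers instead of " << all.size()
            << "\n";
  return 0;
}
```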

kotreshhr avatar Sep 09 '22 07:09 kotreshhr

Ran fs and rados:mgr suite.

http://pulpito.front.sepia.ceph.com/khiremat-2022-09-02_09:35:13-fs-wip-khiremat-47893-test-mgr-finisher-thread-0-distro-default-smithi/ http://pulpito.front.sepia.ceph.com/khiremat-2022-09-06_11:22:19-rados:mgr-wip-khiremat-47893-test-mgr-finisher-thread-0-distro-default-smithi/

kotreshhr avatar Sep 12 '22 07:09 kotreshhr

Entire rados suite scheduled: http://pulpito.front.sepia.ceph.com/khiremat-2022-09-12_10:34:08-rados-wip-khiremat-47893-test-mgr-finisher-thread-0-distro-default-smithi/

kotreshhr avatar Sep 12 '22 12:09 kotreshhr

I guess this change is fine since no one has objected to the approach taken here.

@badone - @kotreshhr has additionally run this through rados:mgr. Would that be enough validation?

vshankar avatar Sep 16 '22 10:09 vshankar

ping?

vshankar avatar Sep 21 '22 13:09 vshankar

I guess this change is fine since no one has objected to the approach taken here.

@badone - @kotreshhr has additionally run this through rados:mgr. Would that be enough validation?

ping?

vshankar avatar Oct 10 '22 07:10 vshankar

I guess this change is fine since no one has objected to the approach taken here. @badone - @kotreshhr has additionally run this through rados:mgr. Would that be enough validation?

ping?

Have Patrick's concerns been addressed? It's still showing a requested change.

badone avatar Oct 12 '22 21:10 badone

I guess this change is fine since no one has objected to the approach taken here. @badone - @kotreshhr has additionally run this through rados:mgr. Would that be enough validation?

ping?

Have Patrick's concerns been addressed? It's still showing a requested change.

@badone I had missed Patrick's comment. I am working on it and will refresh the PR once done. Is there any additional testing you think needs to be run?

kotreshhr avatar Oct 13 '22 06:10 kotreshhr

@kotreshhr I think you can just run the same suites again on your updated PR and that should be sufficient.

badone avatar Oct 17 '22 03:10 badone

jenkins test windows

kotreshhr avatar Oct 17 '22 12:10 kotreshhr

Scheduled fs and rados suite:

http://pulpito.front.sepia.ceph.com/khiremat-2022-10-18_04:28:21-fs-wip-khiremat-47893-test-mgr-finisher-thread-1-distro-default-smithi/

http://pulpito.front.sepia.ceph.com/khiremat-2022-10-18_04:14:28-rados-wip-khiremat-47893-test-mgr-finisher-thread-1-distro-default-smithi/

kotreshhr avatar Oct 18 '22 04:10 kotreshhr

Scheduled fs and rados suite:

http://pulpito.front.sepia.ceph.com/khiremat-2022-10-18_04:28:21-fs-wip-khiremat-47893-test-mgr-finisher-thread-1-distro-default-smithi/

Do we need to run the entire fs suite? Wouldn't mgr/{volumes,snap_schedule,mirroring,status,mds_autoscaler} tests suffice?

vshankar avatar Oct 18 '22 05:10 vshankar

Scheduled fs and rados suite: http://pulpito.front.sepia.ceph.com/khiremat-2022-10-18_04:28:21-fs-wip-khiremat-47893-test-mgr-finisher-thread-1-distro-default-smithi/

Do we need to run the entire fs suite? Wouldn't mgr/{volumes,snap_schedule,mirroring,status,mds_autoscaler} tests suffice?

I wanted to run the full fs suite since the changes are in ceph-mgr code rather than in a plugin. The use of ceph commands in the fs suite might give it some extra coverage?

kotreshhr avatar Oct 18 '22 07:10 kotreshhr

Scheduled fs and rados suite: http://pulpito.front.sepia.ceph.com/khiremat-2022-10-18_04:28:21-fs-wip-khiremat-47893-test-mgr-finisher-thread-1-distro-default-smithi/

Do we need to run the entire fs suite? Wouldn't mgr/{volumes,snap_schedule,mirroring,status,mds_autoscaler} tests suffice?

I wanted to run the full fs suite since the changes are in ceph-mgr code rather than in a plugin. The use of ceph commands in the fs suite might give it some extra coverage?

ACK -- do share how the tests go.

vshankar avatar Oct 20 '22 12:10 vshankar

  • I think we also need to update ActivePyModules::notify_all and others to use the new per-module finisher.

@batrick This adds concurrency between module notifications and other commands. Previously, all module notifications were executed in series, and no other commands would run until all the notifications were done. With this change, a few new commands for some modules might go through before all the module notifications are completed. Is that a problem? The same is the case for config notify.
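
A rough standalone sketch of that ordering point (assumed names, not the actual ActivePyModules code): with per-module finishers, events for the same module still run strictly in order, but one module can finish a later command while another module is still inside its notify.

```cpp
#include <chrono>
#include <iostream>
#include <string>
#include <thread>
#include <utility>
#include <vector>

int main() {
  struct Event { std::string desc; int delay_ms; };

  // Each inner vector models one module's finisher queue; entries run in
  // order on that module's own thread.
  std::vector<std::pair<std::string, std::vector<Event>>> queues = {
    {"volumes", {{"notify(osdmap)", 500}, {"command: fs volume ls", 0}}},
    {"status",  {{"notify(osdmap)", 0},   {"command: fs status", 0}}},
  };

  std::vector<std::thread> workers;
  for (const auto& q : queues) {
    workers.emplace_back([module = q.first, events = q.second] {
      for (const auto& ev : events) {    // per-module FIFO is preserved
        std::this_thread::sleep_for(std::chrono::milliseconds(ev.delay_ms));
        std::cout << module << ": " << ev.desc << "\n";
      }
    });
  }
  for (auto& w : workers) w.join();
  // Typical output: "status" handles its notify and its command while
  // "volumes" is still inside its notify -- cross-module lockstep is gone,
  // but each module still sees its own events in order.
  return 0;
}
```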

kotreshhr avatar Oct 27 '22 10:10 kotreshhr

  • I think we also need to update ActivePyModules::notify_all and others to use the new per-module finisher.

@batrick This adds concurrency between module notifications and other commands. Previously, all module notifications were executed in series, and no other commands would run until all the notifications were done. With this change, a few new commands for some modules might go through before all the module notifications are completed. Is that a problem? The same is the case for config notify.

I think that's acceptable. We don't guarantee anywhere that all modules will process notifications in lockstep.

batrick avatar Oct 27 '22 15:10 batrick

Pending items

  • [x] Add basic QA tests with qa/tasks/check_counter.py to look at the per-module finisher stats. Should be in the rados and fs suites to start.

kotreshhr avatar Oct 28 '22 11:10 kotreshhr

jenkins test make check

kotreshhr avatar Oct 28 '22 16:10 kotreshhr

jenkins test make check arm64

kotreshhr avatar Oct 28 '22 17:10 kotreshhr

jenkins retest this please

kotreshhr avatar Oct 28 '22 17:10 kotreshhr

jenkins retest this please

kotreshhr avatar Nov 07 '22 08:11 kotreshhr

Fixed the test and triggered a teuthology job:

http://pulpito.front.sepia.ceph.com/khiremat-2022-12-02_11:12:21-fs:volumes-wip-khiremat-47893-test-mgr-finisher-thread-5-distro-default-smithi/

kotreshhr avatar Dec 02 '22 11:12 kotreshhr