ceph icon indicating copy to clipboard operation
ceph copied to clipboard

debian: recursively adjust permissions of /var/lib/ceph/crash

Open Aequitosh opened this issue 1 year ago • 13 comments

A rather recent PR made ceph-crash run as ceph user instead of root^0. However, because /var/lib/ceph/crash/posted belongs to root, ceph-crash cannot actually post any crash logs now.

This commit fixes this by recursively updating the permissions of /var/lib/ceph/crash, which ensures that all files and directories used by ceph-crash.service are actually owned by the user configured for Ceph.

The previously existing loop has also been replaced by an invocation of find | xargs.

Fixes: https://tracker.ceph.com/issues/64548

Checklist

  • Tracker (select at least one)
    • [x] References tracker ticket
    • [x] Very recent bug; references commit where it was introduced
    • [ ] New feature (ticket optional)
    • [ ] Doc update (no ticket needed)
    • [ ] Code cleanup (no ticket needed)
  • Component impact
    • [ ] Affects Dashboard, opened tracker ticket
    • [ ] Affects Orchestrator, opened tracker ticket
    • [x] No impact that needs to be tracked
  • Documentation (select at least one)
    • [ ] Updates relevant documentation
    • [x] No doc update is appropriate
  • Tests (select at least one)

Further Notes

I'm not 100% sure if this is the best way to fix this issue, so please let me know if you prefer a different solution. This patch is originally from our mailing list^1 - we've decided that it's what works best for us, due to find | xargs being a bit more fine-grained than, let's say, something like for FILE in /var/lib/ceph/**/*; do ...; done.

Also, should this get merged (in whatever form), I'll also gladly provide backports for Reef and Quincy.

Aequitosh avatar Mar 04 '24 10:03 Aequitosh

@tchaikov could this please be backported to at least Reef to be fixed with the next point release?

frittentheke avatar Apr 02 '24 14:04 frittentheke

@tchaikov could this please be backported to at least Reef to be fixed with the next point release?

I can also provide backports for Reef and Quincy if you don't mind - either once this gets merged or whenever you wish.

Aequitosh avatar Apr 03 '24 08:04 Aequitosh

@tchaikov could this please be backported to at least Reef to be fixed with the next point release?

@frittentheke hi Christian, i think so. since https://tracker.ceph.com/issues/64548 includes Reef in its "Backport" field.

and sorry for the latency, gentlemen. i will try to squeeze more time for a proper review asap.

@tchaikov could this please be backported to at least Reef to be fixed with the next point release?

I can also provide backports for Reef and Quincy if you don't mind - either once this gets merged or whenever you wish.

better off waiting until this PR gets merged, so that we can reference it with the correct sha1 in the backport PR.

tchaikov avatar Apr 03 '24 08:04 tchaikov

@frittentheke hi Christian, i think so. since https://tracker.ceph.com/issues/64548 includes Reef in its "Backport" field. and sorry for the latency, gentlemen. i will try to squeeze more time for a proper review asap.

Thanks @tchaikov for the quick update!

While we certainly adjusted this on our clusters already, I am just really afraid this causes less crash reports to come in from other installs out there. Less crash reports, means potentially more undiscovered bugs. Selfish me :see_no_evil:

frittentheke avatar Apr 03 '24 08:04 frittentheke

@tchaikov sorry to nag about this again, but I just noticed 18.2.3 QA phase already started (https://lists.ceph.io/hyperkitty/list/[email protected]/thread/KRC33HNRBCZC5Z2GUDN5DQO43I2RJBJJ/).

This PR is "just" about fixing some file permissions and we are likely discussion implementations styles. But FWIW, I left a few lines of review myself.

frittentheke avatar Apr 14 '24 18:04 frittentheke

@tchaikov sorry to nag about this again, but I just noticed 18.2.3 QA phase already started (https://lists.ceph.io/hyperkitty/list/[email protected]/thread/KRC33HNRBCZC5Z2GUDN5DQO43I2RJBJJ/).

This PR is "just" about fixing some file permissions and we are likely discussion implementations styles. But FWIW, I left a few lines of review myself.

@frittentheke my apologies. indeed.

tchaikov avatar Apr 15 '24 02:04 tchaikov

@frittentheke my apologies. indeed.

No need @tchaikov ... I am still just a selfish dude looking for other installations' crash reports to make my Ceph installations more stable ;-)

Speaking of Reef ... any remote chance this change can still make it into 18.2.3 ?

frittentheke avatar Apr 16 '24 08:04 frittentheke

I should actually add the milestone to the Reef backport when it's ready. Instead, I've added the "needs-reef-backport" label.

ljflores avatar Apr 18 '24 20:04 ljflores

@frittentheke @tchaikov Thanks for the reviews, much appreciated! I left some more comments inline.

What's now obvious to me is that the original loop can just stay - the find+xargs replacement isn't necessary.

However, I'd personally suggest to do something like chown -R /var/lib/ceph/crash/*, as some entries in /var/lib/ceph/crash/posted might still belong to root e.g. - entries that were posted prior to #48713.

I'll update the PR accordingly, if you don't object.

Aequitosh avatar Apr 22 '24 06:04 Aequitosh

However, I'd personally suggest to do something like chown -R /var/lib/ceph/crash/*, as some entries in /var/lib/ceph/crash/posted might still belong to root e.g. - entries that were posted prior to #48713.

I'll update the PR accordingly, if you don't object.

FWIW, I like the idea to "ensure" (and potentially fix) ownership also for existing files.

frittentheke avatar Apr 22 '24 07:04 frittentheke

@Aequitosh @tchaikov could I nag you once more to finalize the discussion on how to implement the chown ?

frittentheke avatar Apr 26 '24 13:04 frittentheke

@frittentheke @tchaikov Thanks for the reviews, much appreciated! I left some more comments inline.

What's now obvious to me is that the original loop can just stay - the find+xargs replacement isn't necessary.

However, I'd personally suggest to do something like chown -R /var/lib/ceph/crash/*, as some entries in /var/lib/ceph/crash/posted might still belong to root e.g. - entries that were posted prior to #48713.

I'll update the PR accordingly, if you don't object.

@Aequitosh hi Max, please go ahead. i didn't take the "existing files" use case into consideration. but if that case exists, then, yeah, i concur with you.

tchaikov avatar Apr 26 '24 13:04 tchaikov

Updated the PR. See the commit message for details.

Basically did as I described, but also quoted variables where they're interpolated. Also added curly braces there.

I highly doubt that globbing & word splitting are ever going to be issues, but still wanted to include it for good measure. (Though, should that not be desired, I can quickly update the PR again.)

Aequitosh avatar Apr 30 '24 15:04 Aequitosh

Is this fix on track to be merged?

frittentheke avatar Jun 24 '24 19:06 frittentheke

@frittentheke I'm not sure what happened with the testing. It will be included in the next main batch.

ljflores avatar Jul 03 '24 21:07 ljflores