debian: recursively adjust permissions of /var/lib/ceph/crash
A fairly recent PR made ceph-crash run as the ceph user instead of root^0. However, because /var/lib/ceph/crash/posted belongs to root, ceph-crash can no longer actually post any crash logs.
This commit fixes this by recursively updating the permissions of /var/lib/ceph/crash, which ensures that all files and directories used by ceph-crash.service are actually owned by the user configured for Ceph.
The previously existing loop has also been replaced by an invocation of `find | xargs`.
Fixes: https://tracker.ceph.com/issues/64548
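A minimal sketch of the step described above. The exact postinst context and chown flags are assumptions; to keep the demo self-contained and runnable without root, it operates on a throwaway directory standing in for /var/lib/ceph/crash and the current user standing in for ceph:

```shell
# Stand-in for /var/lib/ceph/crash; the real maintainer script would
# use the actual crash directory and the "ceph" user.
crash_dir=$(mktemp -d)
mkdir -p "${crash_dir}/posted"
touch "${crash_dir}/posted/meta"

owner=$(id -un)   # stand-in for the user configured for Ceph

# find | xargs visits every file and directory under the crash dir,
# including "posted" itself, and hands them all to the Ceph user so
# ceph-crash.service can write there.
find "${crash_dir}" -print0 | xargs -0 chown "${owner}"
```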
## Checklist

- Tracker (select at least one)
  - [x] References tracker ticket
  - [x] Very recent bug; references commit where it was introduced
  - [ ] New feature (ticket optional)
  - [ ] Doc update (no ticket needed)
  - [ ] Code cleanup (no ticket needed)
- Component impact
  - [ ] Affects Dashboard, opened tracker ticket
  - [ ] Affects Orchestrator, opened tracker ticket
  - [x] No impact that needs to be tracked
- Documentation (select at least one)
  - [ ] Updates relevant documentation
  - [x] No doc update is appropriate
- Tests (select at least one)
  - [ ] Includes unit test(s)
  - [ ] Includes integration test(s)
  - [ ] Includes bug reproducer
  - [x] No tests

## Further Notes
I'm not 100% sure whether this is the best way to fix this issue, so please let me know if you'd prefer a different solution. This patch originally comes from our mailing list^1 - we decided it's what works best for us, since `find | xargs` is a bit more fine-grained than, say, something like `for FILE in /var/lib/ceph/**/*; do ...; done`.
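To illustrate the "more fine-grained" point (a hedged demo on a temp directory, not the actual patch): a plain glob skips dotfiles and the top-level directory itself, while `find` visits both.

```shell
dir=$(mktemp -d)
touch "${dir}/.hidden" "${dir}/visible"

# A glob loop only sees non-hidden entries inside the directory:
glob_seen=0
for f in "${dir}"/*; do
    glob_seen=$((glob_seen + 1))
done
# glob_seen is 1: only "visible" matched.

# find lists the directory itself plus both entries:
find_seen=$(find "${dir}" | wc -l)
# find_seen is 3: the dir, .hidden, and visible.
```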
Also, should this get merged (in whatever form), I'll also gladly provide backports for Reef and Quincy.
@tchaikov could this please be backported to at least Reef to be fixed with the next point release?
I can also provide backports for Reef and Quincy if you don't mind - either once this gets merged or whenever you wish.
@frittentheke hi Christian, i think so. since https://tracker.ceph.com/issues/64548 includes Reef in its "Backport" field.
and sorry for the latency, gentlemen. i will try to squeeze more time for a proper review asap.
better off waiting until this PR gets merged, so that we can reference it with the correct sha1 in the backport PR.
Thanks @tchaikov for the quick update!
While we've certainly adjusted this on our clusters already, I'm just really afraid this causes fewer crash reports to come in from other installs out there. Fewer crash reports mean potentially more undiscovered bugs. Selfish me :see_no_evil:
@tchaikov sorry to nag about this again, but I just noticed 18.2.3 QA phase already started (https://lists.ceph.io/hyperkitty/list/[email protected]/thread/KRC33HNRBCZC5Z2GUDN5DQO43I2RJBJJ/).
This PR is "just" about fixing some file permissions, and we are likely only discussing implementation styles at this point. FWIW, I left a few lines of review myself.
@frittentheke my apologies. indeed.
No need @tchaikov ... I am still just a selfish dude looking for other installations' crash reports to make my Ceph installations more stable ;-)
Speaking of Reef ... any remote chance this change can still make it into 18.2.3 ?
I should actually add the milestone to the Reef backport when it's ready. Instead, I've added the "needs-reef-backport" label.
@frittentheke @tchaikov Thanks for the reviews, much appreciated! I left some more comments inline.
What's now obvious to me is that the original loop can just stay - the find+xargs replacement isn't necessary.
However, I'd personally suggest doing something like `chown -R /var/lib/ceph/crash/*`, as some entries in /var/lib/ceph/crash/posted might still belong to root, e.g. entries that were posted prior to #48713.
I'll update the PR accordingly, if you don't object.
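A sketch of the recursive variant suggested above (the final form was still under review; the demo uses a temp directory and the current user instead of /var/lib/ceph/crash and ceph, so it runs without root):

```shell
crash_dir=$(mktemp -d)   # stand-in for /var/lib/ceph/crash
# Simulate a crash report posted before the ceph-crash user change:
mkdir -p "${crash_dir}/posted/old-crash"

owner=$(id -un)          # stand-in for the ceph user

# -R descends into "posted" and fixes pre-existing entries too,
# which a non-recursive chown of the top-level dirs would miss.
chown -R "${owner}" "${crash_dir}"/*
```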
FWIW, I like the idea to "ensure" (and potentially fix) ownership also for existing files.
@Aequitosh @tchaikov could I nag you once more to finalize the discussion on how to implement the chown ?
@Aequitosh hi Max, please go ahead. i didn't take the "existing files" use case into consideration. but if that case exists, then, yeah, i concur with you.
Updated the PR. See the commit message for details.
Basically did as I described, but also quoted variables where they're interpolated, and added curly braces there.
I highly doubt that globbing & word splitting are ever going to be issues, but still wanted to include it for good measure. (Though, should that not be desired, I can quickly update the PR again.)
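A tiny illustration of the quoting style mentioned above (file names here are made up for the demo): braces plus quotes keep a path with spaces or glob characters intact as a single argument.

```shell
dir=$(mktemp -d)
file="${dir}/name with spaces"
touch "${file}"

# Unquoted ${file} would word-split into three arguments here;
# "${file}" passes the whole path through as one. The braces also
# keep the variable name unambiguous next to adjacent text.
chown "$(id -un)" "${file}"
```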
Is this fix on track to be merged?
@frittentheke I'm not sure what happened with the testing. It will be included in the next main batch.