mds: un-inline data on scrub
If the inline data version is not CEPH_INLINE_NONE, move the data to a data pool object and set the inline data version to CEPH_INLINE_NONE.
Fixes: https://tracker.ceph.com/issues/52916
Signed-off-by: Milind Changire [email protected]
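The rule in the description can be sketched in a few lines. This is a minimal Python model, not the MDS code: `Inode`, `DataPool`, and `uninline_on_scrub` are hypothetical names, and the sentinel value given to `CEPH_INLINE_NONE` and the object naming are assumptions made for the sketch.

```python
from dataclasses import dataclass, field

# Assumed sentinel for "no inline data"; the real constant lives in the Ceph headers.
CEPH_INLINE_NONE = 2**64 - 1


@dataclass
class Inode:
    ino: int
    inline_version: int = CEPH_INLINE_NONE
    inline_data: bytes = b""


@dataclass
class DataPool:
    # Hypothetical stand-in for the data pool: object name -> contents.
    objects: dict = field(default_factory=dict)


def uninline_on_scrub(inode: Inode, pool: DataPool) -> bool:
    """Move inline data to a data pool object; return True if anything changed."""
    if inode.inline_version == CEPH_INLINE_NONE:
        return False  # nothing is inlined, so scrub leaves the file alone
    # The object name format is illustrative only.
    pool.objects[f"{inode.ino:x}.00000000"] = inode.inline_data
    inode.inline_data = b""
    inode.inline_version = CEPH_INLINE_NONE
    return True
```

Calling `uninline_on_scrub` a second time on the same inode is a no-op, which is the property a repeated scrub would rely on.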
Checklist
- Tracker (select at least one)
- [x] References tracker ticket
- [ ] Very recent bug; references commit where it was introduced
- [ ] New feature (ticket optional)
- [ ] Doc update (no ticket needed)
- [ ] Code cleanup (no ticket needed)
- Component impact
- [ ] Affects Dashboard, opened tracker ticket
- [ ] Affects Orchestrator, opened tracker ticket
- [x] No impact that needs to be tracked
- Documentation (select at least one)
- [ ] Updates relevant documentation
- [x] No doc update is appropriate
- Tests (select at least one)
- [ ] Includes unit test(s)
- [x] Includes integration test(s)
- [ ] Includes bug reproducer
- [ ] No tests
Show available Jenkins commands
- jenkins retest this please
- jenkins test classic perf
- jenkins test crimson perf
- jenkins test signed
- jenkins test make check
- jenkins test make check arm64
- jenkins test submodules
- jenkins test dashboard
- jenkins test dashboard cephadm
- jenkins test api
- jenkins test docs
- jenkins render docs
- jenkins test ceph-volume all
- jenkins test ceph-volume tox
There are at least 2 bugs for now in the kernel's inline data code, [1] and [2], and [1] is hard to fix. Once this feature is supported, IMO we'd better make files that have inline data read-only, and at the same time add an option, defaulting to false, to enable read-write if users need it.
[1] https://patchwork.kernel.org/project/ceph-devel/patch/[email protected]/
[2] https://tracker.ceph.com/issues/52921
This set of teuthology jobs ran data uninlining and scrubbing over a set of ~8000 files and uninlined 5145 files in total.
jenkins retest this please
How does supporting this feature (un-inline during scrub) affect the bugs you mention? Don't the bugs exist irrespective of whether this feature is available? Maybe I'm misunderstanding something? @lxbsz
There are 2 known bugs in kclient. One of them is hard to fix, and the other we won't fix since we are planning to disable the inline data feature in the future.
As we discussed on the kernel mailing list, the inline_data feature is only half implemented. We were planning to disable it and eventually drop the corresponding code in kclient, and were waiting for this feature.
To support that, once this PR is merged we need to make files that have inlined data read-only from the MDS side, and then ask users to uninline them with the scrub tool before continuing. I have discussed this with @mchangir and I can add a follow-up patch to do it after this.
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved
There is a bug in client/Client.cc: when reading, it always uninlines the data, which shouldn't happen, because on a read there is no guarantee that the Fw caps will be granted.
Fixed it in https://github.com/ceph/ceph/pull/47090; not sure whether this is related to your test failures. @mchangir
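The point about reads can be illustrated with a toy model. This is only a sketch under stated assumptions and does not reflect what the linked PR actually changes: `ClientFile`, `read`, `write`, and the `CAP_FILE_WR` bit value are hypothetical. The idea shown is that a read serves inline data as-is, and uninlining happens only on the write path once the write capability is held.

```python
from dataclasses import dataclass

CAP_FILE_WR = 0x1  # hypothetical capability bit for this sketch only


@dataclass
class ClientFile:
    inline_data: bytes = b""
    object_data: bytes = b""
    inlined: bool = False


def read(f: ClientFile) -> bytes:
    """Serve a read without changing where the data lives."""
    return f.inline_data if f.inlined else f.object_data


def write(f: ClientFile, caps: int, data: bytes) -> None:
    """Uninline only on the write path, and only once the write cap is held."""
    if not caps & CAP_FILE_WR:
        raise PermissionError("write capability not granted")
    if f.inlined:
        # Move the inline bytes out before appending the new data.
        f.object_data, f.inline_data, f.inlined = f.inline_data, b"", False
    f.object_data += data
```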
rebased to latest
@lxbsz
for write: the write now takes place only if the inline_data length is non-zero
for create: if I completely remove the create op, then write ops fail with ENOENT, at least for newly created files
will refresh the PR once I've addressed comments
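The behaviour described above for `write` and `create` could be modelled roughly as follows. This is a hedged sketch, not the PR's code: `build_uninline_ops` and the tuple-based op list are hypothetical, and the function only encodes the two decisions stated in the comment (always keep `create`; issue `write` only for non-empty inline data).

```python
from typing import List, Tuple

Op = Tuple[str, bytes]


def build_uninline_ops(inline_data: bytes) -> List[Op]:
    """Return the ordered ops for un-inlining one file (illustrative only)."""
    # The create op is always kept: per the comment above, dropping it made
    # write ops fail with ENOENT, at least for newly created files.
    ops: List[Op] = [("create", b"")]
    # The write is conditional: it is issued only when there is inline data.
    if len(inline_data) > 0:
        ops.append(("write", inline_data))
    return ops
```

For an empty file this yields a single `create` op; for a file with inline bytes it yields `create` followed by `write`.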
ack.
rebased code
Jenkins test this please
rebased code
@mchangir FYI - http://pulpito.front.sepia.ceph.com/vshankar-2022-08-25_07:08:30-fs-wip-vshankar-uns-20220825-092856-testing-default-smithi/
Most of the failures are due to the ongoing distro mess. I'll schedule a run with https://github.com/ceph/ceph/pull/47805
@mchangir https://pulpito.ceph.com/vshankar-2022-09-01_13:47:56-fs-wip-vshankar-uns-20220901-150020-testing-default-smithi/
chugging slowly....
@vshankar new set of teuthology jobs
@vshankar fyi
@vshankar fyi
Could you rerun the failed tests? I can see one new failure, but that might be a racy test case.
http://pulpito.front.sepia.ceph.com:80/mchangir-2022-09-20_09:35:11-fs-wip-mchangir-mds-uninline-file-during-scrub-testing-default-smithi/
One related failure that's reproducible -- http://pulpito.front.sepia.ceph.com/mchangir-2022-09-20_09:35:11-fs-wip-mchangir-mds-uninline-file-during-scrub-testing-default-smithi/7038745
Rest are known failures. Let's get that fixed and this is good to merge! Nice work @mchangir
Teuthology Jobs after small correction to test_scrub_pause_and_resume_with_abort code
Looks good. There are a couple of running jobs. Let's merge this once those finish :)
jenkins retest this please
jenkins test make check arm64
jenkins test make check arm64
jenkins test make check arm64
@batrick Do you have any more comments on this change? I see there is one pending change request from you.
Last update contains:
- individual uninline failures are pushed to DamageTable
- individual scrub stats now available via scrub status command
- scrub stats can now be purged manually and automatically via timer event (defaulting to 1 day minimum)
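As a rough illustration of the stats purging listed above, here is a minimal sketch. It is not the MDS implementation: `ScrubStats`, `record`, `purge_older_than`, `purge_all`, and the reading of the default as a one-day retention period are all assumptions. In the MDS a timer event would drive the automatic purge; here the caller passes the current time explicitly.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

ONE_DAY_SECONDS = 24 * 60 * 60  # assumed default retention for this sketch


@dataclass
class ScrubStats:
    # Hypothetical record: scrub tag -> (finish time, files uninlined, failures)
    entries: Dict[str, Tuple[float, int, int]] = field(default_factory=dict)

    def record(self, tag: str, finished_at: float, uninlined: int, failed: int) -> None:
        self.entries[tag] = (finished_at, uninlined, failed)

    def purge_older_than(self, now: float, max_age: float = ONE_DAY_SECONDS) -> int:
        """Automatic purge: drop entries older than max_age seconds."""
        stale = [t for t, (ts, _, _) in self.entries.items() if now - ts > max_age]
        for t in stale:
            del self.entries[t]
        return len(stale)

    def purge_all(self) -> int:
        """Manual purge: drop every entry and return how many were removed."""
        n = len(self.entries)
        self.entries.clear()
        return n
```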