mds: un-inline data on scrub

Open mchangir opened this issue 4 years ago • 87 comments

If the inline data version is not CEPH_INLINE_NONE, move the data to the data pool object and set the inline data version to CEPH_INLINE_NONE.

Fixes: https://tracker.ceph.com/issues/52916
Signed-off-by: Milind Changire [email protected]
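
The un-inline step described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `Inode`, `DataPool`, `uninline`, and `kInlineNone` are hypothetical stand-ins (the latter for `CEPH_INLINE_NONE`, with an assumed value), and the data pool is modelled as an in-memory map.

```cpp
#include <cstdint>
#include <limits>
#include <map>
#include <string>

// Stand-in for CEPH_INLINE_NONE; the value used here is an assumption.
constexpr std::uint64_t kInlineNone = std::numeric_limits<std::uint64_t>::max();

// Hypothetical, simplified inode holding only the fields this sketch needs.
struct Inode {
  std::uint64_t inline_version;
  std::string inline_data;
};

// Hypothetical data pool: object name -> object contents.
using DataPool = std::map<std::string, std::string>;

// Moves inline data into the given data pool object and marks the inode as
// no longer inline. Returns false if the inode had no inline data.
bool uninline(Inode& in, const std::string& object_name, DataPool& pool) {
  if (in.inline_version == kInlineNone) {
    return false;  // already un-inlined, nothing to do
  }
  pool[object_name] = in.inline_data;  // data now lives in the data pool
  in.inline_data.clear();
  in.inline_version = kInlineNone;
  return true;
}
```

Calling `uninline` a second time on the same inode is a no-op, which is the property a scrub pass would presumably rely on when it revisits files.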

Checklist

  • Tracker (select at least one)
    • [x] References tracker ticket
    • [ ] Very recent bug; references commit where it was introduced
    • [ ] New feature (ticket optional)
    • [ ] Doc update (no ticket needed)
    • [ ] Code cleanup (no ticket needed)
  • Component impact
    • [ ] Affects Dashboard, opened tracker ticket
    • [ ] Affects Orchestrator, opened tracker ticket
    • [x] No impact that needs to be tracked
  • Documentation (select at least one)
    • [ ] Updates relevant documentation
    • [x] No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

mchangir avatar Dec 20 '21 03:12 mchangir

There are at least two bugs in the kernel's inline data code for now, [1] and [2], and [1] is hard to fix. Once this feature is supported, IMO we should make files that have inline data read-only, and at the same time add an option, defaulting to false, that enables read-write access if users need it.

[1] https://patchwork.kernel.org/project/ceph-devel/patch/[email protected]/
[2] https://tracker.ceph.com/issues/52921

lxbsz avatar Jun 01 '22 00:06 lxbsz

This set of teuthology jobs ran data uninlining and scrubbing over a set of ~8000 files and uninlined 5145 files in total.

mchangir avatar Jun 10 '22 04:06 mchangir

jenkins retest this please

mchangir avatar Jun 14 '22 12:06 mchangir

> There are at least two bugs in the kernel's inline data code for now, [1] and [2], and [1] is hard to fix. Once this feature is supported, IMO we should make files that have inline data read-only, and at the same time add an option, defaulting to false, that enables read-write access if users need it.

How does supporting this feature (un-inline during scrub) affect the bugs you mention? The bugs exist irrespective of whether this feature is available? Maybe I'm misunderstanding something? @lxbsz

> [1] https://patchwork.kernel.org/project/ceph-devel/patch/[email protected]/
> [2] https://tracker.ceph.com/issues/52921

vshankar avatar Jun 15 '22 10:06 vshankar

> > There are at least two bugs in the kernel's inline data code for now, [1] and [2], and [1] is hard to fix. Once this feature is supported, IMO we should make files that have inline data read-only, and at the same time add an option, defaulting to false, that enables read-write access if users need it.

> How does supporting this feature (un-inline during scrub) affect the bugs you mention? The bugs exist irrespective of whether this feature is available? Maybe I'm misunderstanding something? @lxbsz

> > [1] https://patchwork.kernel.org/project/ceph-devel/patch/[email protected]/
> > [2] https://tracker.ceph.com/issues/52921

There are two known bugs in kclient: one is hard to fix, and the other we won't fix, since we are planning to disable the inline data feature in the future.

As we discussed on the kernel mailing list, the inline_data feature is only half implemented, so we were planning to disable it and eventually drop the corresponding code in kclient, and we were waiting for this feature.

To support that, once this PR is merged we need to make files that have inlined data read-only from the MDS side, and require that they first be uninlined with the scrub tool before continuing. I have discussed this with @mchangir, and I can add a follow-up patch for it after this.

lxbsz avatar Jun 15 '22 12:06 lxbsz
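
The read-only proposal in the comment above could look roughly like the check below. This is only a sketch of the idea under discussion: `may_write`, the `allow_inline_rw` option, and `kInlineNone` (a stand-in for `CEPH_INLINE_NONE` with an assumed value) are hypothetical names, not existing MDS code or configuration.

```cpp
#include <cstdint>
#include <limits>

// Stand-in for CEPH_INLINE_NONE; the value used here is an assumption.
constexpr std::uint64_t kInlineNone = std::numeric_limits<std::uint64_t>::max();

// Hypothetical policy check: a file that still carries inline data is
// treated as read-only unless the (hypothetical) allow_inline_rw option,
// which defaults to false, has been switched on.
bool may_write(std::uint64_t inline_version, bool allow_inline_rw = false) {
  const bool has_inline_data = inline_version != kInlineNone;
  return !has_inline_data || allow_inline_rw;
}
```

Under this policy a file becomes writable again as soon as a scrub has uninlined it, without anyone touching the option.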

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

github-actions[bot] avatar Jun 20 '22 05:06 github-actions[bot]

There is a bug in client/Client.cc: when reading, it always uninlines the data, which shouldn't happen, because for a read there is no guarantee that the Fw caps will be granted.

I fixed it in https://github.com/ceph/ceph/pull/47090; I'm not sure whether this is related to your test failures. @mchangir

lxbsz avatar Jul 14 '22 01:07 lxbsz
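
The point about reads and Fw caps can be illustrated with a small decision function. This is a sketch under stated assumptions, not the fix in the linked PR: `Op`, `kCapFileWr`, and `should_uninline` are hypothetical names, and the cap bit value is made up for illustration.

```cpp
#include <cstdint>

// Hypothetical bit standing in for the file-write (Fw) capability.
constexpr std::uint32_t kCapFileWr = 1u << 0;

enum class Op { Read, Write };

// Decides whether a client operation should trigger un-inlining.
// Reads never do, because the client cannot count on holding Fw;
// writes do so only when Fw has actually been granted.
bool should_uninline(Op op, std::uint32_t granted_caps, bool has_inline_data) {
  if (!has_inline_data) {
    return false;
  }
  if (op == Op::Read) {
    return false;  // serve the inline data as-is
  }
  return (granted_caps & kCapFileWr) != 0;
}
```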

rebased to latest

mchangir avatar Jul 18 '22 12:07 mchangir

@lxbsz

  • for write: so, now the write will take place conditionally only if inline_data length is a non-zero positive value
  • for create: if I completely remove the create op, then write ops fail with ENOENT; at least for newly created files

will refresh the PR once I've addressed comments

mchangir avatar Jul 22 '22 05:07 mchangir
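
The behaviour described in the comment above (always create the object, write only when there is inline data) might be sketched like this. `OpKind`, `ObjectOp`, and `build_uninline_ops` are hypothetical names for illustration and do not correspond to the PR's actual code or to the real object operation API.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified model of the ops sent to the data pool object.
enum class OpKind { Create, Write };

struct ObjectOp {
  OpKind kind;
  std::string payload;
};

// Builds the op list for un-inlining one file. The create op is always
// queued; the write op is queued only when the inline data is non-empty.
std::vector<ObjectOp> build_uninline_ops(const std::string& inline_data) {
  std::vector<ObjectOp> ops;
  ops.push_back(ObjectOp{OpKind::Create, ""});
  if (!inline_data.empty()) {
    ops.push_back(ObjectOp{OpKind::Write, inline_data});
  }
  return ops;
}
```

Keeping the create op unconditional follows the observation in the comment that, without it, later writes to newly created files fail with ENOENT.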

> @lxbsz
>
>   • for write: so, now the write will take place conditionally only if inline_data length is a non-zero positive value
>   • for create: if I completely remove the create op, then write ops fail with ENOENT; at least for newly created files
>
> will refresh the PR once I've addressed comments

ack.

lxbsz avatar Jul 22 '22 07:07 lxbsz

rebased code

mchangir avatar Aug 11 '22 05:08 mchangir

Jenkins test this please

gregsfortytwo avatar Aug 16 '22 15:08 gregsfortytwo

rebased code

mchangir avatar Aug 22 '22 06:08 mchangir

@mchangir FYI - http://pulpito.front.sepia.ceph.com/vshankar-2022-08-25_07:08:30-fs-wip-vshankar-uns-20220825-092856-testing-default-smithi/

vshankar avatar Aug 25 '22 07:08 vshankar

> @mchangir FYI - http://pulpito.front.sepia.ceph.com/vshankar-2022-08-25_07:08:30-fs-wip-vshankar-uns-20220825-092856-testing-default-smithi/

Most of the failures are due to the ongoing distro mess. I'll schedule a run with https://github.com/ceph/ceph/pull/47805

vshankar avatar Aug 29 '22 04:08 vshankar

@mchangir https://pulpito.ceph.com/vshankar-2022-09-01_13:47:56-fs-wip-vshankar-uns-20220901-150020-testing-default-smithi/

chugging slowly....

vshankar avatar Sep 02 '22 02:09 vshankar

@vshankar new set of teuthology jobs

mchangir avatar Sep 14 '22 12:09 mchangir

> @vshankar fyi

Could you rerun the failed tests? I can see one new failure, but that might be a racy test case.

vshankar avatar Sep 20 '22 09:09 vshankar

> @vshankar fyi

> Could you rerun the failed tests. I can see one new failure, but that might be a race test case.

> http://pulpito.front.sepia.ceph.com:80/mchangir-2022-09-20_09:35:11-fs-wip-mchangir-mds-uninline-file-during-scrub-testing-default-smithi/

One related failure that's reproducible -- http://pulpito.front.sepia.ceph.com/mchangir-2022-09-20_09:35:11-fs-wip-mchangir-mds-uninline-file-during-scrub-testing-default-smithi/7038745

Rest are known failures. Let's get that fixed and this is good to merge! Nice work @mchangir

vshankar avatar Sep 20 '22 14:09 vshankar

Teuthology Jobs after small correction to test_scrub_pause_and_resume_with_abort code

mchangir avatar Sep 29 '22 03:09 mchangir

> Teuthology Jobs after small correction to test_scrub_pause_and_resume_with_abort code

Looks good. There are a couple of jobs still running. Let's merge this once they finish :)

vshankar avatar Sep 29 '22 04:09 vshankar

jenkins retest this please

vshankar avatar Sep 29 '22 04:09 vshankar

jenkins test make check arm64

mchangir avatar Sep 29 '22 16:09 mchangir

jenkins test make check arm64

mchangir avatar Oct 02 '22 06:10 mchangir

jenkins test make check arm64

mchangir avatar Oct 02 '22 11:10 mchangir

@batrick Do you have any more comments on this change? I see there is one pending change request from you.

vshankar avatar Oct 04 '22 06:10 vshankar

Last update contains:

  • individual uninline failures are pushed to DamageTable
  • individual scrub stats now available via scrub status command
  • scrub stats can now be purged manually and automatically via timer event (defaulting to 1 day minimum)

mchangir avatar Nov 03 '22 12:11 mchangir
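
The timer-driven purge of scrub stats mentioned in the last update could work along these lines. This is a minimal sketch, not the PR's implementation: `ScrubStats`, `purge_stale`, the string key, and the field names are all hypothetical, and the one-day default is taken from the comment above.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

using Clock = std::chrono::steady_clock;

// Hypothetical per-scrub counters plus the time they were recorded.
struct ScrubStats {
  std::uint64_t uninline_passed;
  std::uint64_t uninline_failed;
  Clock::time_point recorded;
};

// Removes every entry at least max_age old and returns how many were
// purged. A timer event could call this periodically; a manual purge
// could call it with a max_age of zero to drop everything.
std::size_t purge_stale(std::map<std::string, ScrubStats>& stats,
                        Clock::time_point now,
                        std::chrono::seconds max_age = std::chrono::hours(24)) {
  std::size_t purged = 0;
  for (auto it = stats.begin(); it != stats.end();) {
    if (now - it->second.recorded >= max_age) {
      it = stats.erase(it);  // erase returns the next valid iterator
      ++purged;
    } else {
      ++it;
    }
  }
  return purged;
}
```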