
HDDS-7295. Introduce delimiter-awareness into `KeyIteratorFSO`

Open k5342 opened this issue 3 years ago • 4 comments

What changes were proposed in this pull request?

For FSO-enabled buckets, we can optimize `KeyIterator` by making it aware of the delimiter parameter. In a large bucket, this avoids many queries because the recursive `listStatus` calls stop at the delimiter instead of walking the whole tree. This is useful when a user only wants to fetch the bucket structure via the S3 API.
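To illustrate the idea, here is a minimal sketch that models an FSO bucket as an in-memory tree. It is not the patch itself and does not use Ozone's classes: `FsoListingSketch`, `Node`, `addFile`, `list`, and the `listCalls` counter are hypothetical names, and each call to `list` stands in for one `listStatus` query. With the delimiter set to `/`, subdirectories are reported as prefixes and never descended into.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for an FSO bucket; not Ozone's actual classes.
class FsoListingSketch {
  /** A tree node: either a file, or a directory with sorted children. */
  static final class Node {
    final boolean isFile;
    final Map<String, Node> children = new TreeMap<>();

    Node(boolean isFile) {
      this.isFile = isFile;
    }
  }

  /** Number of directory listings issued (stand-in for listStatus queries). */
  static int listCalls = 0;

  /** Adds a file at a "/"-separated path, creating intermediate directories. */
  static void addFile(Node root, String path) {
    String[] parts = path.split("/");
    Node cur = root;
    for (int i = 0; i < parts.length - 1; i++) {
      cur = cur.children.computeIfAbsent(parts[i], k -> new Node(false));
    }
    cur.children.put(parts[parts.length - 1], new Node(true));
  }

  /**
   * Lists the entries under dir. When delimiter is "/", subdirectories are
   * emitted as "name/" prefixes without recursion; otherwise the whole
   * subtree is walked, one listing per directory.
   */
  static List<String> list(Node dir, String prefix, String delimiter) {
    listCalls++;
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, Node> e : dir.children.entrySet()) {
      String name = prefix + e.getKey();
      Node child = e.getValue();
      if (child.isFile) {
        out.add(name);
      } else if ("/".equals(delimiter)) {
        out.add(name + "/"); // report the directory as a prefix, do not descend
      } else {
        out.addAll(list(child, name + "/", delimiter));
      }
    }
    return out;
  }
}
```

In this model, a recursive listing issues one query per directory in the bucket, while the delimiter-aware listing issues a single query for the requested level.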

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-7295

How was this patch tested?

  • DONE: Manual tests on our cluster (`aws s3 ls s3://bucket/` succeeds)
  • TODO: Write more unit tests

k5342 · Oct 17 '22 05:10

Thanks @k5342 for working on this. The `listKeys` API already has a `keyPrefix` param, which lists only the keys that begin with the given prefix. If we are planning to implement the delimiter param, I feel it should be implemented uniformly across bucket types, as the listing behaviour shouldn't differ between buckets. What do you think? I think it would require server-side changes to implement this for OBS buckets, although I agree that it's more beneficial for FSO, where listing is a more expensive call.
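For reference, a minimal sketch of what a prefix-plus-delimiter filter over a flat key namespace (as in an OBS bucket) could look like, following the S3 convention of rolling keys up into common prefixes. The names `DelimiterFilterSketch`, `Result`, and `filter` are hypothetical, and this is not the `listKeys` implementation. Note that this version still scans every key under the prefix, which is why a server-side implementation would matter for OBS.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of S3-style prefix/delimiter filtering over flat keys.
class DelimiterFilterSketch {
  /** Keys directly under the prefix, plus the rolled-up common prefixes. */
  static final class Result {
    final List<String> keys = new ArrayList<>();
    final Set<String> commonPrefixes = new LinkedHashSet<>();
  }

  static Result filter(List<String> sortedKeys, String prefix, String delimiter) {
    Result result = new Result();
    boolean hasDelimiter = delimiter != null && !delimiter.isEmpty();
    for (String key : sortedKeys) {
      if (!key.startsWith(prefix)) {
        continue; // keyPrefix behaviour: skip keys outside the prefix
      }
      // Look for the delimiter only in the part of the key after the prefix.
      int idx = hasDelimiter ? key.indexOf(delimiter, prefix.length()) : -1;
      if (idx < 0) {
        result.keys.add(key);
      } else {
        // Roll everything up to and including the delimiter into one entry.
        result.commonPrefixes.add(key.substring(0, idx + delimiter.length()));
      }
    }
    return result;
  }
}
```

With the keys `a/b/c.txt`, `a/d.txt`, and `e.txt`, an empty prefix and delimiter `/` yield the key `e.txt` and the common prefix `a/`.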

sadanand48 · Oct 17 '22 15:10

We can also leverage the existing `bucket.listStatus()` to achieve this: if the delimiter is defined, `listKeys` calls `listStatus` for the directory. However, we would be limited to using "/" as the delimiter.

sadanand48 · Oct 17 '22 16:10

Thank you @sadanand48 for your review. As a principle, we keep S3G behavior S3-compatible regardless of bucket type. By "behavior" here, I assume we are talking about the behavior of `KeyIterator`, not the result returned by S3G. As you said, this patch only supports FSO buckets. I've added a delimiter parameter to the `KeyIterator` constructor, but it is not used for now. For this initial patch, I thought we didn't need to unify the behavior, because a non-recursive call for OBS buckets is heavy to introduce and `KeyIterator` is an internal object. That said, I completely agree with unifying the behavior at the `KeyIterator` interface level. In that case, does that mean reimplementing a delimiter-aware filter in the OM server-side protocol handlers (or can we track that in another ticket)?

k5342 · Oct 26 '22 10:10

@k5342 do you plan to continue with this PR? Can you please rebase and address the review comments?

kerneltime · Jun 13 '24 16:06