HDDS-6682. Validate Bucket ID of bucket associated with in-flight requests.
What changes were proposed in this pull request?
In high-concurrency scenarios (which will become more common once we introduce prefix-based locking), there is a possibility of the following race condition:
Take for instance the following scenario with 3 concurrent write requests:
Bucket `vol/buck1` exists with the `LEGACY` layout.
Request 1: CreateKey by an older (pre-bucket-layout) client on the bucket `vol/buck1`.
Request 2: DeleteBucket by a new client on the bucket `vol/buck1`.
Request 3: CreateBucket by a new client on the bucket `vol/buck1` with the `FILE_SYSTEM_OPTIMIZED` layout.
Let's say that these requests are processed in the following order:
- Request 1 is picked up by one of the threads, which proceeds to run the `PRE_PROCESS` validations on this request. The validator we are interested in is called `blockCreateKeyWithBucketLayoutFromOldClient`. This validator makes sure that the bucket associated with this request is a `LEGACY` bucket, which is the pre-defined behavior for old client/new cluster interactions, since we do not want an old client operating on buckets that use a new metadata layout. One thing to note here is that at this stage, the OM does not hold a bucket lock (the lock is only acquired inside the `validateAndUpdateCache` method of the write request's handler class).
- While Request 1 was being processed, another thread was processing Request 2. Let's say Request 2 managed to get hold of the bucket lock and successfully completed the bucket deletion.
- Before Request 1 got a chance to acquire the bucket lock, Request 3 manages to acquire it. It proceeds with the bucket creation and creates a new bucket `vol/buck1` with the `FILE_SYSTEM_OPTIMIZED` bucket layout.
- Finally, Request 1 is able to acquire the bucket lock and enters its `validateAndUpdateCache` method. Even though it is able to find the bucket it is looking for, this is not the same bucket that was validated in its pre-processing hook. The new bucket has the same name but a different bucket layout, so the request ends up modifying a bucket it should not be allowed to touch.
This race condition can lead to undefined behavior of the Ozone cluster, where older clients might be modifying information they do not understand.
This PR adds bucket ID validation to the request processing flow, which ensures that the bucket that is ultimately processed is the same one that was validated.
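The fix can be illustrated with a small toy model. The names below (`preProcess`, `validateAndUpdateCache`, the in-memory bucket table) are illustrative only, not Ozone's actual API; the point is the mechanism: the bucket's object ID captured at pre-process time must match the ID seen again under the bucket lock, otherwise the bucket was deleted and re-created in between and the request is rejected.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Toy model of the bucket-ID check (illustrative names, not Ozone's API).
 * Each (re-)created bucket gets a fresh object ID, so a same-named
 * replacement bucket is distinguishable from the validated one.
 */
public class BucketIdCheck {
    record Bucket(long objectId, String layout) {}

    static final AtomicLong ID_GEN = new AtomicLong();
    static final Map<String, Bucket> BUCKET_TABLE = new ConcurrentHashMap<>();

    /** Pre-process validation: runs WITHOUT the bucket lock. */
    static long preProcess(String bucketName) {
        Bucket b = BUCKET_TABLE.get(bucketName);
        if (b == null || !"LEGACY".equals(b.layout())) {
            throw new IllegalStateException("old client needs a LEGACY bucket");
        }
        return b.objectId(); // remember exactly which bucket we validated
    }

    /** Execution: in the real OM this runs under the bucket lock. */
    static void validateAndUpdateCache(String bucketName, long expectedId) {
        Bucket b = BUCKET_TABLE.get(bucketName);
        if (b == null || b.objectId() != expectedId) {
            throw new IllegalStateException("bucket changed since pre-processing");
        }
        // ... safe to apply the key creation here ...
    }

    public static void main(String[] args) {
        BUCKET_TABLE.put("vol/buck1", new Bucket(ID_GEN.incrementAndGet(), "LEGACY"));
        long validatedId = preProcess("vol/buck1");           // Request 1 pre-process

        // Requests 2 and 3 sneak in: delete, then re-create with a new layout.
        BUCKET_TABLE.remove("vol/buck1");
        BUCKET_TABLE.put("vol/buck1",
            new Bucket(ID_GEN.incrementAndGet(), "FILE_SYSTEM_OPTIMIZED"));

        try {
            validateAndUpdateCache("vol/buck1", validatedId); // Request 1 resumes
            System.out.println("request applied (BUG)");
        } catch (IllegalStateException e) {
            System.out.println("request rejected: " + e.getMessage());
        }
    }
}
```

With the ID check in place, the stale Request 1 from the scenario above is rejected instead of silently operating on the re-created bucket.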
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-6682
How was this patch tested?
Existing test cases.
@rakeshadr Could you please review the changes I've made in this patch?
The number of files touched is high because a lot of unit tests needed fixes.
There is also a change in the signature of `OMKeyRequest.validateBucketAndVolume`: it now takes an `OzoneManager` instance instead of an `OMMetadataManager` instance, because we need access to the `OzoneManager.resolveBucketLink` method.
Most of these code changes are simple, however.
@rakeshadr could you please review this patch?
Hello, @errose28 @rakeshadr. I have rebased the PR on top of the latest master. Could you please review it?
As @errose28 already pointed out in HDDS-6931, the common theme of the problems is: pre-process validation should only operate on data that is not supposed to change between pre-processing and the actual execution in `validateAndUpdateCache`. Otherwise, that validation should be moved into `validateAndUpdateCache` (so it is done together with the actual data change in a proper lock scope) to ensure consistency.
This change introduces a new mechanism to handle a rare case where a bucket is re-created between pre-processing and execution (a bucket needs to be empty before it can be deleted, so this case is unlikely even under high concurrency). Why don't we just validate the bucket layout in `validateAndUpdateCache`?
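The alternative suggested here can be sketched as follows. Again, the names are illustrative (not Ozone's real classes): the layout check runs inside the same lock scope as the mutation, so check and change are atomic and no bucket-ID bookkeeping is needed.

```java
import java.util.concurrent.locks.ReentrantLock;

/**
 * Sketch of validating inside the lock scope (illustrative names).
 * Because the check and the mutation share one critical section,
 * no other request can delete/re-create the bucket in between.
 */
public class ValidateUnderLock {
    private final ReentrantLock bucketLock = new ReentrantLock();
    volatile String bucketLayout = "LEGACY";

    boolean createKeyFromOldClient() {
        bucketLock.lock();
        try {
            // Check and act atomically under the bucket lock.
            if (!"LEGACY".equals(bucketLayout)) {
                return false; // reject: old client on a new-layout bucket
            }
            // ... apply the key creation to the cache here ...
            return true;
        } finally {
            bucketLock.unlock();
        }
    }

    public static void main(String[] args) {
        ValidateUnderLock om = new ValidateUnderLock();
        System.out.println(om.createKeyFromOldClient());  // LEGACY bucket: accepted
        om.bucketLayout = "FILE_SYSTEM_OPTIMIZED";
        System.out.println(om.createKeyFromOldClient());  // new layout: rejected
    }
}
```

The trade-off is that every such validation then has to run inside the critical section, whereas the ID check lets cheap pre-process validators stay lock-free and only compares one long under the lock.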