HDDS-7231. Integrate the GetKeyInfo API to key read flows
What changes were proposed in this pull request?
This is a part of the container location cache implementation(HDDS-7223), specified in Client Interaction design.
This PR includes the changes to integrate the GetKeyInfo API to key read flows, keeping client code backward compatible with previous OM versions.
https://issues.apache.org/jira/browse/HDDS-7231
Details about client interaction changes:
Before:
- Client calls lookupKey.
- Client calls getBlock and readChunk. They when there's an exception in either calls:
- If it's a security problems, fail without retry.
- If StorageContainerException (threw when Datanode returns any stattus code than), recall lookupKey to get latest block location info.
- If it a conectivity issues (all datanodes are not reachable), or any IOException, just retry without recalling lookupKey.
After:
Client vs new OM.
- Client calls getKeyInfo with
forceCacheUpdate = falseto tell OM to take container location from its cache. - Client calls getBlock and readChunk. When there's an exception in either call:
- If it's a security problems, fail without retry.
- If StorageContainerException (threw when Datanode returns any stattus code than SUCCESS) or a conectivity issue (all datanodes are not reachable), recall getKeyInfo with
forceCacheUpdate = truerefresh block location from OM and tell OM to call SCM for updated container locations. - If any other IOException, just retry without recalling getKeyInfo.
Client vs old OM.
- Client calls lookupKey.
- Client calls getBlock and readChunk. When there's an exception in either calls:
- If it's a security problems, fail without retrying.
- If StorageContainerException (threw when Datanode returns any stattus code than SUCCESS) or a conectivity issue (all datanodes are not reachable), recall lookupKey to get latest block location info.
- If any other IOException, just retries without recalling lookupKey.
How was this patch tested?
Standard CI: https://github.com/duongkame/ozone/actions/runs/3191413426/jobs/5207698694
- Integration tests are added to verify clients correctly
Manual verification for the following cases:
- New client vs old OM.
- Old client vs new OM.
@GeorgeJahad
There is a call to looupKey here: https://github.com/apache/ozone/blob/1ea4c2adc640a02508696bf346c7f1f7b6e0afd9/hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/rpc/RpcClient.java#L1264
Should that be changed to getKeyInfo()?
There is a call to lookupFile() here: https://github.com/apache/ozone/blob/1ea4c2adc640a02508696bf346c7f1f7b6e0afd9/hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/rpc/RpcClient.java#L1683 But lookupFile() doesn't appear to use the cache. Should it?
There is a call to looupKey here:
https://github.com/apache/ozone/blob/1ea4c2adc640a02508696bf346c7f1f7b6e0afd9/hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/rpc/RpcClient.java#L1264
Should that be changed to getKeyInfo()?
Yes, I cleaned that up as well.
There is a call to lookupFile() here:
https://github.com/apache/ozone/blob/1ea4c2adc640a02508696bf346c7f1f7b6e0afd9/hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/rpc/RpcClient.java#L1683
But lookupFile() doesn't appear to use the cache. Should it?
Thanks, lookupFile will be another path we will tackle for OFS. I think it's to contain the scope of this PR for key reads.
The file read path can probably be deprecated and merged to key read, but we'll see.