ozone icon indicating copy to clipboard operation
ozone copied to clipboard

HDDS-7231. Integrate the GetKeyInfo API to key read flows

Open duongkame opened this issue 3 years ago • 3 comments

What changes were proposed in this pull request?

This is a part of the container location cache implementation(HDDS-7223), specified in Client Interaction design.

This PR includes the changes to integrate the GetKeyInfo API to key read flows, keeping client code backward compatible with previous OM versions.

https://issues.apache.org/jira/browse/HDDS-7231

Details about client interaction changes:

Before:

  1. Client calls lookupKey.
  2. Client calls getBlock and readChunk. They when there's an exception in either calls:
    • If it's a security problems, fail without retry.
    • If StorageContainerException (threw when Datanode returns any stattus code than), recall lookupKey to get latest block location info.
    • If it a conectivity issues (all datanodes are not reachable), or any IOException, just retry without recalling lookupKey.

After:

Client vs new OM.

  1. Client calls getKeyInfo with forceCacheUpdate = false to tell OM to take container location from its cache.
  2. Client calls getBlock and readChunk. When there's an exception in either call:
    • If it's a security problems, fail without retry.
    • If StorageContainerException (threw when Datanode returns any stattus code than SUCCESS) or a conectivity issue (all datanodes are not reachable), recall getKeyInfo with forceCacheUpdate = true refresh block location from OM and tell OM to call SCM for updated container locations.
    • If any other IOException, just retry without recalling getKeyInfo.

Client vs old OM.

  1. Client calls lookupKey.
  2. Client calls getBlock and readChunk. When there's an exception in either calls:
    • If it's a security problems, fail without retrying.
    • If StorageContainerException (threw when Datanode returns any stattus code than SUCCESS) or a conectivity issue (all datanodes are not reachable), recall lookupKey to get latest block location info.
    • If any other IOException, just retries without recalling lookupKey.

How was this patch tested?

Standard CI: https://github.com/duongkame/ozone/actions/runs/3191413426/jobs/5207698694

  • Integration tests are added to verify clients correctly

Manual verification for the following cases:

  • New client vs old OM.
  • Old client vs new OM.

duongkame avatar Oct 05 '22 17:10 duongkame

@GeorgeJahad

kerneltime avatar Oct 10 '22 16:10 kerneltime

There is a call to looupKey here: https://github.com/apache/ozone/blob/1ea4c2adc640a02508696bf346c7f1f7b6e0afd9/hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/rpc/RpcClient.java#L1264

Should that be changed to getKeyInfo()?

GeorgeJahad avatar Oct 13 '22 01:10 GeorgeJahad

There is a call to lookupFile() here: https://github.com/apache/ozone/blob/1ea4c2adc640a02508696bf346c7f1f7b6e0afd9/hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/rpc/RpcClient.java#L1683 But lookupFile() doesn't appear to use the cache. Should it?

GeorgeJahad avatar Oct 13 '22 01:10 GeorgeJahad

There is a call to looupKey here:

https://github.com/apache/ozone/blob/1ea4c2adc640a02508696bf346c7f1f7b6e0afd9/hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/rpc/RpcClient.java#L1264

Should that be changed to getKeyInfo()?

Yes, I cleaned that up as well.

There is a call to lookupFile() here:

https://github.com/apache/ozone/blob/1ea4c2adc640a02508696bf346c7f1f7b6e0afd9/hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/rpc/RpcClient.java#L1683

But lookupFile() doesn't appear to use the cache. Should it?

Thanks, lookupFile will be another path we will tackle for OFS. I think it's to contain the scope of this PR for key reads. The file read path can probably be deprecated and merged to key read, but we'll see.

duongkame avatar Oct 18 '22 22:10 duongkame