[Feature] Improve the tryCommitOnce behavior in FileStoreCommitImpl
Search before asking
- [X] I searched in the issues and found nothing similar.
Motivation
When running dedicated compaction in production, we found that the write-only job and the compaction job fail over every 2 or 3 days, even though the remote filesystem supports atomic rename.
The main cause is the FileAlreadyExistsException:
Looking at the recent Hadoop rename implementation, we found that by default the rename API throws FileAlreadyExistsException instead of returning false.
https://github.com/apache/hadoop/blob/branch-3.3.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirRenameOp.java
IMHO, this can be improved by catching certain exceptions in tryCommitOnce and returning false to the caller.
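To make the idea concrete, here is a minimal sketch of the proposed behavior. It is not the actual FileStoreCommitImpl code: `CommitSketch`, `tryRenameOnce`, and `demo` are hypothetical names. To stay self-contained it uses `java.nio.file.Files.move` and `java.nio.file.FileAlreadyExistsException` as stand-ins for the Hadoop rename call and `org.apache.hadoop.fs.FileAlreadyExistsException`. The local move is also not meant to model the atomicity of the remote rename; it only shows the exception-to-`false` mapping.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

class CommitSketch {

    /**
     * Hypothetical stand-in for the rename step inside tryCommitOnce.
     * Returns true if the rename succeeded, false if the target already
     * existed (another writer committed first). Other I/O errors propagate.
     */
    static boolean tryRenameOnce(Path tmp, Path target) throws IOException {
        try {
            // Without REPLACE_EXISTING, Files.move fails if the target exists.
            Files.move(tmp, target);
            return true;
        } catch (FileAlreadyExistsException e) {
            // Treat "target exists" as a lost race so the caller can retry,
            // instead of letting the exception fail the whole job.
            return false;
        }
    }

    /** Two writers race for the same target; returns both results. */
    static boolean[] demo() {
        try {
            Path dir = Files.createTempDirectory("commit-sketch");
            Path target = dir.resolve("snapshot-1");
            Path first = Files.write(dir.resolve("tmp-a"), new byte[] {1});
            Path second = Files.write(dir.resolve("tmp-b"), new byte[] {2});
            return new boolean[] {
                tryRenameOnce(first, target),
                tryRenameOnce(second, target)
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] results = demo();
        System.out.println("first=" + results[0] + ", second=" + results[1]);
    }
}
```

In this sketch the second rename returns false rather than throwing, which is the signal the upper caller would use to retry the commit against the latest snapshot. Which exceptions are safe to treat this way in the real code path is still open for discussion.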
Solution
No response
Anything else?
No response
Are you willing to submit a PR?
- [X] I'm willing to submit a PR!
Hi @JingsongLi, WDYT about this?
Could you explain what kinds of exceptions you would like to catch, and where you would like to catch them?