
[Feature] Improve the tryCommitOnce behavior in FileStoreCommitImpl

Open xiangyuf opened this issue 1 year ago • 2 comments

Search before asking

  • [X] I searched in the issues and found nothing similar.

Motivation

When running dedicated compaction jobs in production, we found that the write-only job and the compaction job fail over every 2 or 3 days, even though the remote filesystem supports atomic rename.

The main cause is a FileAlreadyExistsException: [image]

Looking at the rename implementation in a recent Hadoop release (branch-3.3.6), we found that by default the rename API throws FileAlreadyExistsException instead of returning false: https://github.com/apache/hadoop/blob/branch-3.3.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirRenameOp.java [image]

IMHO, this can be improved by catching certain exceptions in tryCommitOnce and returning false to the caller.
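To illustrate the idea, here is a minimal, self-contained sketch. It is not the actual FileStoreCommitImpl code: the class name CommitSketch and the RenameAction interface are hypothetical, and it uses java.nio.file.FileAlreadyExistsException and Files.move on a local temp directory as stand-ins for the Hadoop exception and the remote rename. The point is only that a "destination already exists" exception is turned into a false return, so the caller can treat it as a lost commit race instead of failing the job.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CommitSketch {

    /** Hypothetical stand-in for the atomic rename that publishes a snapshot. */
    @FunctionalInterface
    interface RenameAction {
        boolean rename() throws IOException;
    }

    /**
     * Runs one commit attempt. A FileAlreadyExistsException is assumed to mean
     * that another writer has already published the same snapshot, so it is
     * reported as "false" (commit not successful) rather than rethrown.
     * Any other IOException still propagates to the caller.
     */
    static boolean tryCommitOnce(RenameAction action) throws IOException {
        try {
            return action.rename();
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("snapshot-sketch");
        Path target = dir.resolve("snapshot-1");

        // First writer: the target does not exist yet, so the move succeeds.
        Path tmp1 = Files.write(dir.resolve("tmp-1"), "a".getBytes(StandardCharsets.UTF_8));
        boolean first = tryCommitOnce(() -> {
            Files.move(tmp1, target);
            return true;
        });

        // Second writer: the target exists, so Files.move throws
        // FileAlreadyExistsException, which tryCommitOnce converts to false.
        Path tmp2 = Files.write(dir.resolve("tmp-2"), "b".getBytes(StandardCharsets.UTF_8));
        boolean second = tryCommitOnce(() -> {
            Files.move(tmp2, target);
            return true;
        });

        System.out.println("first=" + first + ", second=" + second);
    }
}
```

Which exceptions are safe to treat this way in the real commit path is exactly the open question; the sketch only catches the one named above.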

Solution

No response

Anything else?

No response

Are you willing to submit a PR?

  • [X] I'm willing to submit a PR!

xiangyuf, May 20 '24 07:05

Hi @JingsongLi, WDYT about this?

xiangyuf, May 22 '24 07:05

Could you explain what kinds of exceptions you would like to catch, and where you would like to catch them?

tsreaper, Jul 20 '24 06:07