Filegroup name seems incorrect for log file created with NBCC

Open hudi-bot opened this issue 2 months ago • 2 comments

The test is "testMultiBaseFile" in the attached file: [^TestSparkNonBlockingConcurrencyControl.java]

bulkInsertFirst=true works fine, but the test fails for bulkInsertFirst=false. This is because the name of the filegroup created by the bulk insert at the end seems to be wrong.

I have attached copies of my terminal output showing the tables for both tests, but I have extracted the relevant info here so it is easier to read. Take a look at those files if you think something looks wrong with the info below.

Here is the timeline for bulkInsertFirst=true:

In this case, we do a bulk insert, 2 overlapping upserts, then a bulk insert:

{code:java}
20241008155534129.deltacommit.inflight
20241008155534129.deltacommit.requested
20241008155534129_20241008155538371.deltacommit
20241008155538785.deltacommit.inflight
20241008155538785.deltacommit.requested
20241008155538785_20241008155539942.deltacommit
20241008155539336.deltacommit.inflight
20241008155539336.deltacommit.requested
20241008155539336_20241008155540151.deltacommit
20241008155540193.deltacommit.inflight
20241008155540193.deltacommit.requested
20241008155540193_20241008155540768.deltacommit
{code}

And here are the files in the table:

{code:java}
.00000000-0000-0000-0000-000000000000-0_20241008155538785.log.1_0-24-34
.00000000-0000-0000-0000-000000000000-0_20241008155539336.log.1_0-30-45
.00000000-0000-0000-0000-000000000000-0_20241008155540193.log.1_0-50-74
00000000-0000-0000-0000-000000000000-0_0-12-14_20241008155534129.parquet
{code}

Here is the timeline for bulkInsertFirst=false:

In this case, we do 2 overlapping upserts, then a bulk insert:

{code:java}
20241008155116873.deltacommit.inflight
20241008155116873.deltacommit.requested
20241008155116873_20241008155118089.deltacommit
20241008155117398.deltacommit.inflight
20241008155117398.deltacommit.requested
20241008155117398_20241008155118282.deltacommit
20241008155118321.deltacommit.inflight
20241008155118321.deltacommit.requested
20241008155118321_20241008155118833.deltacommit
{code}

And here are the files in the table:

{code:java}
.00000000-0000-0000-0000-000000000000_20241008155116873.log.1_0-71-102
.00000000-0000-0000-0000-000000000000_20241008155117398.log.1_0-77-113
.00000000-0000-0000-0000-0_20241008155118321.log.1_0-97-142
{code}

As you can see, the third log file here looks different from all the rest: its name starts with 00000000-0000-0000-0000-0 instead of 00000000-0000-0000-0000-000000000000.
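
To make the mismatch easier to spot, here is a minimal, standalone sketch that groups the log file names from the bulkInsertFirst=false listing by the part before the first underscore. The class name LogFileNameCheck, its methods, and the regular expression are hypothetical and were written for this comment. The name layout `.<fileId>_<instantTime>.log.<version>_<writeToken>` is an assumption inferred from the listings here. None of this uses Hudi's own parsing code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogFileNameCheck {

  // Assumed layout, inferred from the listings in this issue:
  //   .<fileId>_<instantTime>.log.<version>_<writeToken>
  private static final Pattern LOG_FILE =
      Pattern.compile("^\\.(.+)_(\\d+)\\.log\\.(\\d+)_(\\d+-\\d+-\\d+)$");

  /** Returns the part of the log file name between the leading dot and the instant time. */
  public static String fileId(String logFileName) {
    Matcher m = LOG_FILE.matcher(logFileName.trim());
    if (!m.matches()) {
      throw new IllegalArgumentException("Not in the assumed log file layout: " + logFileName);
    }
    return m.group(1);
  }

  /** Groups log file names by extracted file ID, keeping first-seen order. */
  public static Map<String, List<String>> groupByFileId(List<String> logFileNames) {
    Map<String, List<String>> groups = new LinkedHashMap<>();
    for (String name : logFileNames) {
      groups.computeIfAbsent(fileId(name), k -> new ArrayList<>()).add(name.trim());
    }
    return groups;
  }

  public static void main(String[] args) {
    // File names copied from the bulkInsertFirst=false listing.
    List<String> names = List.of(
        ".00000000-0000-0000-0000-000000000000_20241008155116873.log.1_0-71-102",
        ".00000000-0000-0000-0000-000000000000_20241008155117398.log.1_0-77-113",
        ".00000000-0000-0000-0000-0_20241008155118321.log.1_0-97-142");

    // More than one group means the log files do not share a single file ID.
    groupByFileId(names).forEach(
        (id, files) -> System.out.println(id + " -> " + files.size() + " log file(s)"));
  }
}
```

Run against the three names above, this yields two distinct IDs, with the log file from the final bulk insert alone in the second group. Whether that short ID is a naming bug or expected behavior is exactly the open question in this issue.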

JIRA info

  • Link: https://issues.apache.org/jira/browse/HUDI-8328
  • Type: Bug
  • Fix version(s):
    • 1.1.0
  • Attachment(s):
    • 09/Oct/24 15:13;jonvex;TestSparkNonBlockingConcurrencyControl.java;https://issues.apache.org/jira/secure/attachment/13072088/TestSparkNonBlockingConcurrencyControl.java
    • 09/Oct/24 15:23;jonvex;bulkInsertFirst=false.txt;https://issues.apache.org/jira/secure/attachment/13072086/bulkInsertFirst%3Dfalse.txt
    • 09/Oct/24 15:23;jonvex;bulkInsertFirst=true.txt;https://issues.apache.org/jira/secure/attachment/13072087/bulkInsertFirst%3Dtrue.txt

hudi-bot, Nov 30 '25 10:11

Linked PR(s)

  • https://github.com/apache/hudi/pull/12100
  • https://github.com/apache/hudi/pull/12101

hudi-bot, Dec 09 '25 04:12

@danny0405 is this a non-issue based on the closed PRs?

yihua, Dec 12 '25 01:12