hudi [SUPPORT] Incremental query not working on COW table

Error

Error Category: QUERY_ERROR; AnalysisException: Found duplicate column(s) in the data schema: _hoodie_commit_seqno, _hoodie_commit_time, _hoodie_file_name, _hoodie_partition_path, _hoodie_record_key

Code

hudi_options = {
    'hoodie.datasource.query.type': 'incremental',
    'hoodie.datasource.read.begin.instanttime': start_time,
    'hoodie.datasource.read.end.instanttime': end_time,
}

df = (
    spark.read
    .format("org.apache.hudi")
    .options(**hudi_options)
    .load(tablePath)
)

Mar 12 '24 07:03 NishantBaheti

Hi @NishantBaheti, thanks for your feedback. Could you also share the Spark and Hudi release versions you are using?

Mar 12 '24 09:03 danny0405

Hello, I am using this jar

hudi-spark3.3-bundle_2.12-0.14.1.jar
spark 3.3
hudi 0.14.1

Mar 12 '24 09:03 NishantBaheti

@NishantBaheti I checked before, and incremental query works fine with 0.14.1.

Can you paste the full reproducible script, or the table/writer properties you used to populate the table? Which writer did you use to populate it?

I used the code below to quickly try to reproduce the issue - https://gist.github.com/ad1happy2go/e7a2f8c695fde4c3db060a7113610931

Mar 12 '24 11:03 ad1happy2go

It doesn't work; I hit another issue.

Mar 12 '24 12:03 NishantBaheti

@NishantBaheti Were you able to get it resolved? Can you share the full stack trace? It looks like the "Unable to load class" error points to some library conflicts.

Apr 01 '24 13:04 ad1happy2go

@ad1happy2go I moved to a MOR table. The COW configurations felt a little unstable, and I had to rush the project to production quickly.

Apr 01 '24 16:04 NishantBaheti

@NishantBaheti Thanks for the update. Surprisingly, MOR worked but COW didn't work for you.

Apr 11 '24 16:04 ad1happy2go

@ad1happy2go COW tables were failing a lot: parquet file not found at read time, incremental queries not working, or the error mentioned above. I'm not saying that MOR is perfect, but I still had to put something in production, so I went with a static MOR configuration with quick compaction and cleaning, so that the Athena ro tables behave like Delta tables from the Delta framework and support point queries using the record index. I hope they figure out a stable version of Hudi soon, like Delta did.

Apr 11 '24 16:04 NishantBaheti

@NishantBaheti For incremental queries, we can hit a FileNotFoundException if a file needed by the query has been deleted by the cleaner. Setting hoodie.datasource.read.incr.fallback.fulltablescan.enable to true works around this issue.

Apr 11 '24 16:04 ad1happy2go
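
The fallback suggested above could be combined with the read options from the original report roughly as follows. This is only a sketch: the helper name `build_incremental_options` and the sample instant times are hypothetical, and only the option keys come from this thread.

```python
def build_incremental_options(start_time, end_time, fallback_to_full_scan=True):
    """Return Hudi read options for an incremental query (hypothetical helper)."""
    options = {
        "hoodie.datasource.query.type": "incremental",
        "hoodie.datasource.read.begin.instanttime": start_time,
        "hoodie.datasource.read.end.instanttime": end_time,
    }
    if fallback_to_full_scan:
        # Option named in this thread: fall back to a full table scan when
        # files needed by the incremental query were removed by the cleaner.
        options["hoodie.datasource.read.incr.fallback.fulltablescan.enable"] = "true"
    return options


# Placeholder instant times, not values from the original report.
hudi_options = build_incremental_options("20240311000000", "20240312000000")

# With a SparkSession `spark` and the table location `tablePath` available:
# df = spark.read.format("org.apache.hudi").options(**hudi_options).load(tablePath)
```

Keeping the fallback behind a flag makes it easy to compare behaviour with and without the full table scan, which can be considerably slower than a pure incremental read.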