
[SUPPORT] Incremental query not working on COW table

Open · NishantBaheti opened this issue 1 year ago · 10 comments

Error

Error Category: QUERY_ERROR; AnalysisException: Found duplicate column(s) in the data schema: _hoodie_commit_seqno, _hoodie_commit_time, _hoodie_file_name, _hoodie_partition_path, _hoodie_record_key
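Spark raises this AnalysisException when the same column name appears more than once in a schema. With the default `spark.sql.caseSensitive=false`, names are compared case-insensitively. The following hedged diagnostic sketch is not Hudi's or Spark's own code. It reports repeated names in a list of columns, for example the `df.columns` of a DataFrame before it is written to the table. The helper name `duplicate_columns` is hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def duplicate_columns(names: Iterable[str], case_sensitive: bool = False) -> List[str]:
    """Return column names that occur more than once, in first-seen order.

    With case_sensitive=False (mirroring Spark's default), "ID" and "id"
    count as the same column.
    """
    name_list = list(names)
    keys = [n if case_sensitive else n.lower() for n in name_list]
    counts = Counter(keys)
    seen = set()
    dups: List[str] = []
    for name, key in zip(name_list, keys):
        # Report each duplicated key once, using its first spelling.
        if counts[key] > 1 and key not in seen:
            seen.add(key)
            dups.append(name)
    return dups
```

If the source DataFrame already carries `_hoodie_*` columns, this helper would list them before the write is attempted.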

Code

hudi_options = {
    'hoodie.datasource.query.type': 'incremental',
    'hoodie.datasource.read.begin.instanttime': start_time,
    'hoodie.datasource.read.end.instanttime': end_time,
}

# Parentheses let the builder chain span several lines.
df = (spark.read
      .format("org.apache.hudi")
      .options(**hudi_options)
      .load(tablePath))

NishantBaheti, Mar 12 '24
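Hudi instant times are strings in the `yyyyMMddHHmmssSSS` format. An incremental query returns records committed after the begin instant (exclusive) and up to the end instant (inclusive). If the end instant is omitted, reading continues to the latest commit. The sketch below builds the incremental read options from `datetime` objects. It assumes naive `datetime` values in the same time zone as the table's timeline. `to_instant` and `incremental_read_options` are hypothetical helper names.

```python
from datetime import datetime
from typing import Dict, Optional


def to_instant(ts: datetime) -> str:
    """Format a datetime as a Hudi instant time string (yyyyMMddHHmmssSSS)."""
    return ts.strftime("%Y%m%d%H%M%S") + f"{ts.microsecond // 1000:03d}"


def incremental_read_options(begin: datetime,
                             end: Optional[datetime] = None) -> Dict[str, str]:
    """Build incremental-query options for spark.read.format("hudi")."""
    opts = {
        "hoodie.datasource.query.type": "incremental",
        # Records with instant time > begin are returned (begin is exclusive).
        "hoodie.datasource.read.begin.instanttime": to_instant(begin),
    }
    if end is not None:
        # Records with instant time <= end are returned (end is inclusive).
        opts["hoodie.datasource.read.end.instanttime"] = to_instant(end)
    return opts
```

You can pass the resulting dict to `.options(**...)` in the same way as `hudi_options` in the issue's snippet.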

Hi @NishantBaheti, thanks for your feedback. Could you also tell us which Spark and Hudi versions you are using?

danny0405, Mar 12 '24

Hello, I am using:

  • hudi-spark3.3-bundle_2.12-0.14.1.jar
  • spark 3.3
  • hudi 0.14.1

NishantBaheti, Mar 12 '24
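Hudi's Spark bundles encode the Spark line and the Scala version in the jar name. For example, `hudi-spark3.3-bundle_2.12-0.14.1.jar` means Spark 3.3, Scala 2.12, and Hudi 0.14.1. The bundle should match the Spark version that actually runs the job. The sketch below parses a bundle name and compares it with a runtime version string, such as `spark.version` in PySpark. The regex and helper names are assumptions that cover the common `hudi-sparkX.Y-bundle_S.S-V.jar` pattern; they are not an official naming spec.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: hudi-spark<spark>-bundle_<scala>-<hudi>.jar
BUNDLE_RE = re.compile(
    r"^hudi-spark(?P<spark>\d+(?:\.\d+)?)-bundle_(?P<scala>\d+\.\d+)-(?P<hudi>.+)\.jar$"
)


class BundleInfo(NamedTuple):
    spark: str
    scala: str
    hudi: str


def parse_bundle(jar_name: str) -> Optional[BundleInfo]:
    """Extract Spark, Scala and Hudi versions from a bundle jar name."""
    m = BUNDLE_RE.match(jar_name)
    if m is None:
        return None
    return BundleInfo(m["spark"], m["scala"], m["hudi"])


def bundle_matches_spark(jar_name: str, spark_version: str) -> bool:
    """True if the bundle's Spark line (e.g. "3.3") matches the runtime version."""
    info = parse_bundle(jar_name)
    if info is None:
        return False
    wanted = info.spark.split(".")
    # Compare only as many components as the bundle name specifies.
    actual = spark_version.split(".")[: len(wanted)]
    return actual == wanted
```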


@NishantBaheti I checked earlier, and incremental queries work fine with 0.14.1.

Can you paste the full reproducible script, or the table/writer properties you used to populate the table? Which writer did you use to populate it?

I used the code below to try to reproduce it quickly: https://gist.github.com/ad1happy2go/e7a2f8c695fde4c3db060a7113610931

ad1happy2go, Mar 12 '24
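Collecting the writer properties in a single dict makes them easy to share for a reproducer. The sketch below shows a minimal Copy-on-Write upsert configuration built from standard Hudi datasource options. The table name and the `id`, `ts`, `dt` field names in the usage are hypothetical placeholders, and `cow_writer_options` is a hypothetical helper.

```python
from typing import Dict


def cow_writer_options(table_name: str, record_key: str,
                       precombine: str, partition_path: str) -> Dict[str, str]:
    """Minimal Hudi writer options for an upsert into a COPY_ON_WRITE table."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
        "hoodie.datasource.write.operation": "upsert",
        # Field that uniquely identifies a record.
        "hoodie.datasource.write.recordkey.field": record_key,
        # When two records share a key, the one with the larger value wins.
        "hoodie.datasource.write.precombine.field": precombine,
        "hoodie.datasource.write.partitionpath.field": partition_path,
    }
```

The dict would then be used as `df.write.format("hudi").options(**cow_writer_options("repro_tbl", "id", "ts", "dt")).mode("append").save(tablePath)`.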

[screenshot]

It doesn't work; now I get a different error.

NishantBaheti, Mar 12 '24

@NishantBaheti Were you able to resolve it? Can you share the full stack trace? The "Unable to load class" error suggests a library conflict.

ad1happy2go, Apr 01 '24
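Class-loading errors can come from having more than one Hudi version on the classpath. One example is a runtime-provided Hudi alongside a manually added bundle; this is an assumption about the setup, not something confirmed in this thread. The sketch below groups Hudi jars by version from a list of classpath entries. The regex and helper names are hypothetical.

```python
import os
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumed naming: hudi-<anything>-<major.minor.patch>[-suffix].jar
HUDI_JAR_RE = re.compile(
    r"^hudi-.*-(?P<version>\d+\.\d+\.\d+(?:-[A-Za-z0-9.]+)?)\.jar$"
)


def hudi_versions_on_classpath(entries: Iterable[str]) -> Dict[str, List[str]]:
    """Map each Hudi version found to the jar file names that carry it."""
    found: Dict[str, List[str]] = defaultdict(list)
    for entry in entries:
        name = os.path.basename(entry.strip())
        m = HUDI_JAR_RE.match(name)
        if m:
            found[m["version"]].append(name)
    return dict(found)


def has_hudi_version_conflict(entries: Iterable[str]) -> bool:
    """True if jars for more than one Hudi version are present."""
    return len(hudi_versions_on_classpath(entries)) > 1
```

In PySpark the entries could come from `spark.sparkContext.getConf().get("spark.jars", "").split(",")`. Jars that the platform adds may not appear in that setting.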

@ad1happy2go I moved to a MOR table. The COW configurations felt a little unstable, and I had to rush the project to production.

NishantBaheti, Apr 01 '24

@NishantBaheti Thanks for the update. It's surprising that MOR worked for you but COW didn't.

ad1happy2go, Apr 11 '24

@ad1happy2go COW tables were failing a lot: parquet files not found at read time, incremental queries not working, or the error mentioned above. I'm not saying MOR is perfect, but I had to put something into production. So I used a static MOR configuration with quick compaction and cleaning. That way the Athena _ro tables behave like Delta Lake tables and support point queries using the record index. I hope Hudi reaches a stable version soon, as Delta did.

NishantBaheti, Apr 11 '24
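The setup described above (MOR, quick compaction, cleaning, record index) can be expressed as writer options. The sketch below assumes Hudi 0.14's option names. The values are illustrative and are not the configuration actually used in this thread, and `mor_writer_options` is a hypothetical helper. Retaining fewer commits in the cleaner also shrinks the window that incremental queries can read without hitting deleted files.

```python
from typing import Dict


def mor_writer_options(table_name: str, record_key: str, precombine: str,
                       max_delta_commits: int = 1,
                       commits_retained: int = 10) -> Dict[str, str]:
    """Illustrative MERGE_ON_READ writer options with inline compaction,
    automatic cleaning and the metadata-table record index."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.table.type": "MERGE_ON_READ",
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine,
        # Compact inline after this many delta commits, so the read-optimized
        # (_ro) view catches up with recent writes frequently.
        "hoodie.compact.inline": "true",
        "hoodie.compact.inline.max.delta.commits": str(max_delta_commits),
        # Clean automatically, keeping older file versions for this many commits.
        "hoodie.clean.automatic": "true",
        "hoodie.cleaner.commits.retained": str(commits_retained),
        # Record-level index stored in the metadata table, used for key lookups.
        "hoodie.metadata.enable": "true",
        "hoodie.metadata.record.index.enable": "true",
        "hoodie.index.type": "RECORD_INDEX",
    }
```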

@NishantBaheti Incremental queries can hit a FileNotFoundException if the cleaner has already deleted the files for the queried range. Setting hoodie.datasource.read.incr.fallback.fulltablescan.enable to true works around this issue.

ad1happy2go, Apr 11 '24
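For the read in the original issue, the fallback is one more entry in the options dict. The sketch below is minimal; `with_full_scan_fallback` is a hypothetical helper, and the flag is passed as the string `"true"` like other Hudi options. When files are missing, the read falls back to a full table scan, so it may be noticeably slower.

```python
from typing import Dict

FALLBACK_KEY = "hoodie.datasource.read.incr.fallback.fulltablescan.enable"


def with_full_scan_fallback(options: Dict[str, str],
                            enabled: bool = True) -> Dict[str, str]:
    """Return a copy of the read options with the full-table-scan fallback set."""
    merged = dict(options)  # leave the caller's dict unchanged
    merged[FALLBACK_KEY] = "true" if enabled else "false"
    return merged
```

Usage would look like `spark.read.format("org.apache.hudi").options(**with_full_scan_fallback(hudi_options)).load(tablePath)`.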