hudi-rs

Load empty table failed.

Open gohalo opened this issue 1 year ago • 8 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Description of the bug

Loading an empty Hudi table panics with the following error.

Failed to resolve the latest schema: no file path found

Steps To Reproduce

Load any empty table; the error reproduces.

Expected behavior

Just return empty records.

Screenshots / Logs

No response

Software information

not related

Additional context

No response

gohalo avatar Aug 31 '24 09:08 gohalo

I think this is mainly because of the get_latest_schema() function, which tries to load the schema from the latest Parquet file, but the table does not have any base file yet.
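A minimal sketch of that failure mode, assuming the schema is resolved from the last entry in a list of base file paths. The function name `latest_schema_source` and its signature are hypothetical and are not the hudi-rs API; only the error message is taken from this issue.

```rust
/// Hypothetical stand-in for the step that picks the file to read the schema from.
/// With no base files there is nothing to pick, so an error is returned.
pub fn latest_schema_source(base_files: &[String]) -> Result<&String, String> {
    base_files
        .last()
        .ok_or_else(|| "Failed to resolve the latest schema: no file path found".to_string())
}
```

With an empty slice this returns the `Err` variant; whether the caller sees an error or a panic depends on how that `Result` is handled.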

gohalo avatar Aug 31 '24 09:08 gohalo

@gohalo do you want to take this up? I had a similar fix in https://github.com/Eventual-Inc/Daft/pull/2268/files. See if you can follow similar logic and apply it here.

xushiyan avatar Sep 14 '24 20:09 xushiyan

@gohalo we actually have a test case timeline_read_latest_schema_from_empty_table. can you look into this and see what is not covered, and fix accordingly?

xushiyan avatar Sep 15 '24 06:09 xushiyan

@xushiyan i will try to fix that later 😀

gohalo avatar Sep 15 '24 11:09 gohalo

hi @gohalo any update on this? trying to get this included soon in the next release


xushiyan avatar Sep 23 '24 17:09 xushiyan

@xushiyan

Actually, the result is the same as in the test case timeline_read_latest_schema_from_empty_table: it just returns the error described above.

https://github.com/apache/hudi-rs/blob/5e1981f0380ef43f7fab4eb2229820c82c717e29/crates/core/src/table/timeline.rs#L243-L258

Changing the following https://github.com/apache/hudi-rs/blob/5e1981f0380ef43f7fab4eb2229820c82c717e29/crates/core/src/table/mod.rs#L144-L145 to

.await?;

gives the detailed error message, which is the same:

Failed to resolve the latest schema: no file path found

I'm trying to load the schema from the hoodie.table.create.schema field of hoodie.properties, but I found that we would need to support parsing Java properties files, and I have not found a simple crate for that yet.

And if we add support for loading properties files, the format may be a little different from the global config file. Alternatively, we could parse both in the properties file format.

I still wonder whether this is the right solution.
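For illustration, a simplified sketch of a properties parser using only the standard library. It is an assumption about what "simple" could mean here, not code from hudi-rs: the name `parse_properties` is hypothetical, and the sketch skips parts of the Java properties format such as backslash escapes, line continuations, and whitespace-only separators.

```rust
use std::collections::HashMap;

/// Parses `key=value` or `key:value` lines into a map.
/// Blank lines and lines starting with `#` or `!` are skipped as comments.
/// Simplified: no escape sequences and no multi-line values.
pub fn parse_properties(text: &str) -> HashMap<String, String> {
    let mut props = HashMap::new();
    for line in text.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') || line.starts_with('!') {
            continue;
        }
        // Split at the first separator; everything after it is the value.
        if let Some(idx) = line.find(|c: char| c == '=' || c == ':') {
            let key = line[..idx].trim();
            let value = line[idx + 1..].trim();
            props.insert(key.to_string(), value.to_string());
        }
    }
    props
}
```

A value that itself contains `=` or `:` is kept intact because only the first separator is used for the split.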

gohalo avatar Sep 24 '24 03:09 gohalo

@gohalo We can't load hoodie.table.create.schema for this API, as it's not always available and it can become obsolete as the table evolves.

We currently expect the API to return an Error when the user tries to get the schema from an empty table. I was curious which code path gives you the panic, because by rights we should always get a Result, which can be an Error. Can you clarify how you got the panic!(), and fix accordingly?

xushiyan avatar Sep 24 '24 05:09 xushiyan

The panic is because of an unhandled Result, with await? or unwrap().

I just think it would be fine to return an empty result instead of an error, since the current behavior differs from Spark or Flink. As you said, maybe we could solve this with the schema evolution feature.
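A small sketch of the two handling styles mentioned above, plus the empty-result fallback. All names here (`get_schema`, `field_count`, `schema_or_empty`) are hypothetical stand-ins, not hudi-rs functions. In Rust, `?` on its own hands the `Err` back to the caller, while `unwrap()` panics on `Err`.

```rust
/// Hypothetical stand-in for a schema lookup that fails on an empty table.
pub fn get_schema(file_count: usize) -> Result<Vec<String>, String> {
    if file_count == 0 {
        Err("no file path found".to_string())
    } else {
        Ok(vec!["id".to_string()])
    }
}

/// `?` propagates the error to the caller instead of panicking.
pub fn field_count(file_count: usize) -> Result<usize, String> {
    let fields = get_schema(file_count)?;
    Ok(fields.len())
}

/// Alternative suggested in this thread: treat an empty table as an empty result.
pub fn schema_or_empty(file_count: usize) -> Vec<String> {
    get_schema(file_count).unwrap_or_default()
}
```

Calling `get_schema(0).unwrap()` would panic, which is the behavior reported in this issue; the fallback trades that for an empty list at the cost of hiding the error.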

gohalo avatar Sep 24 '24 06:09 gohalo

I was asking which code path caused the panic. I spotted that the DataFusion API wasn't handling it; that is now fixed.

xushiyan avatar Nov 20 '24 07:11 xushiyan