
No data is inserted when a couple of Hudi tables are joined

Open njalan opened this issue 2 years ago • 3 comments

There is one ETL job that runs every hour and does an insert overwrite into one table from the result of a join across several Hudi tables. About once a week, no data is inserted.

Environment Description

Hudi version : 0.9.1

Spark version : 3.0.1

Hive version : 3

Hadoop version : 3.2.2

Storage (HDFS/S3/GCS..) : s3

Running on Docker? (yes/no) : no

What can be the reason? Is there any way to debug this kind of issue, or to get more metrics for it?

njalan avatar Dec 19 '23 15:12 njalan

Just to clarify: you have one ETL table that is loaded as a full refresh from a join across multiple other Hudi tables, and about once a week that table ends up loaded with no data.

To debug this, when it happens you can check whether the join itself returns any data. You can also use point-in-time queries to get the exact data in the source tables at that time.
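A minimal PySpark sketch of that point-in-time check. It assumes the Hudi version in use supports the `as.of.instant` read option for time travel queries. The helper names are hypothetical, and `spark` is an existing SparkSession with the Hudi bundle on its classpath.

```python
def time_travel_options(instant):
    """Read options for a Hudi time travel query as of `instant`.

    `instant` is a commit time such as "20231219071501" (yyyyMMddHHmmss).
    """
    return {"as.of.instant": instant}


def count_as_of(spark, base_path, instant):
    """Count the rows of the Hudi table at `base_path` as of `instant`."""
    reader = spark.read.format("hudi").options(**time_travel_options(instant))
    return reader.load(base_path).count()


def count_sources_as_of(spark, base_paths, instant):
    """Row count per source table at `instant`, to spot an empty join input."""
    return {path: count_as_of(spark, path, instant) for path in base_paths}
```

Calling `count_sources_as_of` with the base paths of the joined tables and the instant at which the empty run started shows whether one of the join inputs was already empty at that time, or whether the inputs had rows and the join condition filtered everything out.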

ad1happy2go avatar Dec 19 '23 16:12 ad1happy2go

Could it be related to the warning below?

23/12/19 09:23:28 INFO Executor: Running task 4.0 in stage 0.0 (TID 4)
23/12/19 09:23:28 WARN InMemoryFileIndex: The directory xxx/testing/2681717c-28d2-4f56-9664-4037cbe67c9b-0_2-221-9002_20231219071501.parquet was not found. Was it deleted very recently?
23/12/19 09:23:28 INFO Executor: Finished task 4.0 in stage 0.0 (TID 4). 930 bytes result sent to driver
23/12/19 09:23:28 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 5
23/12/19 09:23:28 INFO Executor: Running task 5.0 in stage 0.0 (TID 5)
23/12/19 09:23:28 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 906 bytes result sent to driver
23/12/19 09:23:28 WARN InMemoryFileIndex: The directory xxx/testing/2681717c-28d2-4f56-9664-4037cbe67c9b-1_2-221-9002_20231219071501.parquet was not found. Was it deleted very recently?
23/12/19 09:23:28 INFO Executor: Finished task 5.0 in stage 0.0 (TID 5). 930 bytes result sent to driver
23/12/19 09:23:28 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 6
23/12/19 09:23:28 INFO Executor: Running task 6.0 in stage 0.0 (TID 6)
23/12/19 09:23:28 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 7
23/12/19 09:23:28 INFO Executor: Running task 7.0 in stage 0.0 (TID 7)
23/12/19 09:23:28 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from xxx/testing
23/12/19 09:23:28 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from xxx/testing
23/12/19 09:23:28 INFO HoodieTableConfig: Loading table properties from xxx/testing/.hoodie/hoodie.properties

njalan avatar Dec 20 '23 01:12 njalan

@njalan I don't think this warning is related or that it could cause the issue. We may be getting it because another process is updating the source table at the same time. Is that correct?
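One way to see which commit wrote the files named in the warning is to parse their names. The sketch below assumes the usual Hudi base file naming, `<fileId>_<writeToken>_<instantTime>.<extension>`, and a 14-digit `yyyyMMddHHmmss` instant time. The function names are hypothetical.

```python
from datetime import datetime
from typing import NamedTuple


class BaseFileName(NamedTuple):
    file_id: str
    write_token: str
    instant_time: str


def parse_base_file_name(path):
    """Split a Hudi base file name into file ID, write token, and instant time."""
    name = path.rsplit("/", 1)[-1]   # drop the directory part
    stem = name.rsplit(".", 1)[0]    # drop the ".parquet" extension
    file_id, write_token, instant_time = stem.split("_")
    return BaseFileName(file_id, write_token, instant_time)


def instant_to_datetime(instant_time):
    """Interpret the yyyyMMddHHmmss prefix of an instant time as a naive datetime."""
    return datetime.strptime(instant_time[:14], "%Y%m%d%H%M%S")


# File name taken from the first WARN line in the log above.
parsed = parse_base_file_name(
    "xxx/testing/2681717c-28d2-4f56-9664-4037cbe67c9b-0_2-221-9002_20231219071501.parquet"
)
written_at = instant_to_datetime(parsed.instant_time)
```

Here the missing file belongs to the commit at instant 20231219071501, roughly two hours before the 09:23:28 read that could not find it, assuming both timestamps are in the same time zone. That would be consistent with a later write or a cleaning run replacing the file between listing and reading, though the log alone does not confirm it.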

ad1happy2go avatar Dec 22 '23 12:12 ad1happy2go

@njalan Were you able to resolve this or understand the root cause? Please do let us know. Thanks.

ad1happy2go avatar Jan 10 '24 09:01 ad1happy2go