hudi Querying Hudi Table Created With Version 0.12.3 Not Working on Trino 430

Describe the problem you faced

I have a Hudi table created with version 0.12.3. When I try to query it using Trino, it is not even able to start reading the table. When I create the same table with version 0.12.1, I am able to query it using Trino.

To Reproduce

Steps to reproduce the behavior:

1. Trino EKS setup file: trino.txt
2. Create the Hudi table on EMR using the Hudi DeltaStreamer 0.12.3 utilities JAR:
   https://repo1.maven.org/maven2/org/apache/hudi/hudi-utilities-bundle_2.12/0.12.3/hudi-utilities-bundle_2.12-0.12.3.jar
   https://repo1.maven.org/maven2/org/apache/hudi/hudi-spark3.3-bundle_2.12/0.12.3/hudi-spark3.3-bundle_2.12-0.12.3.jar
3. Hudi properties used:
   hoodie.schema.on.read.enable=true
   hoodie.cleaner.commits.retained=3
   hoodie.datasource.write.reconcile.schema=true
   hoodie.parquet.compression.codec=zstd
   hoodie.delete.shuffle.parallelism=200
   hoodie.parquet.max.file.size=268435456
   hoodie.upsert.shuffle.parallelism=200
   hoodie.datasource.hive_sync.support_timestamp=true
   hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.CustomKeyGenerator
   hoodie.datasource.write.hive_style_partitioning=true
   hoodie.insert.shuffle.parallelism=200
   hoodie.parquet.small.file.limit=134217728
   hoodie.bootstrap.parallelism=200
   hoodie.embed.timeline.server=true
   hoodie.bulkinsert.shuffle.parallelism=200
   hoodie.datasource.hive_sync.enable=true
   hoodie.filesystem.view.type=EMBEDDED_KV_STORE
   hoodie.clean.max.commits=4
   hoodie.metadata.enable=true
   spark.hadoop.fs.s3.canned.acl=BucketOwnerFullControl
4. I am using Kafka as the source and syncing the table to the Glue Catalog.
5. When I run a simple query on Trino such as "Select * from hudi_table", it is not able to load.
6. With the same properties, a Hudi table created with version 0.12.1 can be queried. JARs used:
   https://repo1.maven.org/maven2/org/apache/hudi/hudi-utilities-bundle_2.12/0.12.1/hudi-utilities-bundle_2.12-0.12.1.jar
   https://repo1.maven.org/maven2/org/apache/hudi/hudi-spark3.3-bundle_2.12/0.12.1/hudi-spark3.3-bundle_2.12-0.12.1.jar
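Since the only change between the working and failing setups is the Hudi version, one way to narrow things down is to compare the table-level config each version wrote to `<base_path>/.hoodie/hoodie.properties`, after copying both files locally. The Python sketch below is a hypothetical helper, not part of Hudi; `parse_properties` and `diff_properties` are illustrative names, and the parser assumes simple `key=value` lines rather than the full Java properties syntax.

```python
"""Hypothetical helper: diff two Hudi hoodie.properties files (simplified parser)."""
from __future__ import annotations

import sys


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines into a dict.

    Simplification: only '=' is treated as the separator; Java properties
    escapes and line continuations are not handled.
    """
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comment lines
        key, sep, value = line.partition("=")
        if sep:  # lines without '=' are ignored in this sketch
            props[key.strip()] = value.strip()
    return props


def diff_properties(
    old: dict[str, str], new: dict[str, str]
) -> dict[str, tuple[str | None, str | None]]:
    """Return {key: (old_value, new_value)} for keys whose values differ.

    None means the key is absent from that file.
    """
    keys = sorted(old.keys() | new.keys())
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python diff_props.py <0.12.1 hoodie.properties> <0.12.3 hoodie.properties>
    with open(sys.argv[1]) as f_old, open(sys.argv[2]) as f_new:
        changes = diff_properties(
            parse_properties(f_old.read()), parse_properties(f_new.read())
        )
    for key, (before, after) in changes.items():
        print(f"{key}: {before!r} -> {after!r}")
```

Keys that appear only in the 0.12.3 file show up with None on the left, which makes newly written table settings easy to spot.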

Expected behavior

The table created with 0.12.3 should be queryable from Trino, just like the same table created with 0.12.1.

Environment Description

Hudi version : 0.12.3
Spark version : 3.3
Hive version :
Hadoop version :
Storage (HDFS/S3/GCS..) : S3
Running on Docker? (yes/no) : no
Trino version : 430

Dec 02 '23 07:12 Amar1404

@Amar1404 Do you get any error when you query? @codope Do you have any insights on this?

Dec 11 '23 12:12 ad1happy2go

Hi @ad1happy2go, I found that the issue is in syncing the table to the catalog, since I am using the Glue Catalog. When I created the table using the HudiSyncTool class, the table did not work in Trino, but when I used AwsGlueCatalogSync it worked fine. I am not sure what the difference between these two classes is.
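One way to see what actually differs between the two sync paths is to compare the Glue table definitions each one produces, for example by syncing the same data to two table names and fetching both with boto3's `glue.get_table(DatabaseName=..., Name=...)`, whose response holds the definition under `"Table"`. The Python sketch below is a hypothetical helper, not part of Hudi or boto3. The set of fields it checks is an assumption about what a query engine such as Trino relies on, and it takes plain dicts so it runs without AWS access.

```python
"""Hypothetical helper: diff two Glue table definitions field by field."""
from __future__ import annotations

from typing import Any

# Key paths into the "Table" dict returned by boto3's glue.get_table().
# Which fields matter to Trino is an assumption of this sketch.
FIELDS: tuple[tuple[str, ...], ...] = (
    ("TableType",),
    ("PartitionKeys",),
    ("StorageDescriptor", "Location"),
    ("StorageDescriptor", "InputFormat"),
    ("StorageDescriptor", "OutputFormat"),
    ("StorageDescriptor", "Columns"),
    ("StorageDescriptor", "SerdeInfo", "SerializationLibrary"),
    ("StorageDescriptor", "SerdeInfo", "Parameters"),
)


def get_path(table: dict[str, Any], path: tuple[str, ...]) -> Any:
    """Follow nested keys; return None if any level is missing."""
    node: Any = table
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def compare_tables(a: dict[str, Any], b: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {dotted_field: (value_in_a, value_in_b)} for every differing field."""
    diffs: dict[str, tuple[Any, Any]] = {}
    for path in FIELDS:
        va, vb = get_path(a, path), get_path(b, path)
        if va != vb:
            diffs[".".join(path)] = (va, vb)
    # Table-level Parameters are compared key by key so each difference is listed.
    pa = get_path(a, ("Parameters",)) or {}
    pb = get_path(b, ("Parameters",)) or {}
    for key in sorted(pa.keys() | pb.keys()):
        if pa.get(key) != pb.get(key):
            diffs[f"Parameters.{key}"] = (pa.get(key), pb.get(key))
    return diffs
```

Some table parameters (for example, values recording when the last sync ran) can differ between any two syncs, so differences in InputFormat, SerdeInfo, or column types are likely to be more telling.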

Dec 18 '23 04:12 Amar1404

@Amar1404 Ideally, HiveSync should also delegate to AwsGlueCatalogSync when Glue is enabled on EMR, so it should not make any difference.

Jan 31 '24 15:01 ad1happy2go