[SUPPORT] docker demo not working: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/parquet/format/TypeDefinedOrder

Open Souldiv opened this issue 10 months ago • 8 comments

Tips before filing an issue

  • Have you gone through our FAQs? yes

  • Join the mailing list to engage in conversations and get faster support at [email protected].

  • If you have triaged this as a bug, then file an issue directly.

Describe the problem you faced

The docker demo doesn't work. I have tried it with different releases, but all of them give me the same error within the adhoc-2 container.

To Reproduce

Steps to reproduce the behavior:

  • Cloned the repository and checked out branch release-0.15.0
  • Built Hudi using Maven
  • Ran all the steps and encountered the error at run_sync_tool in the adhoc container

Expected behavior

The demo works

Environment Description

  • Hudi version : 0.15

  • Spark version : 3.5.3

  • Hive version : 2.3.3

  • Hadoop version : 2.8.4

  • Storage (HDFS/S3/GCS..) : HDFS

  • Running on Docker? (yes/no) : yes

Additional context

Stacktrace

2025-03-10 05:15:46,708 INFO  [main] ddl.JDBCExecutor (JDBCExecutor.java:createHiveConnection(105)) - Successfully established Hive connection to  jdbc:hive2://hiveserver:10000
2025-03-10 05:15:46,709 INFO  [main] hive.HiveSyncTool (HiveSyncTool.java:syncHoodieTable(171)) - Syncing target hoodie table with hive table(default.stock_ticks_cow). Hive metastore URL from HiveConf:thrift://hivemetastore:9083). Hive metastore URL from HiveSyncConfig:null, basePath :/user/hive/warehouse/stock_ticks_cow
2025-03-10 05:15:46,709 INFO  [main] hive.HiveSyncTool (HiveSyncTool.java:syncHoodieTable(233)) - Trying to sync hoodie table stock_ticks_cow with base path /user/hive/warehouse/stock_ticks_cow of type COPY_ON_WRITE
2025-03-10 05:15:46,928 INFO  [main] hive.metastore (HiveMetaStoreClient.java:close(564)) - Closed a connection to metastore, current connections: 0
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/parquet/format/TypeDefinedOrder
	at org.apache.parquet.format.converter.ParquetMetadataConverter.<clinit>(ParquetMetadataConverter.java:85)
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:433)
	at org.apache.hudi.common.util.ParquetUtils.readMetadata(ParquetUtils.java:110)
	at org.apache.hudi.common.util.ParquetUtils.readSchema(ParquetUtils.java:242)
	at org.apache.hudi.common.util.ParquetUtils.readAvroSchema(ParquetUtils.java:264)
	at org.apache.hudi.common.table.TableSchemaResolver.fetchSchemaFromFiles(TableSchemaResolver.java:478)
	at org.apache.hudi.common.table.TableSchemaResolver.getTableParquetSchemaFromDataFile(TableSchemaResolver.java:262)
	at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaFromDataFileInternal(TableSchemaResolver.java:115)
	at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaFromDataFile(TableSchemaResolver.java:111)
	at org.apache.hudi.common.table.TableSchemaResolver.hasOperationField(TableSchemaResolver.java:408)
	at org.apache.hudi.util.Lazy.get(Lazy.java:54)
	at org.apache.hudi.common.table.TableSchemaResolver.getTableSchemaFromLatestCommitMetadata(TableSchemaResolver.java:215)
	at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaInternal(TableSchemaResolver.java:183)
	at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchema(TableSchemaResolver.java:136)
	at org.apache.hudi.common.table.ParquetTableSchemaResolver.getTableParquetSchema(ParquetTableSchemaResolver.java:63)
	at org.apache.hudi.sync.common.HoodieSyncClient.getStorageSchema(HoodieSyncClient.java:110)
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:241)
	at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:189)
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:177)
	at org.apache.hudi.hive.HiveSyncTool.main(HiveSyncTool.java:547)
Caused by: java.lang.ClassNotFoundException: org.apache.parquet.format.TypeDefinedOrder
	at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	... 20 more

Souldiv avatar Mar 10 '25 05:03 Souldiv

Hi @Souldiv

Could you please share the reproducible steps and commands you used to replicate the above issue?

rangareddy avatar Mar 12 '25 10:03 rangareddy

Hey @rangareddy: did you try the docker demo with the 0.15.0 branch? Can you report back once you get it working on your end?

nsivabalan avatar Mar 19 '25 03:03 nsivabalan

Hey @rangareddy, I followed the steps outlined here and get that error when I try to run the sync tool for Hive. I believe it might be caused by the env var $HUDI_CLASSPATH not being set. I also tried running it on-prem with individual services, and it worked once I set that variable.
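One way to test the $HUDI_CLASSPATH theory is to scan the classpath for the class named in the error. The snippet below is only a minimal sketch and not part of Hudi: the function name jars_containing is made up for illustration, and it assumes the classpath is a list of jar paths or dir/* wildcards joined by the platform path separator.

```python
import os
import zipfile
from pathlib import Path


def jars_containing(class_name: str, classpath: str) -> list[str]:
    """Return the jars on the classpath that contain the given class."""
    # A class a.b.C is stored in a jar as the entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for item in classpath.split(os.pathsep):
        if not item:
            continue
        if item.endswith("*"):
            # Java expands "dir/*" to the jars directly inside dir.
            candidates = sorted(Path(item[:-1]).glob("*.jar"))
        else:
            candidates = [Path(item)]
        for path in candidates:
            # Ignore directories and anything that is not a jar file.
            if not path.is_file() or path.suffix != ".jar":
                continue
            with zipfile.ZipFile(path) as jar:
                if entry in jar.namelist():
                    hits.append(str(path))
    return hits


if __name__ == "__main__":
    cp = os.environ.get("HUDI_CLASSPATH", "")
    print(jars_containing("org.apache.parquet.format.TypeDefinedOrder", cp))
```

An empty list would mean that no jar reachable through $HUDI_CLASSPATH provides the class, which is consistent with the ClassNotFoundException in the stack trace.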

Souldiv avatar Mar 19 '25 03:03 Souldiv

Hi @rangareddy, are there any updates on this ticket?

Souldiv avatar Mar 25 '25 22:03 Souldiv

@rangareddy any updates on this? We have another user reporting this.

dipankarmazumdar avatar May 02 '25 15:05 dipankarmazumdar

@rangareddy this is being caused by a missing include (org.apache.parquet:parquet-format) in this file
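For context: when the maven-shade-plugin's artifactSet lists explicit includes, only the listed artifacts are packed into the bundle, so an artifact that is absent from that list stays out of the jar. A rough way to check which artifacts a bundle pom includes is sketched below. This is an illustration only: the function names and the trimmed SAMPLE_POM are hypothetical, and the real hudi-hive-sync-bundle pom is much longer.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def shade_includes(pom_text: str) -> list[str]:
    """Collect every <include> found under an <artifactSet> element."""
    root = ET.fromstring(pom_text)
    includes = []
    for element in root.iter():
        if local_name(element.tag) != "artifactSet":
            continue
        for child in element.iter():
            if local_name(child.tag) == "include" and child.text:
                includes.append(child.text.strip())
    return includes


# Hypothetical, heavily trimmed pom used only to exercise the function.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <build><plugins><plugin>
    <artifactId>maven-shade-plugin</artifactId>
    <configuration><artifactSet><includes>
      <include>org.apache.parquet:parquet-avro</include>
    </includes></artifactSet></configuration>
  </plugin></plugins></build>
</project>"""

if __name__ == "__main__":
    found = shade_includes(SAMPLE_POM)
    # True would mean parquet-format is on the shade include list.
    print("org.apache.parquet:parquet-format" in found)
```

To run it against a real pom, read the file contents and pass them to shade_includes in place of SAMPLE_POM.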

I checked the files in the generated jar. With the include missing from the pom, the jar contains the following:

noname@noname-linux:~/hudi/hudi/docker$ jar tvf /home/noname/hudi/hudi/packaging/hudi-hive-sync-bundle/target/hudi-hive-sync-bundle-1.0.1.jar | grep -i "org/apache/parquet/format/"
     0 Mon Jan 28 13:39:44 IST 2019 org/apache/parquet/format/
     0 Mon Jan 28 13:39:44 IST 2019 org/apache/parquet/format/converter/
  1861 Mon Jan 28 13:39:44 IST 2019 org/apache/parquet/format/converter/ParquetMetadataConverter$NoFilter.class
  4402 Mon Jan 28 13:39:44 IST 2019 org/apache/parquet/format/converter/ParquetMetadataConverter$2.class
  1760 Mon Jan 28 13:39:44 IST 2019 org/apache/parquet/format/converter/ParquetMetadataConverter$MetadataFilterVisitor.class
  2067 Mon Jan 28 13:39:44 IST 2019 org/apache/parquet/format/converter/ParquetMetadataConverter$OffsetMetadataFilter.class
  1268 Mon Jan 28 13:39:44 IST 2019 org/apache/parquet/format/converter/ParquetMetadataConverter$MetadataFilter.class
  2222 Mon Jan 28 13:39:44 IST 2019 org/apache/parquet/format/converter/ParquetMetadataConverter$RangeMetadataFilter.class
  4925 Mon Jan 28 13:39:44 IST 2019 org/apache/parquet/format/converter/ParquetMetadataConverter$1.class
  1907 Mon Jan 28 13:39:44 IST 2019 org/apache/parquet/format/converter/ParquetMetadataConverter$SkipMetadataFilter.class
  1419 Mon Jan 28 13:39:44 IST 2019 org/apache/parquet/format/converter/ParquetMetadataConverter$SortOrder.class
 41106 Mon Jan 28 13:39:44 IST 2019 org/apache/parquet/format/converter/ParquetMetadataConverter.class
  4552 Mon Jan 28 13:39:44 IST 2019 org/apache/parquet/format/converter/ParquetMetadataConverter$3.class

Along with TypeDefinedOrder, a bunch of other classes were also missing. After adding the include, all the required classes were present as expected and the Hive sync flow worked for me.
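The same jar check can be scripted. The sketch below lists the classes that sit directly in a package inside a jar, ignoring subpackages such as converter/. It is an illustration under assumptions: classes_in_package is a made-up helper name, and the jar path in the usage note is the one from the command in this comment.

```python
import zipfile


def classes_in_package(jar_path: str, package: str) -> list[str]:
    """List class names stored directly in the given package of a jar."""
    prefix = package.replace(".", "/") + "/"
    suffix = ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return sorted(
            name[len(prefix):-len(suffix)]
            for name in jar.namelist()
            if name.startswith(prefix)
            and name.endswith(suffix)
            # Skip entries that belong to subpackages like converter/.
            and "/" not in name[len(prefix):]
        )
```

Calling classes_in_package("packaging/hudi-hive-sync-bundle/target/hudi-hive-sync-bundle-1.0.1.jar", "org.apache.parquet.format") on a bundle built without the include should return an empty list, matching the listing where only converter/ entries appear. After a rebuild with the include, TypeDefinedOrder should show up in the result.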

This dependency is mentioned here

I tried this with Hudi 1.0.1

uptycs-Sushrut avatar May 02 '25 15:05 uptycs-Sushrut

With version 1.0.2 I also have this error

etastore URL from HiveSyncConfig:null, basePath :/user/hive/warehouse/stock_ticks_cow
2025-06-04 15:17:23,480 INFO [main] hive.HiveSyncTool (HiveSyncTool.java:syncHoodieTable(233)) - Trying to sync hoodie table stock_ticks_cow with base path /user/hive/warehouse/stock_ticks_cow of type COPY_ON_WRITE
2025-06-04 15:17:23,922 INFO [main] hive.metastore (HiveMetaStoreClient.java:close(564)) - Closed a connection to metastore, current connections: 0
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/parquet/format/TypeDefinedOrder
	at org.apache.parquet.format.converter.ParquetMetadataConverter.<clinit>(ParquetMetadataConverter.java:85)
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:433)
	at org.apache.hudi.common.util.ParquetUtils.readMetadata(ParquetUtils.java:112)
	at org.apache.hudi.common.util.ParquetUtils.readSchema(ParquetUtils.java:239)
	at org.apache.hudi.common.util.ParquetUtils.readAvroSchema(ParquetUtils.java:261)
	at org.apache.hudi.common.table.TableSchemaResolver.fetchSchemaFromFiles(TableSchemaResolver.java:477)
	at org.apache.hudi.common.table.TableSchemaResolver.getTableParquetSchemaFromDataFile(TableSchemaResolver.java:262)
	at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaFromDataFileInternal(TableSchemaResolver.java:115)
	at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaFromDataFile(TableSchemaResolver.java:111)
	at org.apache.hudi.common.table.TableSchemaResolver.hasOperationField(TableSchemaResolver.java:408)
	at org.apache.hudi.util.Lazy.get(Lazy.java:54)
	at org.apache.hudi.common.table.TableSchemaResolver.getTableSchemaFromLatestCommitMetadata(TableSchemaResolver.java:215)
	at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaInternal(TableSchemaResolver.java:183)
	at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchema(TableSchemaResolver.java:136)
	at org.apache.hudi.common.table.ParquetTableSchemaResolver.getTableParquetSchema(ParquetTableSchemaResolver.java:63)
	at org.apache.hudi.sync.common.HoodieSyncClient.getStorageSchema(HoodieSyncClient.java:110)
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:241)
	at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:189)
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:177)
	at org.apache.hudi.hive.HiveSyncTool.main(HiveSyncTool.java:547)
Caused by: java.lang.ClassNotFoundException: org.apache.parquet.format.TypeDefinedOrder
	at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	... 20 more

I added the following dependency in the file indicated by uptycs-Sushrut, but it does not work either

<dependency>
  <groupId>org.apache.parquet</groupId>
  <artifactId>parquet-format</artifactId>
  <version>2.11.0</version>
</dependency>

Any idea how to make the demo work?

SOULANEIX avatar Jun 04 '25 15:06 SOULANEIX

@SOULANEIX can you try with version 2.4.0? It's listed here.

uptycs-Sushrut avatar Jun 05 '25 16:06 uptycs-Sushrut

I downloaded the Hudi source from https://github.com/apache/hudi/archive/refs/tags/release-1.0.2.zip

and then followed the docs for version 1.0.2: https://hudi.apache.org/docs/docker_demo

I also get the error below.

Does anyone know how to fix this? It's important for beginners who are learning Hudi for the first time.

2025-07-10 13:25:17,298 INFO [main] hive.metastore (HiveMetaStoreClient.java:close(564)) - Closed a connection to metastore, current connections: 0
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/parquet/format/TypeDefinedOrder
	at org.apache.parquet.format.converter.ParquetMetadataConverter.<clinit>(ParquetMetadataConverter.java:85)
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:433)
	at org.apache.hudi.common.util.ParquetUtils.readMetadata(ParquetUtils.java:112)
	at org.apache.hudi.common.util.ParquetUtils.readSchema(ParquetUtils.java:239)
	at org.apache.hudi.common.util.ParquetUtils.readAvroSchema(ParquetUtils.java:261)
	at org.apache.hudi.common.table.TableSchemaResolver.fetchSchemaFromFiles(TableSchemaResolver.java:477)
	at org.apache.hudi.common.table.TableSchemaResolver.getTableParquetSchemaFromDataFile(TableSchemaResolver.java:262)
	at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaFromDataFileInternal(TableSchemaResolver.java:115)
	at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaFromDataFile(TableSchemaResolver.java:111)
	at org.apache.hudi.common.table.TableSchemaResolver.hasOperationField(TableSchemaResolver.java:408)
	at org.apache.hudi.util.Lazy.get(Lazy.java:54)
	at org.apache.hudi.common.table.TableSchemaResolver.getTableSchemaFromLatestCommitMetadata(TableSchemaResolver.java:215)
	at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaInternal(TableSchemaResolver.java:183)
	at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchema(TableSchemaResolver.java:136)
	at org.apache.hudi.common.table.ParquetTableSchemaResolver.getTableParquetSchema(ParquetTableSchemaResolver.java:63)
	at org.apache.hudi.sync.common.HoodieSyncClient.getStorageSchema(HoodieSyncClient.java:110)
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:241)
	at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:189)
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:177)
	at org.apache.hudi.hive.HiveSyncTool.main(HiveSyncTool.java:547)
Caused by: java.lang.ClassNotFoundException: org.apache.parquet.format.TypeDefinedOrder
	at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	... 20 more

keashem avatar Jul 10 '25 13:07 keashem

Hi @Souldiv @keashem

Could you please let me know which operating system you're using? I recently resolved some issues for macOS with the latest Hudi version.

https://github.com/apache/hudi/blob/master/docker/compose/docker-compose_hadoop284_hive233_spark353_arm64.yml

rangareddy avatar Aug 26 '25 12:08 rangareddy

I will fix it!

gggyd123 avatar Sep 05 '25 04:09 gggyd123

Fix is available in master branch.

rangareddy avatar Oct 27 '25 13:10 rangareddy

"Fix is available in master branch."

PR #13843 is still open and has not been merged.

gggyd123 avatar Oct 28 '25 08:10 gggyd123