
Hudi Read Performance: Partition pruning not happening when reading Hudi table

Open · tarunguptanit opened this issue 3 years ago

I have a Hudi table that was created using Hudi 0.5.3. The table is partitioned by year/month/date.

We recently upgraded the Hudi library to use Hudi 0.9.0. We started noticing performance issues while reading. Seems like partition pruning is not happening when reading through Hudi 0.9.0.

The below operation through Hudi 0.5.3 takes ~1 second.

scala> val hudiDirectory = "s3a://podsofaupgradetesting-v2/HZ_CUST_ACCOUNTS/2022/05/04/"
hudiDirectory: String = s3a://podsofaupgradetesting-v2/HZ_CUST_ACCOUNTS/2022/05/04/
scala> val hoodieIncrementalView = spark.read.format("org.apache.hudi").load(hudiDirectory)
22/07/22 00:55:54 INFO DefaultSource: Constructing hoodie (as parquet) data source with options :Map(hoodie.datasource.view.type -> read_optimized, path -> s3a://podsofaupgradetesting-v2/HZ_CUST_ACCOUNTS/2022/05/04/)
22/07/22 00:55:55 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from s3a://podsofaupgradetesting-v2/HZ_CUST_ACCOUNTS
22/07/22 00:55:55 INFO FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://ip-10-0-1-118.ec2.internal:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/etc/spark/conf.dist/hive-site.xml], FileSystem: [S3AFileSystem{uri=s3a://podsofaupgradetesting-v2, workingDir=s3a://podsofaupgradetesting-v2/user/hadoop, inputPolicy=normal, partSize=104857600, enableMultiObjectsDelete=true, maxKeys=5000, readAhead=65536, blockSize=33554432, multiPartThreshold=2147483647, boundedExecutor=BlockingThreadPoolExecutorService{SemaphoredDelegatingExecutor{permitCount=25, available=25, waiting=0}, activeCount=0}, unboundedExecutor=java.util.concurrent.ThreadPoolExecutor@68032a80[Running, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0], statistics {93 bytes read, 0 bytes written, 7 read ops, 0 large read ops, 0 write ops}, metrics {{Context=S3AFileSystem} {FileSystemId=d9111a33-e2a3-4cd5-b278-fbff8ed2341a-podsofaupgradetesting-v2} {fsURI=s3a://podsofaupgradetesting-v2/HZ_CUST_ACCOUNTS/2022/05/04} {files_created=0} {files_copied=0} {files_copied_bytes=0} {files_deleted=0} {fake_directories_deleted=0} {directories_created=0} {directories_deleted=0} {ignored_errors=0} {op_copy_from_local_file=0} {op_exists=3} {op_get_file_status=6} {op_glob_status=0} {op_is_directory=1} {op_is_file=0} {op_list_files=0} {op_list_located_status=0} {op_list_status=1} {op_mkdirs=0} {op_rename=0} {object_copy_requests=0} {object_delete_requests=0} {object_list_requests=5} {object_continue_list_requests=0} {object_metadata_requests=10} {object_multipart_aborted=0} {object_put_bytes=0} {object_put_requests=0} {object_put_requests_completed=0} {stream_write_failures=0} {stream_write_block_uploads=0} {stream_write_block_uploads_committed=0} {stream_write_block_uploads_aborted=0} {stream_write_total_time=0} {stream_write_total_data=0} {object_put_requests_active=0} {object_put_bytes_pending=0} {stream_write_block_uploads_active=0} {stream_write_block_uploads_pending=0} {stream_write_block_uploads_data_pending=0} {stream_read_fully_operations=0} {stream_opened=1} {stream_bytes_skipped_on_seek=0} {stream_closed=1} {stream_bytes_backwards_on_seek=0} {stream_bytes_read=93} {stream_read_operations_incomplete=1} {stream_bytes_discarded_in_abort=0} {stream_close_operations=1} {stream_read_operations=1} {stream_aborted=0} {stream_forward_seek_operations=0} {stream_backward_seek_operations=0} {stream_seek_operations=0} {stream_bytes_read_in_close=0} {stream_read_exceptions=0} }}]
22/07/22 00:55:55 INFO HoodieTableConfig: Loading dataset properties from s3a://podsofaupgradetesting-v2/HZ_CUST_ACCOUNTS/.hoodie/hoodie.properties
22/07/22 00:55:55 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE from s3a://podsofaupgradetesting-v2/HZ_CUST_ACCOUNTS
22/07/22 00:55:56 INFO HoodieActiveTimeline: Loaded instants java.util.stream.ReferencePipeline$Head@5af850f1
22/07/22 00:55:56 INFO HoodieTableFileSystemView: Adding file-groups for partition :2022/05/04, #FileGroups=3
22/07/22 00:55:56 INFO AbstractTableFileSystemView: addFilesToView: NumFiles=4, FileGroupsCreationTime=15, StoreTimeTaken=0
22/07/22 00:55:56 INFO HoodieROTablePathFilter: Based on hoodie metadata from base path: s3a://podsofaupgradetesting-v2/HZ_CUST_ACCOUNTS, caching 3 files under s3a://podsofaupgradetesting-v2/HZ_CUST_ACCOUNTS/2022/05/04
hoodieIncrementalView: org.apache.spark.sql.DataFrame = [_hoodie_commit_time: string, _hoodie_commit_seqno: string ... 143 more fields]

The same set of commands through Hudi 0.9.0 takes ~15 seconds. Note that Spark spins up 1500 tasks, which is why I think partition pruning is not happening.


scala> val hudiDirectory = "s3a://podsofaupgradetesting-v2/HZ_CUST_ACCOUNTS/2022/05/04/"
hudiDirectory: String = s3a://podsofaupgradetesting-v2/HZ_CUST_ACCOUNTS/2022/05/04/

scala> val hoodieIncrementalView = spark.read.format("org.apache.hudi").load(hudiDirectory)
[Stage 27:>                                                         (0 + 0) / 1]


scala> val hoodieIncrementalView = spark.read.format("org.apache.hudi").load(hudiDirectory)
[Stage 29:=====================================================>(221 + 3) / 224]


scala> val hoodieIncrementalView = spark.read.format("org.apache.hudi").load(hudiDirectory)
[Stage 30:=======================================>          (1191 + 176) / 1500]

scala> val hoodieIncrementalView = spark.read.format("org.apache.hudi").load(hudiDirectory)
hoodieIncrementalView: org.apache.spark.sql.DataFrame = [_hoodie_commit_time: string, _hoodie_commit_seqno: string ... 143 more fields]

Initially, I was getting a WARN message: WARN HoodieFileIndex: No partition columns available from hoodie.properties. Partition pruning will not work. I added the configuration hoodie.table.partition.fields=partition_date to the table's hoodie.properties, and the WARN disappeared after that, but it still takes ~15 seconds to read a single partition.
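
As a side note, one quick way to check whether the partition predicate is actually being pushed down is to inspect the query plan. This is just a rough sketch; the base path and the partition_date column are the ones from this table:

import org.apache.spark.sql.functions.col

// Load from the table base path and filter on the partition column, then print
// the plan and look for the predicate among the partition/pushed filters.
val pruningCheck = spark.read.format("org.apache.hudi")
  .load("s3a://podsofaupgradetesting-v2/HZ_CUST_ACCOUNTS")
  .filter(col("partition_date") === "2022/05/04")
pruningCheck.explain(true)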

I tried to upgrade the Hudi table through the Hudi CLI using upgrade table --toVersion TWO, however that runs into problems similar to https://github.com/apache/hudi/issues/3894.

I want to understand whether the read performance can be restored by adding some configuration. I believe that irrespective of the table version (in this case version ONE, as created through Hudi 0.5.3), there shouldn't be performance issues while reading the table through Hudi 0.9.0.

Content of the table's hoodie.properties:

hoodie.table.name=HZ_CUST_ACCOUNTS
hoodie.archivelog.folder=archived
hoodie.table.type=COPY_ON_WRITE
hoodie.table.partition.fields=partition_date
hoodie.table.precombine.field=integ_key
hoodie.table.recordkey.fields=cust_account_id
hoodie.table.base.file.format=PARQUET

Environment Description

Hudi version : 0.5.3, 0.9.0

Spark version : 2.4.8

Hive version : 2.3.9

Hadoop version : Amazon 2.10.1

Storage (HDFS/S3/GCS..) : s3

Using AWS EMR Cluster 5.34

Please let me know if you need any additional information to troubleshoot this.

tarunguptanit · Jul 22 '22 01:07

I would suggest upgrading to version 0.10.1, or, if you can wait for a couple of weeks, 0.12.0 will be out; it's already in the RC voting phase. You're right about read performance though: there shouldn't be performance issues while reading the table through Hudi 0.10.1.

codope · Aug 02 '22 05:08

@tarunguptanit: did you get a chance to try out 0.12? Partition pruning should work as expected.

nsivabalan · Sep 12 '22 22:09

@tarunguptanit would it be possible for you to try out Hudi 0.12?

To explain a little bit what you might be observing:

  • First of all, in your case you don't actually rely on partition pruning; instead you're directly reading one of the partitions by providing the sub-path within the table. While the effect is somewhat similar to partition pruning, this is a distinctly different mechanism in terms of implementation (inside both Spark and Hudi); see the sketch after this list.
  • The reason why you see 1500 tasks being spun up is that, even though you're reading one particular partition, Hudi currently does a file listing of the whole table (file listing means we only list the files in the table, we don't read the whole table). This is a known issue and there's an effort underway to revisit it.
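
Roughly, the two access patterns look like this. This is only a sketch; the paths and the partition_date column name are taken from the snippets above, and the filter-based pattern assumes the partition columns are resolvable from hoodie.properties:

// Pattern 1: load a partition directly by its sub-path. Spark treats this path
// as the table root, so Hudi's partition-pruning logic never comes into play.
val byPath = spark.read.format("org.apache.hudi")
  .load("s3a://podsofaupgradetesting-v2/HZ_CUST_ACCOUNTS/2022/05/04/")

// Pattern 2: load the table from its base path and filter on the partition
// column, letting Hudi prune partitions from the file listing.
import org.apache.spark.sql.functions.col
val byFilter = spark.read.format("org.apache.hudi")
  .load("s3a://podsofaupgradetesting-v2/HZ_CUST_ACCOUNTS")
  .filter(col("partition_date") === "2022/05/04")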

alexeykudinkin · Sep 13 '22 00:09

Yes, I was able to fix this issue by upgrading to 0.10. It seems this way of reading a specific partition by providing the actual path is not supported with the newer versions of Hudi:

scala> val hudiDirectory = "s3a://podsofaupgradetesting-v2/HZ_CUST_ACCOUNTS/2022/05/04/"

I had to URL-encode the partition path for my table by using the parameter hoodie.datasource.write.partitionpath.urlencode and then use Spark's filter function to do partition pruning.

Something like this:

val hoodieIncrementalView = spark.read.format("hudi").load(hudiDirectory).filter(col("partition_date") === "2022/05")

This fixed my issue. I revisited the documentation but didn't see this change in behaviour noted. Not sure if I missed something, but it would be good to call this out.
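
For completeness, the URL-encode option is a write-side setting. A rough sketch of how it might be applied when writing the table follows; the table name, paths and field names are simply the ones from this issue, and df stands for whatever DataFrame is being written:

// Write (or re-write) the table with URL-encoded partition paths, so that a
// slash-containing partition value like "2022/05/04" round-trips cleanly and
// can later be pruned with a filter on the partition column at read time.
df.write.format("hudi")
  .option("hoodie.table.name", "HZ_CUST_ACCOUNTS")
  .option("hoodie.datasource.write.recordkey.field", "cust_account_id")
  .option("hoodie.datasource.write.precombine.field", "integ_key")
  .option("hoodie.datasource.write.partitionpath.field", "partition_date")
  .option("hoodie.datasource.write.partitionpath.urlencode", "true")
  .mode("append")
  .save("s3a://podsofaupgradetesting-v2/HZ_CUST_ACCOUNTS")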

tarunguptanit · Sep 13 '22 00:09