Sagar Sumit
I think there are no eligible file groups for clustering. Eligibility is a function of file size more than the number of records. What's the avg file size for the data...
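For context, average file size is easy to check with a quick script. This is a local-filesystem sketch only (the `avg_file_size_bytes` helper and the `.parquet` suffix are illustrative, not part of Hudi); for data on HDFS or S3 you would use the corresponding filesystem listing (`hdfs dfs -du`, `aws s3 ls`, etc.):

```python
import os
import tempfile

def avg_file_size_bytes(directory, suffix=".parquet"):
    """Average size in bytes of files under `directory` whose names end in `suffix`."""
    sizes = [
        os.path.getsize(os.path.join(root, name))
        for root, _, names in os.walk(directory)
        for name in names
        if name.endswith(suffix)
    ]
    return sum(sizes) / len(sizes) if sizes else 0

# Demo with throwaway files standing in for real data files.
with tempfile.TemporaryDirectory() as d:
    for i, n in enumerate([1024, 2048, 4096]):
        with open(os.path.join(d, f"part-{i}.parquet"), "wb") as f:
            f.write(b"\0" * n)
    print(round(avg_file_size_bytes(d)))  # prints 2389
```

If the average is already near the configured target file size, clustering has nothing useful to rewrite, which would explain seeing no eligible file groups.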
@parisni Any update on this issue? If it's still happening, can you please start from scratch syncing to a different database, and provide the sync tool command that you ran?
@p-powell You can check by writing parquet directly. ``` df_id.write.format("parquet").mode(SaveMode.Overwrite).save(parquetBasePath) ``` I tried this and the time taken was comparable. For Hudi, it was 605s and for parquet it was...
Ok, I tried with the latest Hudi master. Can you try a build from the latest master? I'll also try Hudi 0.9; I think the latest EMR release already has it.
@p-powell Just an update: there were a couple of fixes in the write path to improve performance. Can you please try out the latest master or the latest release (0.11.1)?
The reason for creating copies was to avoid the ConcurrentModificationException encountered during long-running deltastreamer jobs. To avoid concurrent modification, the table meta client and the file system each get an exclusive copy of the configuration....
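To illustrate why exclusive copies help, here is a language-agnostic sketch (Python, not Hudi code; the config keys are made up). Python raises RuntimeError when a dict is mutated mid-iteration, which plays the role of Java's ConcurrentModificationException:

```python
import copy

# Hypothetical config keys, for illustration only.
shared_config = {"hoodie.table.name": "t1", "hoodie.base.path": "/tmp/t1"}

def read_all(cfg, writer_target):
    """Iterate `cfg` while a simulated concurrent writer mutates `writer_target`."""
    seen = {}
    for key in cfg:
        seen[key] = cfg[key]
        writer_target["hoodie.extra.key"] = "x"  # concurrent write lands here
    return seen

# Reader and writer share one config object: the mutation mid-iteration
# blows up (Python's analogue of ConcurrentModificationException).
try:
    read_all(shared_config, shared_config)
    raised = False
except RuntimeError:
    raised = True
print(raised)  # prints True

# Give the reader an exclusive copy: the writer can keep mutating the
# shared config without disturbing the reader's iteration.
snapshot = copy.deepcopy(shared_config)
result = read_all(snapshot, shared_config)
print(result == snapshot)  # prints True
```

The trade-off is exactly the one discussed here: the copies buy isolation for long-running jobs at the cost of extra allocation, which is why removing them is not trivial.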
I think avoiding the copies may not be that trivial. I understand this might slow down queries via the presto-hudi connector, but I still want to understand the breaking change in detail....
Synced up with @pratyakshsharma regarding this issue. First of all, the issue affects queries on Hudi tables via the presto-hive connector. We need to see if we can use the config provided...
@umehrot2 any updates on HUDI-3056 ?
> yes, makes sense. I will see where we can document this. Can we add it here https://hudi.apache.org/releases/release-0.10.0#writer-side-improvements ?