hudi
Upserts, Deletes And Incremental Processing on Big Data.
### Change Logs This PR cleans up a considerable amount of Spark's (internal) resolution logic that has been copied over into Hudi components, while in reality there's no actual need for...
## What is the purpose of the pull request RFC-46: Spark-specific file reader/writer based on internal row ## Brief change log Add Spark file reader for parquet/orc/HFile; add spark...
## What is the purpose of the pull request This PR adds support for identifiers in the `catalog.database.table` format. For Spark3 that supports `catalog`, we cannot just...
Hudi 0.11.1, Spark 3.2.1. I have several Hudi tables with > 35k partitions. When running hive sync for the first time (meaning adding 35k partitions from scratch into Glue), I randomly...
Hudi 0.11.1. I am working on tables with a huge number of partitions (> 100k) that are almost append-only: no updates in the past, rarely deletes. Previously I had some...
**Problem** In hudi-cli I’m trying to run `repair deduplicate` against a partition in which I have confirmed via a separate spark query that there are in fact duplicates on the...
### Change Logs When EmbeddedTimelineServerReuse is enabled and EmbeddedTimelineServer is disabled, TimelineBasedMarkers falls back to DirectWriteMarkers. Flink enables EmbeddedTimelineServerReuse and disables EmbeddedTimelineServer by default, so TimelineBasedMarkers cannot work. ### Impact none...
**Describe the problem you faced** After the upgrade to Hudi 0.10, I faced the https://github.com/apache/hudi/issues/4283 issue in my environment, so my AWS Glue tables were working fine on AWS Athena,...
Start of a feature that aims to replicate on GCS the reliable ingestion of data from AWS S3 buckets (https://hudi.apache.org/blog/2021/08/23/s3-events-source). Compare with equivalent code in S3EventsSource: https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/S3EventsSource.java#L38-L44 Currently tested like...
### Change Logs Before this change, `FlinkStreamerConfig.toFlinkConfig(cfg)` was called twice. This is misleading, since both calls do the same thing. ### Impact _Describe any public API or user-facing feature...