hudi

Upserts, Deletes And Incremental Processing on Big Data.

Results: 1006 hudi issues

### Change Logs

This PR cleans up a considerable amount of Spark's (internal) resolution logic that has been copied over into Hudi components, while in reality there's no actual need for...

priority:major
spark

## What is the purpose of the pull request

RFC-46: Spark-specific file reader/writer based on internal row.

## Brief change log

Add a Spark file reader for parquet/orc/HFile; add spark...

priority:critical
big-needle-movers

## What is the purpose of the pull request

This PR adds support for identifiers in the `catalog.database.table` format. For Spark 3, which supports `catalog`, we cannot just...

priority:critical
spark-sql

Hudi 0.11.1, Spark 3.2.1. I have several Hudi tables with > 35k partitions. When running hive sync for the first time (meaning adding 35k partitions from scratch into Glue), I randomly...

meta-sync
aws-support
priority:critical

Hudi 0.11.1. I am working on tables with a huge number of partitions (> 100k) that are almost append-only: no updates in the past, rare deletes. Previously I had some...

**Problem** In hudi-cli I’m trying to run `repair deduplicate` against a partition in which I have confirmed via a separate spark query that there are in fact duplicates on the...

aws-support
priority:critical
spark
cli

### Change Logs

When EmbeddedTimelineServerReuse is enabled and EmbeddedTimelineServer is disabled, TimelineBasedMarkers falls back to DirectWriteMarkers. Flink enables EmbeddedTimelineServerReuse and disables EmbeddedTimelineServer by default, so TimelineBasedMarkers cannot work.

### Impact

none...

**Describe the problem you faced** After the upgrade to Hudi 0.10, I faced the https://github.com/apache/hudi/issues/4283 issue in my environment, so my AWS Glue tables were working fine on AWS Athena,...

aws-support
priority:critical

Start of a feature that aims to replicate on GCS the reliable ingestion of data from AWS S3 buckets (https://hudi.apache.org/blog/2021/08/23/s3-events-source). Compare with the equivalent code in S3EventsSource: https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/S3EventsSource.java#L38-L44 Currently tested like...

### Change Logs

Before this change, `FlinkStreamerConfig.toFlinkConfig(cfg)` was called twice. The second call is misleading because it does the same thing both times.

### Impact

_Describe any public API or user-facing feature...