Wing Yew Poon

Results: 9 issues by Wing Yew Poon

In `SnapshotUtil`, when a snapshot does not have a schema id (written before schema id was added to snapshots), we fall back to reading each of the previous metadata files...

spark
core

This is an extension of #4395. Here we add a custom metric for the number of delete rows that have been applied in a scan of a format v2 table....

spark
core
data

## What changes were proposed in this pull request? The decimal SQL type is mapped to `java.math.BigDecimal`. Livy removes trailing zeros from the `BigDecimal` before storing its string representation in...
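As a rough illustration of why stripping trailing zeros can change what is stored, here is a minimal sketch using only `java.math.BigDecimal`. The class and method names are hypothetical, and this is not Livy's code; the summary above is cut off before it says where the string ends up, so the sketch shows only the `BigDecimal` behavior.

```java
import java.math.BigDecimal;

// Hypothetical demo class; not part of Livy.
final class TrailingZerosSketch {
    // Strip trailing zeros, then take the default string form.
    static String strippedToString(String literal) {
        return new BigDecimal(literal).stripTrailingZeros().toString();
    }

    // Same, but using the plain (non-scientific) string form.
    static String strippedToPlainString(String literal) {
        return new BigDecimal(literal).stripTrailingZeros().toPlainString();
    }

    // Scale (digits to the right of the decimal point) after stripping.
    static int strippedScale(String literal) {
        return new BigDecimal(literal).stripTrailingZeros().scale();
    }

    public static void main(String[] args) {
        System.out.println(strippedToString("1.500"));     // 1.5
        System.out.println(strippedScale("1.500"));        // 1 (the original scale was 3)
        System.out.println(strippedToString("100"));       // 1E+2
        System.out.println(strippedToPlainString("100"));  // 100
    }
}
```

Two effects are visible: the declared scale of the value is lost (`1.500` becomes `1.5`), and a whole number such as `100` is rendered in scientific notation by `toString()` once its zeros are stripped.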

This enables us to set the threshold to a low number (2) in order to exercise the streaming filter code path when counting the number of positional deletes applied. This is a follow...

spark
data

#5720 added `estimatedRowsCount` to `ScanTask` and provided an implementation in `ContentScanTask`. For a `ScanTask` that scans the entire data file, we can do better for this estimate, and return the...

API
spark

I am new to Iceberg. When I do `val df = spark.read().format("iceberg").option("snapshot-id", snapshotId).load(path)`, where `spark` is a `SparkSession`, `df` has the current schema of the table, as can...

stale

Currently, changelog scans are supported only for tables with no delete files. We implement support for the case where delete files are present in the snapshots to be scanned.

spark
core
data
build

This fixes https://github.com/apache/iceberg/issues/11221. There is a bug in `VectorizedDictionaryEncodedParquetValuesReader.BaseDictEncodedReader::nextBatch` where `nextVal` of the `BaseDictEncodedReader` subclass is called with the incorrect index for certain subclasses (in particular, for `FixedSizeBinaryDictEncodedReader`), leading to...

spark
arrow

We define new modules, hive3-metastore and hive4-metastore, that depend on Hive 3.1.3 and Hive 4.0.1 respectively for their Hive dependencies. The existing hive-metastore module continues to depend on Hive 2.3.10....

build
hive