Wing Yew Poon issues

Results: 9 issues of Wing Yew Poon

Core, Spark: Fallback when snapshot does not have schema id

In `SnapshotUtil`, when a snapshot does not have a schema id (written before schema id was added to snapshots), we fall back to reading each of the previous metadata files...

spark

core

Spark: Add custom metric for number of deletes applied by a SparkScan

This is an extension of #4395. Here we add a custom metric for the number of delete rows that have been applied in a scan of a format v2 table....

spark

core

data

[LIVY-771][THRIFT] Do not remove trailing zeros from decimal values.

## What changes were proposed in this pull request? The decimal SQL type is mapped to `java.math.BigDecimal`. Livy removes trailing zeros from the `BigDecimal` before storing its string representation in...

Spark: Add read conf for setting threshold to use streaming delete filter

This enables us to set the threshold to a low number (2), to exercise the streaming filter code path when counting the number of positional deletes applied. This is a follow...

spark

data

API: Fix estimated row count in ContentScanTask

#5720 added `estimatedRowsCount` to `ScanTask` and provided an implementation in `ContentScanTask`. For a `ScanTask` that scans the entire data file, we can do better for this estimate, and return the...

API

spark

Reading snapshot of table uses current schema

I am new to Iceberg. When I do `val df = spark.read().format("iceberg").option("snapshot-id", snapshotId).load(path)` where `spark` is a `SparkSession`, `df` has the current schema of the table, as can...

stale

Support changelog scan for table with delete files

Currently changelog scan is only supported for a table with no delete files. We implement support for the case when delete files are present in the snapshots to be scanned.

spark

core

data

build

Arrow: Fix indexing in Parquet dictionary encoded values readers

This fixes https://github.com/apache/iceberg/issues/11221. There is a bug in `VectorizedDictionaryEncodedParquetValuesReader.BaseDictEncodedReader::nextBatch` where `nextVal` of the `BaseDictEncodedReader` subclass is called with the incorrect index for certain subclasses (in particular, for `FixedSizeBinaryDictEncodedReader`), leading to...

spark

arrow

Build and test hive-metastore with Hive 2, 3 and 4 with a single source set

We define new modules, hive3-metastore and hive4-metastore, that depend on Hive 3.1.3 and Hive 4.0.1 respectively for their Hive dependencies. The existing hive-metastore module continues to depend on Hive 2.3.10....

build

hive