iceberg-rust feat(core): Add support for `

Which issue does this PR close?

Closes #1766.

What changes are included in this PR?

Integrates virtual field handling for the _file metadata column into RecordBatchTransformer using a pre-computed constants map, eliminating post-processing and duplicate lookups.

Key Changes

New metadata_columns.rs module: Centralized utilities for metadata columns

Constants: RESERVED_FIELD_ID_FILE, RESERVED_COL_NAME_FILE
Helper functions: get_metadata_column_name(), get_metadata_field_id(), is_metadata_field(), is_metadata_column_name()

Enhanced RecordBatchTransformer:

Added constant_fields: HashMap<i32, (DataType, PrimitiveLiteral)> - pre-computed during initialization
New with_constant() method - computes Arrow type once during setup
Updated to use pre-computed types and values (avoids duplicate lookups)
Handles DataType::RunEndEncoded for constant strings (memory efficient)

Simplified reader.rs:

Pass full project_field_ids (including virtual) to RecordBatchTransformer
Single with_constant() call to register _file column
Removed post-processing loop

Updated scan/mod.rs:

Use is_metadata_column_name() and get_metadata_field_id() instead of hardcoded checks

Are these changes tested?

Yes, comprehensive tests have been added to verify the functionality:

New Tests (7 tests added)

Table Scan API Tests (7 tests)

test_select_with_file_column - Verifies basic functionality of selecting _file with regular columns
test_select_file_column_position - Verifies column ordering is preserved
test_select_file_column_only - Tests selecting only the _file column
test_file_column_with_multiple_files - Tests multiple data files scenario
test_file_column_at_start - Tests _file at position 0
test_file_column_at_end - Tests _file at the last position
test_select_with_repeated_column_names - Tests repeated column selection

Nov 04 '25 09:11 gbrgr

Thanks @gbrgr for this pr. But I think we need to rethink how to compute the _file, _pos metadata column. While it's somehow trivial to compute _file, it's non trivial to compute _pos efficient, since when we read parquet files, we have filtered out some row groups. I think the best way is to push reading these two columns to arrow-rs.

@liurenjie1024 I agree for _pos, and we have a PR there: https://github.com/apache/arrow-rs/pull/8715 But _file seems like something that we don't need the arrow-rs to know about. Similarly, in future, for _row_id from V3 spec, we cannot expect arrow-rs to be responsible for computing that one.

How do we go forward with rethinking this, what would be the action items for us?

Nov 06 '25 10:11 vustef

Thanks @gbrgr for this pr. But I think we need to rethink how to compute the _file, _pos metadata column. While it's somehow trivial to compute _file, it's non trivial to compute _pos efficient, since when we read parquet files, we have filtered out some row groups. I think the best way is to push reading these two columns to arrow-rs.

@liurenjie1024 I agree for _pos, and we have a PR there: apache/arrow-rs#8715 But _file seems like something that we don't need the arrow-rs to know about. Similarly, in future, for _row_id from V3 spec, we cannot expect arrow-rs to be responsible for computing that one.

How do we go forward with rethinking this, what would be the action items for us?

Hi, @vustef I also agree that we should put _file in iceberg-rust, and I left some comments about how to proceed.

Nov 10 '25 01:11 liurenjie1024

@liurenjie1024 I now resolved the merge conflicts that stem from PR #1821:

I removed partition information stored in the RecordBatchTransformer, and instead generically store constant fields in it which can be used to add column sources.
I changed all added columns to be REE encoded, as they are all constant values.

Nov 14 '25 08:11 gbrgr

@vustef I changed the PR in the sense that default values are not REE encoded anymore, but only constant fields that come from added metadata fields + partition data.

@liurenjie1024 let us know whether that is OK. If REE is desired in general for all constant columns, I guess it is better to make a follow-up PR to keep changesets smaller.

Nov 17 '25 12:11 gbrgr

@liurenjie1024 just a friendly ping on this for the new round of your feedback, if you have time.

Nov 24 '25 13:11 vustef

Hi, @gbrgr are you stilling working on this? This is a blocker of 0.8.0 release.

Dec 09 '25 03:12 liurenjie1024

@liurenjie1024 yes I will incorporate the feedback today

Dec 09 '25 05:12 gbrgr

feat(core): Add support for `_file` column

Which issue does this PR close?

What changes are included in this PR?

Key Changes

Are these changes tested?

New Tests (7 tests added)

Table Scan API Tests (7 tests)