
Spec is ambiguous w.r.t. optional fields in field_summary

Open • JFinis opened this issue 2 years ago • 1 comment

Apache Iceberg version

1.4.3 (latest release)

Query engine

None, it's a spec issue.

Please describe the bug 🐞

I'm referring to the definition of field_summary, which is as follows:

| v1 | v2 | Field id, name | Type | Description |
|---|---|---|---|---|
| optional | optional | 510 lower_bound | bytes [1] | Lower bound for the non-null, non-NaN values in the partition field, or null if all values are null or NaN [2] |
| optional | optional | 511 upper_bound | bytes [1] | Upper bound for the non-null, non-NaN values in the partition field, or null if all values are null or NaN [2] |

The fields are optional, and the semantics of an optional field are that a writer may choose not to write it for any reason. For these fields, however, null has a specific meaning: there are no non-null, non-NaN values. With the current wording of the spec, a reader cannot rely on this meaning, because a writer could write null either because there are no non-null, non-NaN values or because it chose not to write the field for some other reason.
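To make the ambiguity concrete, here is a minimal sketch of two readers pruning on an equality predicate. It assumes a hypothetical, simplified `FieldSummary` with bounds already decoded to ints (the spec stores them as bytes); `may_contain_value` and `may_contain_value_conservative` are illustrative names, not Iceberg APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldSummary:
    # Hypothetical, simplified field_summary entry. Bounds are shown as
    # already-decoded ints; in the spec they are serialized as bytes.
    lower_bound: Optional[int]
    upper_bound: Optional[int]


def may_contain_value(summary: FieldSummary, value: int) -> bool:
    """Reader that trusts the spec's wording.

    A null bound is taken to mean every value is null or NaN, so no
    non-null value can match and the partition can be skipped.
    """
    if summary.lower_bound is None or summary.upper_bound is None:
        return False
    return summary.lower_bound <= value <= summary.upper_bound


def may_contain_value_conservative(summary: FieldSummary, value: int) -> bool:
    """Reader that treats a null bound as 'unknown' (field not written)."""
    if summary.lower_bound is None or summary.upper_bound is None:
        return True
    return summary.lower_bound <= value <= summary.upper_bound
```

If a writer omitted the bounds even though the partition contains `5`, `FieldSummary(None, None)` makes the trusting reader skip data it should read, while the conservative reader stays correct but can never prune a partition that really is all null or NaN. Neither reader can be both correct and efficient unless the spec pins down what null means.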

Mitigation: The spec should mark the optional classification of these fields with an asterisk and explain in a footnote that a writer may not simply omit them, because null has a specific meaning here. The fields thus sit somewhere between required and optional: they are effectively required, with the null value carrying a specific meaning.
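Under the proposed rule, a writer would compute the bounds and emit null only when no non-null, non-NaN value exists. A minimal sketch, assuming float values and a hypothetical helper `compute_bounds` (not part of any Iceberg library):

```python
import math
from typing import Iterable, Optional, Tuple


def compute_bounds(
    values: Iterable[Optional[float]],
) -> Tuple[Optional[float], Optional[float]]:
    """Compute (lower_bound, upper_bound) for a partition field.

    Under the proposed reading, returning None is allowed only when there is
    no non-null, non-NaN value; bounds may not be omitted for other reasons.
    """
    candidates = [v for v in values if v is not None and not math.isnan(v)]
    if not candidates:
        # Null carries the spec's meaning: all values are null or NaN.
        return None, None
    return min(candidates), max(candidates)
```

For example, `compute_bounds([3.0, None, 1.5, float("nan")])` yields `(1.5, 3.0)`, while `compute_bounds([None, float("nan")])` yields `(None, None)`, which a reader could then safely interpret as "no non-null, non-NaN values".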

JFinis • Feb 16 '24 20:02

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot] • Oct 21 '24 00:10

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'.

github-actions[bot] • Nov 05 '24 00:11