Spec inconsistency: partition_spec_id column in ManifestList vs. partition_specs in metadata.json

Open JFinis opened this issue 2 years ago • 1 comments

Apache Iceberg version

Latest

Query engine

None; It's a Spec issue.

Please describe the bug 🐞

The spec is inconsistent with respect to the partition_spec_id column. Here the spec notes:

required | required | 502 partition_spec_id | int | ID of a partition spec used to write the manifest; must be listed in table metadata partition-specs

Note that this column is required in both v1 and v2.

Now, let's have a look at the definition of "table metadata partition-specs":

optional | required | partition-specs | A list of partition specs, stored as full partition spec objects.

As we see, partition-specs is actually optional in v1. But given that partition_spec_id is required in v1 and its values must be listed in the table's partition-specs, there is no way that partition-specs can be optional. If it were missing, the Iceberg table would be ill-formed, as the values in partition_spec_id would have nothing to refer to. So something is off here.
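
To make the dangling reference concrete, here is a minimal sketch in Python. It is only an illustration: plain dicts stand in for a parsed metadata.json and for manifest list rows, and the helper name `dangling_spec_ids` is hypothetical, not part of any Iceberg library.

```python
from typing import Any


def dangling_spec_ids(
    metadata: dict[str, Any], manifest_list: list[dict[str, Any]]
) -> list[int]:
    """Return partition_spec_id values that are not listed in partition-specs."""
    # partition-specs is optional in v1, so it may be absent entirely.
    specs = metadata.get("partition-specs", [])
    known_ids = {spec["spec-id"] for spec in specs}
    used_ids = {row["partition_spec_id"] for row in manifest_list}
    return sorted(used_ids - known_ids)


# A v1 table metadata without partition-specs, which the spec wording allows.
v1_metadata = {"format-version": 1}
# Manifest list rows must still carry partition_spec_id (field 502).
rows = [{"manifest_path": "m0.avro", "partition_spec_id": 0}]

# Every referenced ID is dangling because there is nothing to look it up in.
print(dangling_spec_ids(v1_metadata, rows))
```

With partition-specs absent, every ID used in the manifest list comes back as dangling, which is exactly the situation the two requirements cannot both allow.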

If I see it correctly, there are two remediations:

  • Define partition-specs to be required in v1. Any v1 Iceberg table without this field would then be ill-formed.
  • Loosen the requirement that the values in partition_spec_id have to refer to specs in partition-specs. Define what the value is supposed to be (or define that arbitrary values are allowed) when partition-specs is absent.
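
As a rough sketch of how the two remediations would differ for a reader or validator, consider the following Python. The function `validate` and its `require_specs` flag are hypothetical names chosen for illustration, and plain dicts again stand in for the parsed metadata.json and the manifest list rows.

```python
from typing import Any


def validate(
    metadata: dict[str, Any],
    manifest_list: list[dict[str, Any]],
    require_specs: bool,
) -> list[str]:
    """Return a list of problems; an empty list means the table is accepted."""
    specs = metadata.get("partition-specs")
    if specs is None:
        if require_specs:
            # Remediation 1: partition-specs is required, even in v1.
            return ["partition-specs is missing"]
        # Remediation 2: without partition-specs, any ID is allowed.
        return []
    known_ids = {spec["spec-id"] for spec in specs}
    return [
        f"unknown partition_spec_id {row['partition_spec_id']}"
        for row in manifest_list
        if row["partition_spec_id"] not in known_ids
    ]


v1_metadata = {"format-version": 1}  # no partition-specs
rows = [{"partition_spec_id": 0}]

print(validate(v1_metadata, rows, require_specs=True))
print(validate(v1_metadata, rows, require_specs=False))
```

Under the first option the same v1 table is rejected outright; under the second it is accepted, but the spec would still need to say what the ID values mean in that case.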

JFinis, Feb 16 '24 18:02

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot], Oct 21 '24 00:10

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

github-actions[bot], Nov 05 '24 00:11