[EPIC] feat: support metadata tables
In Iceberg Spark/Flink there are metadata tables that provide information about the table: https://iceberg.apache.org/docs/latest/spark-queries/#inspecting-tables
Supporting this in iceberg-rust allows other engines (like RisingWave) to support these "standard" metadata tables.
Reference implementation:
- https://py.iceberg.apache.org/api/#inspecting-tables
- https://github.com/apache/iceberg-python/blob/main/pyiceberg/table/inspect.py#L58-L90
List of all metadata tables: https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/MetadataTableType.java#L23
- ENTRIES #863
- FILES
- DATA_FILES
- DELETE_FILES
- HISTORY #841
- METADATA_LOG_ENTRIES #846
- SNAPSHOTS #822
- REFS
- MANIFESTS #861
- PARTITIONS
- ALL_DATA_FILES
- ALL_DELETE_FILES
- ALL_FILES
- ALL_MANIFESTS
- ALL_ENTRIES
- POSITION_DELETES
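As a rough illustration of how an engine might resolve one of the names above (for example the trailing `snapshots` in a qualified table identifier), here is a minimal, standard-library-only sketch. The enum `MetadataTableType`, its `ALL` constant, and `as_str` are hypothetical names chosen to mirror the list; they are assumptions for illustration and not the iceberg-rust API.

```rust
use std::str::FromStr;

/// Hypothetical enum mirroring the metadata table names listed above.
/// Illustrative only; this is NOT the iceberg-rust API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MetadataTableType {
    Entries,
    Files,
    DataFiles,
    DeleteFiles,
    History,
    MetadataLogEntries,
    Snapshots,
    Refs,
    Manifests,
    Partitions,
    AllDataFiles,
    AllDeleteFiles,
    AllFiles,
    AllManifests,
    AllEntries,
    PositionDeletes,
}

impl MetadataTableType {
    /// Every variant, in the same order as the list in this issue.
    pub const ALL: [MetadataTableType; 16] = [
        Self::Entries,
        Self::Files,
        Self::DataFiles,
        Self::DeleteFiles,
        Self::History,
        Self::MetadataLogEntries,
        Self::Snapshots,
        Self::Refs,
        Self::Manifests,
        Self::Partitions,
        Self::AllDataFiles,
        Self::AllDeleteFiles,
        Self::AllFiles,
        Self::AllManifests,
        Self::AllEntries,
        Self::PositionDeletes,
    ];

    /// Upper-case name as written in the list above.
    pub fn as_str(&self) -> &'static str {
        match self {
            Self::Entries => "ENTRIES",
            Self::Files => "FILES",
            Self::DataFiles => "DATA_FILES",
            Self::DeleteFiles => "DELETE_FILES",
            Self::History => "HISTORY",
            Self::MetadataLogEntries => "METADATA_LOG_ENTRIES",
            Self::Snapshots => "SNAPSHOTS",
            Self::Refs => "REFS",
            Self::Manifests => "MANIFESTS",
            Self::Partitions => "PARTITIONS",
            Self::AllDataFiles => "ALL_DATA_FILES",
            Self::AllDeleteFiles => "ALL_DELETE_FILES",
            Self::AllFiles => "ALL_FILES",
            Self::AllManifests => "ALL_MANIFESTS",
            Self::AllEntries => "ALL_ENTRIES",
            Self::PositionDeletes => "POSITION_DELETES",
        }
    }
}

impl FromStr for MetadataTableType {
    type Err = String;

    /// Case-insensitive lookup, so both "snapshots" and "SNAPSHOTS" resolve.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let upper = s.to_ascii_uppercase();
        Self::ALL
            .iter()
            .copied()
            .find(|t| t.as_str() == upper)
            .ok_or_else(|| format!("unknown metadata table: {}", s))
    }
}
```

With this sketch, `"snapshots".parse::<MetadataTableType>()` yields `Ok(MetadataTableType::Snapshots)`, and an unrecognized name returns an error string instead of panicking.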
Thanks for working on this @xxchan. Metadata tables are very important for metadata operations. For example, the snapshots table in https://github.com/apache/iceberg-rust/pull/822 will make it easy to perform operations like expiring snapshots.
I'd like to contribute to the implementation of some of these metadata tables, similar to the work done by @rshkv based on @xxchan's branch.
I plan to start with the MANIFESTS table. Could you kindly let me know if anyone else is currently working on this or plans to? I want to avoid any potential conflicts. Thanks.
~@flaneur2020, apologies, I was actually just working on manifests and I should be able to put that up in a few days. If you want to go ahead with entries or files there'll be some overlap and code to reuse but that's fine - happy to rebase.~
Ignore the above. I'm looking at entries now. Not working on manifests.
I'd like to attempt the FILES metadata table if not already being worked on. Thanks!
Welcome to contribute!
Hi, I'm going to have a look at the PARTITIONS table:)
Hi, feel free to take subtasks. But a kind reminder that we need to resolve https://github.com/apache/iceberg-rust/issues/868 before proceeding.
I can take up the Refs table if nobody is working on it!
@geruh https://github.com/apache/iceberg-rust/pull/863 there is quite a big blocker here, if you would like to review, that would be great!
@xxchan @jonathanc-n @geruh
I also want to help out here. I was thinking about how to make it easy for us to test the metadata tables end to end as we add support for more of them.
I have a local PoC that extends the DataFusion integration in this package to include metadata tables, so it can be used to query the already supported metadata tables via SQL:
SELECT * FROM catalog.namespace.<Table>.snapshots; -- or manifests, as that is the other supported metadata table
Do you folks agree that we should extend the DataFusion integration to support them? If yes, I will clean up the code and send a PR for that. Thank you!
Also cut a feature request to track the DataFusion integration work: https://github.com/apache/iceberg-rust/issues/1365
UPDATE: I see there is an open PR already, so I have closed the new issue :)
#879 :p
@DeaconDesperado If you're not working on FILES/ALL_FILES, I'd like to take over. Is that okay?
@335g Yes please feel free to take it up! My apologies I wasn't able to get the bandwidth to check back in after the latest updates.