[EPIC] feat: support metadata tables

Open · xxchan opened this issue 1 year ago · 13 comments

In Iceberg's Spark and Flink integrations, there are metadata tables that provide information about a table: https://iceberg.apache.org/docs/latest/spark-queries/#inspecting-tables

Supporting this in iceberg-rust allows other engines (like RisingWave) to support these "standard" metadata tables.

Reference implementation (PyIceberg):
  • https://py.iceberg.apache.org/api/#inspecting-tables
  • https://github.com/apache/iceberg-python/blob/main/pyiceberg/table/inspect.py#L58-L90

List of all metadata tables: https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/MetadataTableType.java#L23

  • ENTRIES #863
  • FILES
  • DATA_FILES
  • DELETE_FILES
  • HISTORY #841
  • METADATA_LOG_ENTRIES #846
  • SNAPSHOTS #822
  • REFS
  • MANIFESTS #861
  • PARTITIONS
  • ALL_DATA_FILES
  • ALL_DELETE_FILES
  • ALL_FILES
  • ALL_MANIFESTS
  • ALL_ENTRIES
  • POSITION_DELETES
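To make the list concrete, here is a minimal Rust sketch of the sixteen names as an enum, with a case-insensitive lookup loosely modeled on the Java `MetadataTableType` enum linked above. The type and its methods (`name`, `from_name`, `spans_all_snapshots`) are hypothetical and are not iceberg-rust's API. `spans_all_snapshots` encodes the convention that the `ALL_*` tables cover every snapshot rather than only the current one.

```rust
/// Hypothetical enum of the metadata table names listed in this issue.
/// A sketch only; not iceberg-rust's actual API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MetadataTableType {
    Entries,
    Files,
    DataFiles,
    DeleteFiles,
    History,
    MetadataLogEntries,
    Snapshots,
    Refs,
    Manifests,
    Partitions,
    AllDataFiles,
    AllDeleteFiles,
    AllFiles,
    AllManifests,
    AllEntries,
    PositionDeletes,
}

impl MetadataTableType {
    /// Every variant, in the same order as the list in the issue.
    pub const ALL: [MetadataTableType; 16] = [
        Self::Entries,
        Self::Files,
        Self::DataFiles,
        Self::DeleteFiles,
        Self::History,
        Self::MetadataLogEntries,
        Self::Snapshots,
        Self::Refs,
        Self::Manifests,
        Self::Partitions,
        Self::AllDataFiles,
        Self::AllDeleteFiles,
        Self::AllFiles,
        Self::AllManifests,
        Self::AllEntries,
        Self::PositionDeletes,
    ];

    /// Lowercase name as it would appear in SQL, e.g. `all_data_files`.
    pub fn name(self) -> &'static str {
        match self {
            Self::Entries => "entries",
            Self::Files => "files",
            Self::DataFiles => "data_files",
            Self::DeleteFiles => "delete_files",
            Self::History => "history",
            Self::MetadataLogEntries => "metadata_log_entries",
            Self::Snapshots => "snapshots",
            Self::Refs => "refs",
            Self::Manifests => "manifests",
            Self::Partitions => "partitions",
            Self::AllDataFiles => "all_data_files",
            Self::AllDeleteFiles => "all_delete_files",
            Self::AllFiles => "all_files",
            Self::AllManifests => "all_manifests",
            Self::AllEntries => "all_entries",
            Self::PositionDeletes => "position_deletes",
        }
    }

    /// Case-insensitive lookup by name; returns `None` for unknown names.
    pub fn from_name(name: &str) -> Option<Self> {
        Self::ALL
            .iter()
            .copied()
            .find(|t| t.name().eq_ignore_ascii_case(name))
    }

    /// True for the `ALL_*` tables, which cover every snapshot rather than
    /// only the current one.
    pub fn spans_all_snapshots(self) -> bool {
        self.name().starts_with("all_")
    }
}
```

For example, `MetadataTableType::from_name("ALL_DATA_FILES")` yields `Some(MetadataTableType::AllDataFiles)`, and unknown names yield `None` instead of an error.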

xxchan, Dec 18 '24

Thanks for working on this, @xxchan. Metadata tables are essential for metadata operations. For example, the snapshots table in https://github.com/apache/iceberg-rust/pull/822 will make it easy to perform operations like expiring snapshots.
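As an illustration of why a snapshots table helps with expiration, here is a sketch under stated assumptions: `SnapshotRow` is a hypothetical in-memory row whose fields follow the Spark docs' `snapshots` columns (`committed_at`, `snapshot_id`, `parent_id`, `operation`), and `expiry_candidates` is a hypothetical helper that picks snapshots committed before a cutoff while always keeping the current one. It is not how iceberg-rust or Spark implement expiration.

```rust
/// Hypothetical row of a snapshots metadata table; field names follow the
/// Spark docs' columns, not iceberg-rust's actual Arrow schema.
#[derive(Debug, Clone)]
pub struct SnapshotRow {
    pub committed_at_ms: i64,
    pub snapshot_id: i64,
    pub parent_id: Option<i64>,
    pub operation: String,
}

/// Hypothetical helper: IDs of snapshots committed strictly before
/// `cutoff_ms`, never including the table's current snapshot.
/// Ignores branch/tag references and retention properties on purpose.
pub fn expiry_candidates(
    rows: &[SnapshotRow],
    current_snapshot_id: i64,
    cutoff_ms: i64,
) -> Vec<i64> {
    rows.iter()
        .filter(|r| r.committed_at_ms < cutoff_ms && r.snapshot_id != current_snapshot_id)
        .map(|r| r.snapshot_id)
        .collect()
}
```

A real expiration must also keep snapshots referenced by branches and tags (the REFS table) and respect retention properties such as `history.expire.min-snapshots-to-keep`; the sketch leaves both out to stay short.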

Fokko, Dec 18 '24

I'd like to contribute to implementing some of these metadata tables, similar to the work @rshkv has done on top of @xxchan's branch.

I plan to start with the MANIFESTS table. Could you let me know if anyone else is currently working on it or plans to? I want to avoid any conflicts. Thanks.

flaneur2020, Dec 28 '24

~@flaneur2020, apologies, I was actually just working on manifests and I should be able to put that up in a few days. If you want to go ahead with entries or files there'll be some overlap and code to reuse but that's fine - happy to rebase.~

Ignore the above. I'm looking at entries now. Not working on manifests.

rshkv, Dec 28 '24

I'd like to attempt the FILES metadata table if not already being worked on. Thanks!

DeaconDesperado, Jan 22 '25

You're welcome to contribute!

liurenjie1024, Jan 24 '25

Hi, I'm going to have a look at the PARTITIONS table :)

felixscherz, Feb 01 '25

Hi, feel free to take subtasks. A kind reminder, though: we need to resolve https://github.com/apache/iceberg-rust/issues/868 before proceeding.

xxchan, Feb 02 '25

I can take up the Refs table if nobody is working on it!

geruh, May 19 '25

@geruh, https://github.com/apache/iceberg-rust/pull/863 is quite a big blocker here. If you would like to review it, that would be great!

jonathanc-n, May 19 '25

@xxchan @jonathanc-n @geruh

I also want to help out here. I've been thinking about how to make it easy to test the metadata tables end to end as we add support for more of them.

I have a local PoC that extends the DataFusion integration in this repository to include metadata tables, so the already supported ones can be queried via SQL:

SELECT * FROM catalog.namespace.<table>.snapshots; -- or .manifests, the other supported metadata table
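The query above addresses a metadata table by appending its name to the table identifier. One way a resolver could interpret such a name is sketched below; `split_metadata_suffix` is a hypothetical helper, and the sketch does not reflect how the PoC or DataFusion itself resolves table references.

```rust
/// Hypothetical helper: split an identifier such as
/// `catalog.namespace.table.snapshots` into the table identifier and a
/// metadata-table suffix, if the last segment names a supported one.
/// `known` lists the metadata tables the integration supports.
pub fn split_metadata_suffix<'a>(ident: &'a str, known: &[&str]) -> (&'a str, Option<&'a str>) {
    match ident.rsplit_once('.') {
        // Last segment matches a supported metadata table (case-insensitive).
        Some((table, last)) if known.iter().any(|k| k.eq_ignore_ascii_case(last)) => {
            (table, Some(last))
        }
        // Otherwise treat the whole identifier as a plain table name.
        _ => (ident, None),
    }
}
```

Checking only the last segment is ambiguous when a real table is itself named, say, `snapshots`; a resolver would likely try to load the full identifier as a table first and fall back to the metadata interpretation. Quoted identifiers containing dots are also not handled here.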

Do you agree that we should extend the DataFusion integration to support them? If so, I will clean up the code and send a PR. Thank you!

I also filed a feature request to track the DataFusion integration work: https://github.com/apache/iceberg-rust/issues/1365

UPDATE: I see there is already an open PR, so I have closed the new issue :)

jagdeeps91, May 22 '25

#879 :p

xxchan, May 22 '25

@DeaconDesperado If you're not working on FILES/ALL_FILES, I'd like to take over. Is that okay?

335g, Oct 19 '25

@335g Yes, please feel free to take it up! My apologies, I didn't have the bandwidth to check back in after the latest updates.

DeaconDesperado, Oct 19 '25