
Reading snapshot of table uses current schema

wypoon opened this issue 5 years ago • 2 comments

I am new to Iceberg. When I run

val df = spark.read.format("iceberg").option("snapshot-id", snapshotId).load(path)

where spark is a SparkSession, df has the table's current schema rather than the schema the table had at the time of the snapshot. This shows up once an action forces df to be evaluated, for example:

df.show()
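One plausible reason the current schema can be applied to an old snapshot at all: Iceberg identifies columns by field ID rather than by name or position, so data files written under an older schema can still be read through a newer one. The toy model below is a sketch under that assumption only. `Field`, `Schema`, and `project` are hypothetical names, not Iceberg's API. A stored row maps field IDs to values; projecting it onto a read schema returns `None` (null) for columns added later and omits columns that were dropped.

```scala
// Hypothetical toy model, not Iceberg's API: each column has a stable field ID,
// and a stored row maps field IDs to values.
final case class Field(id: Int, name: String)
final case class Schema(fields: List[Field])

// Project a stored row onto a read schema by field ID.
// Columns the row never had (added later) come back as None, i.e. null;
// columns absent from the read schema (dropped) are simply not returned.
def project(row: Map[Int, Any], readSchema: Schema): List[(String, Option[Any])] =
  readSchema.fields.map(f => f.name -> row.get(f.id))

// Row written when the table had columns id (field 1) and data (field 2).
val oldRow: Map[Int, Any] = Map(1 -> 42L, 2 -> "a")

// Current schema after dropping `data` and adding `category` (new field ID 3).
val currentSchema = Schema(List(Field(1, "id"), Field(3, "category")))

println(project(oldRow, currentSchema)) // List((id,Some(42)), (category,None))
```

Read this way, the old row shows up with the current columns, which matches the df.show() output described above.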

Is this the expected behavior? In my case, I altered the table, either adding a column or removing one, and then tried to read a snapshot taken before the alteration. I expected to get the table as it existed at the time of the snapshot, with the columns it had then. Is there a conceptual or technical reason for the current behavior?

I have tried out some changes that make reading a snapshot from Spark behave the way I expect (using the schema at the time of the snapshot rather than the current schema), and I'd be happy to create a PR. Alternatively, the two behaviors could be governed by a flag or option.
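The choice raised here, between the current schema and the schema in effect when the snapshot was written, possibly behind a flag, can be sketched with a toy model of the table metadata. The sketch assumes the metadata records which schema each snapshot was written with. Later versions of the Iceberg table spec added an optional `schema-id` field to snapshots, alongside the table-level `schemas` list and `current-schema-id`. All names in the code (`TableMetadata`, `schemaForRead`, `useSnapshotSchema`) are hypothetical. They are neither Iceberg's API nor the change proposed in the linked PR.

```scala
// Hypothetical toy model of table metadata, loosely mirroring the spec fields
// `schemas`, `current-schema-id`, and a snapshot's optional `schema-id`.
final case class Schema(id: Int, columns: List[String])
final case class Snapshot(snapshotId: Long, schemaId: Option[Int])
final case class TableMetadata(
    schemas: Map[Int, Schema],
    currentSchemaId: Int,
    snapshots: Map[Long, Snapshot]
)

// Pick the schema used to read a snapshot. useSnapshotSchema = false always uses
// the current schema (the behavior reported in this issue); true uses the schema
// recorded for the snapshot, falling back to the current one if none was recorded.
def schemaForRead(meta: TableMetadata, snapshotId: Long, useSnapshotSchema: Boolean): Schema = {
  val current = meta.schemas(meta.currentSchemaId)
  if (!useSnapshotSchema) current
  else {
    val snapshot = meta.snapshots.getOrElse(
      snapshotId,
      throw new IllegalArgumentException(s"Unknown snapshot: $snapshotId"))
    snapshot.schemaId.flatMap(meta.schemas.get).getOrElse(current)
  }
}

// Snapshot 100 was written before `category` was added; snapshot 200 after.
val meta = TableMetadata(
  schemas = Map(
    0 -> Schema(0, List("id", "data")),
    1 -> Schema(1, List("id", "data", "category"))),
  currentSchemaId = 1,
  snapshots = Map(100L -> Snapshot(100L, Some(0)), 200L -> Snapshot(200L, Some(1)))
)

println(schemaForRead(meta, 100L, useSnapshotSchema = false).columns) // List(id, data, category)
println(schemaForRead(meta, 100L, useSnapshotSchema = true).columns)  // List(id, data)
```

The fallback to the current schema keeps older snapshots readable when their metadata predates any per-snapshot schema ID.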

wypoon, Sep 23 '20

I created a PR, https://github.com/apache/iceberg/pull/1508, but I didn't know how to link it to this issue.

wypoon, Sep 25 '20

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot], Feb 26 '24

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'.

github-actions[bot], Mar 12 '24