Spark: read Iceberg table data error

Open beyond-up opened this issue 1 year ago • 6 comments

Apache Iceberg version

1.5.2

Query engine

Spark

Please describe the bug 🐞

When I used iceberg-spark-runtime-3.3_2.12-1.5.2.jar to query data from an Iceberg table, an error was reported. The error message indicated that there were null values, but the table contains no null values.

Willingness to contribute

  • [ ] I can contribute a fix for this bug independently
  • [ ] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • [ ] I cannot contribute a fix for this bug at this time

beyond-up avatar Oct 17 '24 03:10 beyond-up

@beyond-up can you share the full stack trace please? Usually there's some more info in other parts of the stack trace that show what went wrong

nastra avatar Oct 17 '24 06:10 nastra

> @beyond-up can you share the full stack trace please? Usually there's some more info in other parts of the stack trace that show what went wrong

I have found the cause of this problem: the data field in the table contains empty strings (''). However, I am surprised that '' in a String-type field can cause an NPE! @nastra

beyond-up avatar Oct 17 '24 08:10 beyond-up

@beyond-up so far the NPE seems to be coming from Spark itself, not from Iceberg. Do you have a small reproducible example?

nastra avatar Oct 17 '24 11:10 nastra

Which exact Spark version are you using? A similar issue was reported in https://issues.apache.org/jira/browse/SPARK-39061 and was already fixed in Spark 3.3.1

nastra avatar Oct 17 '24 11:10 nastra

> @beyond-up so far the NPE seems to be coming from Spark itself, not from Iceberg. Do you have a small reproducible example?

The problem can be reproduced when every value of a string-type field in the table is ''. My Spark version is 3.5, and I used iceberg-spark-runtime-3.5_2.13.jar. @nastra

beyond-up avatar Oct 21 '24 02:10 beyond-up

@beyond-up in that case you might want to use a more recent Spark version that includes a potential fix for this

nastra avatar Oct 21 '24 05:10 nastra

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot] avatar Apr 20 '25 00:04 github-actions[bot]

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

github-actions[bot] avatar May 04 '25 00:05 github-actions[bot]
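
For anyone picking this up later, the condition described in the thread (a string column in which every value is '') could be set up roughly as follows. This is only a sketch: the catalog and table name `local.db.empty_strings`, the columns `id` and `data`, and the helper names `repro_statements` and `run_repro` are assumptions, not details from the report, and it has not been confirmed that these statements trigger the NPE.

```python
def repro_statements(table="local.db.empty_strings", rows=3):
    """Build SQL for a table whose string column holds only ''.

    The table name and the (id, data) schema are hypothetical; the
    thread does not give the real schema.
    """
    # One row per id, each with an empty string (not NULL) in `data`.
    values = ", ".join(f"({i}, '')" for i in range(1, rows + 1))
    return [
        f"CREATE TABLE IF NOT EXISTS {table} (id INT, data STRING) USING iceberg",
        f"INSERT INTO {table} VALUES {values}",
        f"SELECT id, data FROM {table}",
    ]


def run_repro(spark, table="local.db.empty_strings"):
    """Run the statements on an existing SparkSession.

    Returns the rows of the final SELECT. Assumes `spark` is already
    configured with an Iceberg catalog matching the table's prefix.
    """
    result = None
    for statement in repro_statements(table):
        result = spark.sql(statement)
    return result.collect()
```

With a SparkSession configured for an Iceberg catalog named `local`, `run_repro(spark)` would execute the three statements in order. Running it against different Spark patch releases might show whether the error is version-specific, as suggested above.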