Spark Action to Analyze table
This change adds a Spark action to analyze tables. As part of the analysis, the action generates Apache DataSketches sketches for NDV (number of distinct values) statistics and writes them to Puffin files.
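To illustrate how a sketch can estimate NDV without keeping every distinct value, here is a minimal Java sketch of a k-minimum-values (KMV) estimator. It is not the Apache DataSketches API, and it is not the Theta sketch stored in Puffin's `apache-datasketches-theta-v1` blob. KMV is the simpler idea that Theta sketches generalize. The class name `KmvSketch` and the hash choice (FNV-1a followed by a SplitMix64 finalizer) are hypothetical and chosen only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.TreeSet;

// Hypothetical k-minimum-values (KMV) distinct-count sketch, for illustration only.
// This is NOT the Apache DataSketches Theta implementation used for Iceberg NDV stats.
final class KmvSketch {
    private static final double TWO_POW_53 = (double) (1L << 53);

    private final int k;
    // The k smallest distinct hash values seen so far, each in [0, 2^53).
    private final TreeSet<Long> minHashes = new TreeSet<>();

    KmvSketch(int k) {
        if (k < 2) {
            throw new IllegalArgumentException("k must be at least 2");
        }
        this.k = k;
    }

    void update(String value) {
        // Hash to 64 bits, then keep the top 53 bits so values are non-negative.
        long h = mix64(fnv1a64(value.getBytes(StandardCharsets.UTF_8))) >>> 11;
        if (minHashes.size() < k) {
            minHashes.add(h); // duplicates are ignored by the set
        } else if (h < minHashes.last() && minHashes.add(h)) {
            // A new, smaller hash entered, so evict the current k-th smallest.
            minHashes.pollLast();
        }
    }

    double estimate() {
        if (minHashes.size() < k) {
            // Fewer than k distinct hashes: the count is exact (barring hash collisions).
            return minHashes.size();
        }
        // Normalize the k-th smallest hash to (0, 1) and apply the (k - 1) / u estimator.
        double u = minHashes.last() / TWO_POW_53;
        return (k - 1) / u;
    }

    // 64-bit FNV-1a over the input bytes.
    private static long fnv1a64(byte[] data) {
        long h = 0xcbf29ce484222325L;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }

    // SplitMix64 finalizer to spread FNV output bits more uniformly.
    private static long mix64(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }
}
```

Duplicate values hash to the same number, so they never change the sketch. Two KMV sketches can be merged by taking the union of their hashes and keeping the k smallest. This mergeability is what makes sketch-based NDV practical when values are spread across many tasks.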
cc: @RussellSpitzer @aokolnychyi @huaxingao @findepi
there was an old PR on the same: https://github.com/apache/iceberg/pull/6582
I don't have time to work on this, so karuppayya will take over. Thanks a lot @karuppayya for continuing the work.
I'll have some time to take a look this week.
Hi @karuppayya, are there any more changes that still need to be added to this PR? If not, when are we planning to get it merged?
Apart from that, is there any plan to backport these changes to older versions such as 3.4 or 3.3? We have a requirement to add this feature for 3.4 and 3.3.
Great work, @karuppayya! Thanks everyone for reviewing!
Thanks @aokolnychyi and everyone for the reviews.
@karuppayya sorry to bother you. I am able to get the action and call it from Java like this:
actions.computeTableStats(table).columns(columns.toArray(new String[0])).execute();
What would be the Spark SQL / PySpark way to use this?
@sfc-gh-mrojas I have a PR open that introduces a procedure to invoke the action.