
Spark Action to Analyze table

Open karuppayya opened this issue 1 year ago • 3 comments

This change adds a Spark action to analyze tables. As part of the analysis, the action generates Apache DataSketches sketches for NDV (number of distinct values) stats and writes them as Puffin files.

karuppayya avatar May 08 '24 05:05 karuppayya

cc: @RussellSpitzer @aokolnychyi @huaxingao @findepi

karuppayya avatar May 08 '24 05:05 karuppayya

there was an old PR on the same: https://github.com/apache/iceberg/pull/6582

ajantha-bhat avatar May 15 '24 10:05 ajantha-bhat

> there was an old PR on the same: https://github.com/apache/iceberg/pull/6582

I don't have time to work on this, so karuppayya will take over. Thanks a lot @karuppayya for continuing the work.

huaxingao avatar May 15 '24 15:05 huaxingao

I'll have some time to take a look this week.

aokolnychyi avatar Jul 02 '24 23:07 aokolnychyi

Hi @karuppayya, are there any more changes that still need to be added to this PR? If not, when are we planning to get it merged?

Apart from that, is there any plan to backport these changes to older versions such as 3.4 or 3.3? We have a requirement to add this feature for 3.4 and 3.3.

jeesou avatar Jul 25 '24 06:07 jeesou

Great work, @karuppayya! Thanks everyone for reviewing!

aokolnychyi avatar Aug 22 '24 03:08 aokolnychyi

Thanks @aokolnychyi and everyone for the reviews.

karuppayya avatar Aug 22 '24 04:08 karuppayya

@karuppayya sorry to bother. I was able to get the action and call it from Java like this:

actions.computeTableStats(table).columns(columns.toArray(new String[0])).execute();

What will be the Spark SQL / PySpark way to use this?
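For context, the one-liner above can be expanded into a fuller Java sketch. This is an illustration under assumptions, not code from the PR: the table identifier `local.db.events` and the column names `user_id` and `event_type` are hypothetical, and it assumes an Iceberg release that already ships `computeTableStats` along with a Spark session configured with an Iceberg catalog.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.iceberg.StatisticsFile;
import org.apache.iceberg.Table;
import org.apache.iceberg.actions.ComputeTableStats;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.iceberg.spark.actions.SparkActions;
import org.apache.spark.sql.SparkSession;

public class ComputeTableStatsExample {
  public static void main(String[] args) throws Exception {
    // Assumes the session is already configured with an Iceberg catalog.
    SparkSession spark = SparkSession.builder()
        .appName("compute-table-stats")
        .getOrCreate();

    // Hypothetical table identifier; replace with a real catalog.db.table name.
    Table table = Spark3Util.loadIcebergTable(spark, "local.db.events");

    // Hypothetical column names to collect NDV sketches for.
    List<String> columns = Arrays.asList("user_id", "event_type");

    // Same call chain as in the comment above, with the result captured.
    ComputeTableStats.Result result = SparkActions.get(spark)
        .computeTableStats(table)
        .columns(columns.toArray(new String[0]))
        .execute();

    // The statistics file describes the Puffin file written by the action.
    // Guard against null in case no statistics were produced.
    StatisticsFile statsFile = result.statisticsFile();
    if (statsFile != null) {
      System.out.println("Statistics file: " + statsFile.path());
      System.out.println("Snapshot id: " + statsFile.snapshotId());
    }

    spark.stop();
  }
}
```

Capturing the result makes it possible to check which snapshot the statistics belong to and where the Puffin file was written.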

sfc-gh-mrojas avatar Sep 19 '24 13:09 sfc-gh-mrojas

@sfc-gh-mrojas I have a PR open that introduces a procedure to invoke the action.

karuppayya avatar Sep 19 '24 16:09 karuppayya