[Sedona Core] Sedona doesn't clip raster to geometry extent in zonal statistics which can lead to inefficient queries
The following Sedona core raster function is used to compute zonal statistics:
private static List<Object> getStatObjects(GridCoverage2D raster, Geometry roi, int band, boolean allTouched, boolean excludeNoData, boolean lenient)
It loops through every element of the raster data array instead of first clipping the raster to the geometry's extent. This makes processing slow, especially when the geometry's area is much smaller than the raster's.
Example of zonal stats for a relatively small polygon in comparison to the raster
Exploding the raster into tiles before calculating zonal stats, as in the line below, speeds up processing considerably:
.selectExpr("rp", "Explode(RS_Tile(rast, 64, 64)) AS col")
Clipping before zonal stats is an easy improvement we can add.
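To illustrate the idea, here is a minimal sketch of how the pixel window covered by the geometry's envelope could be computed, so that only those pixels are visited. It is not Sedona code: the class `PixelWindow`, its method names, and the assumption of a north-up raster with an upper-left origin and positive pixel sizes are all hypothetical, chosen only to keep the example free of dependencies.

```java
// Hypothetical helper, not part of Sedona: maps a geometry envelope to the
// raster pixel window that covers it, clamped to the raster bounds.
final class PixelWindow {
    private PixelWindow() {}

    // Assumes a north-up raster: (originX, originY) is the upper-left corner,
    // pixelWidth and pixelHeight are positive, and rows grow as y decreases.
    // Returns {colStart, rowStart, colEnd, rowEnd}, with the end indices exclusive.
    static int[] clip(int width, int height,
                      double originX, double originY,
                      double pixelWidth, double pixelHeight,
                      double minX, double minY, double maxX, double maxY) {
        int colStart = clamp((int) Math.floor((minX - originX) / pixelWidth), 0, width);
        int colEnd   = clamp((int) Math.ceil((maxX - originX) / pixelWidth), 0, width);
        int rowStart = clamp((int) Math.floor((originY - maxY) / pixelHeight), 0, height);
        int rowEnd   = clamp((int) Math.ceil((originY - minY) / pixelHeight), 0, height);
        return new int[] {colStart, rowStart, colEnd, rowEnd};
    }

    // Number of pixels inside the window; 0 when the envelope misses the raster.
    static long pixelCount(int[] window) {
        long cols = Math.max(0, window[2] - window[0]);
        long rows = Math.max(0, window[3] - window[1]);
        return cols * rows;
    }

    private static int clamp(int value, int low, int high) {
        return Math.max(low, Math.min(high, value));
    }
}
```

For a 1000 x 1000 raster with unit pixels and an envelope a few units wide, the window holds a handful of pixels rather than one million, which is the saving this issue is after. The exact polygon test (including `allTouched`) would still have to run on the pixels inside the window.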
cc: @jiayuasu