handling remote files (or setting rasterio/gdal context)
Hi,
thanks for this great package. I'm wondering if it's possible to pass an open raster (as opposed to simply the path) to zonalstats (without converting to ndarray first).
My use case is that I'm reading a raster file from a public s3 bucket, which requires the following rasterio context:
with rasterio.env.Env(aws_unsigned=True):
r = rasterio.open("s3://path-to-raster.tif)
which works great by itself. But now I need to calculate zonal stats and I cannot make rs.zonal_stats use the rasterio environment. Would it be possible to modify the rs.zonal_stats so that it accepts an open rasterio.DatasetReader? (or allow passing the context)
e.g.
with rasterio.env.Env(aws_unsigned=True):
open_raster = rasterio.open("s3://path-to-raster)
zonal_gjson = rs.zonal_stats(
geodataframe, open_raster, prefix="Type_", geojson_out=True, categorical=True
)
I've been doing s3 stuff with rasterio and it handles an s3 path without the need for a context mananger, as long as it's a free bucket. rasterstats doesn't change the path at all so it seems to work fine without any adjustment
import rasterstats
from shapely.geometry import Polygon
s3_path = 's3://landsat-pds/L8/139/045/LC81390452014295LGN00/LC81390452014295LGN00_B1.TIF'
p=Polygon([(465764.5, 2428342.0), (465104.1, 2404237.9), (492840.3, 2402586.9)])
rasterstats.zonal_stats(p, s3_path)
Out[4]: [{'min': 10150.0, 'max': 13163.0, 'mean': 10490.125291650986, 'count': 372020}]
It can be wrapped in a special rasterio session for a requester pays bucket too. Like with this naip imagery.
from rasterio.session import AWSSession
import rasterio
import rasterstats
from shapely.geometry import Polygon
p=Polygon([(525794.7, 4876393.3), (525711.3, 4875371.7),(526910.1, 4875413.4)])
s3_path = 's3://naip-visualization/wy/2019/60cm/rgb/44104/m_4410459_se_13_060_20190828.tif'
session = AWSSession(requester_pays=True)
with rasterio.Env(session):
stats = rasterstats.zonal_stats(p, s3_path)
stats
[{'min': 59.0, 'max': 220.0, 'mean': 118.68374314241665, 'count': 1696115}]
rasterstats only reads with the rasterio window method, so as long as the s3 files are cloud optimized geotiffs then it will not download the full raster.
For anyone looking to do this in a mock S3 tool like minio, I'll save you a google search:
https://github.com/rasterio/rasterio/issues/1293#issuecomment-502651617
Just add the following variables to your rasterio.Env:
import rasterstats
import rasterio
from shapely.geometry import Polygon
s3_path = 's3://landsat-pds/L8/139/045/LC81390452014295LGN00/LC81390452014295LGN00_B1.TIF'
p = Polygon([(465764.5, 2428342.0), (465104.1, 2404237.9), (492840.3, 2402586.9)])
with rasterio.Env(AWS_HTTPS='NO', GDAL_DISABLE_READDIR_ON_OPEN='YES', AWS_VIRTUAL_HOSTING=False, AWS_S3_ENDPOINT='localhost:9000'):
stats = rasterstats.zonal_stats(p, s3_path)