Cache not working correctly when resampling MODIS granule

simonrp84 opened this issue 5 years ago (1 comment)

Code Sample, a minimal, complete, and verifiable piece of code


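from satpy import Scene

# Path to the MODIS L1b granule used in this report
inf = 'MOD021KM.A2006091.1605.061.2017263021720.hdf'

scn = Scene(filenames=[inf], reader='modis_l1b')
# Load five emissive (thermal infrared) bands at 1 km resolution
scn.load(['27', '29', '31', '32', '33'], resolution=1000)
# 'laea_bb' is a dynamic area: its extent is computed from the granule itself
scn2 = scn.resample('laea_bb',
                    radius_of_influence=15000,
                    resampler='bilinear',
                    cache_dir='./CACHE/')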

Here inf is the path to a MODIS L1b granule. I'm using MOD021KM.A2006091.1605.061.2017263021720.hdf

Problem description

When resampling a MODIS granule onto the laea_bb area definition (which is derived from the granule itself rather than from a predefined extent), cache_dir does not seem to work. Each time I run the above code, a new set of coefficients is generated in ./CACHE/ instead of the existing ones being reused. I suspect this is due to the dynamic nature of the area definition. Perhaps floating-point precision in the computed extent?
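To confirm whether a run reuses the cache or writes new coefficients, one option is to snapshot the cache directory before and after the call. The sketch below is a plain-Python helper; the name files_created_by is hypothetical and not part of Satpy. It only detects newly created files, not files overwritten in place.

```python
from pathlib import Path
from typing import Callable, Set


def _files_under(root: Path) -> Set[str]:
    # Collect every file (including files inside .zarr directories) relative to root
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def files_created_by(cache_dir: str, run: Callable[[], object]) -> Set[str]:
    """Return the names of files that appear in cache_dir while run() executes."""
    root = Path(cache_dir)
    root.mkdir(parents=True, exist_ok=True)
    before = _files_under(root)
    run()
    after = _files_under(root)
    return after - before
```

For example, wrapping the resample call above as files_created_by('./CACHE/', lambda: scn.resample('laea_bb', radius_of_influence=15000, resampler='bilinear', cache_dir='./CACHE/')) and calling it twice should return an empty set on the second call if the cache is being reused.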

Versions of Python, package at hand and relevant dependencies

Satpy: 0.22.0
Pyresample: 1.16.0

Jun 20 '20 09:06 simonrp84

I think this is a bug specific to bilinear resampling. To be clear, though, I'm 99% sure SwathDefinition-based input data is not cacheable. Bilinear resampling should warn you about this and then not cache anything. If you use the nearest resampler instead, do you see anything being cached?
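The expected behaviour described above (warn, then skip caching for swath input) can be sketched as a simple guard. This is a hypothetical illustration, not Satpy's or pyresample's actual code: the classes below are local stand-ins that only borrow the pyresample names, and save_resampling_info and the warning text are invented for the example.

```python
import warnings
from typing import Callable, Optional


class SwathDefinition:
    """Hypothetical stand-in for a swath geometry (per-pixel lon/lat arrays)."""


class AreaDefinition:
    """Hypothetical stand-in for a gridded area with a fixed extent."""


def save_resampling_info(source_geo: object,
                         cache_dir: Optional[str],
                         save: Callable[[str], None]) -> bool:
    """Cache only when the source geometry can be keyed reliably; return True if saved."""
    if cache_dir is None:
        # Caching was not requested
        return False
    if isinstance(source_geo, SwathDefinition):
        # Swath geolocation cannot be identified cheaply, so warn and skip caching
        warnings.warn("Resampling info for swath input is not cacheable; not caching.",
                      UserWarning)
        return False
    save(cache_dir)
    return True
```

With this guard, the bilinear run in the report would emit one warning per call and leave ./CACHE/ untouched, instead of writing a fresh set of coefficients each time.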

The cached files are regenerated every time because Satpy does not hash the lon/lat dask arrays to generate a cache key; instead it uses the arrays' .name attribute. Hashing the full arrays would require loading all of the lon/lat data, which we wanted to avoid. A dask array's .name is randomly generated when the array is created and again with every operation performed on it. So each time you rerun the script, new .name values are generated, and the resulting key will never match what is already in your cache.
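The trade-off above can be illustrated with a small stdlib-only simulation. LazyArray, load_swath, key_from_name and key_from_content are hypothetical names for this sketch; a random UUID stands in for a randomly generated dask .name, and the "granule" is three made-up lon/lat values. It is not Satpy's caching code.

```python
import hashlib
import uuid
from array import array


class LazyArray:
    """Hypothetical stand-in for a dask array: data plus a randomly generated name."""

    def __init__(self, values):
        self.values = array("d", values)
        # Mimics a random dask .name: different every time the array is created
        self.name = "array-" + uuid.uuid4().hex


def load_swath():
    # Hypothetical loader: reading the same granule twice yields identical values
    return LazyArray([10.0, 10.5, 11.0]), LazyArray([50.0, 50.2, 50.4])


def key_from_name(lons, lats):
    # Cheap: never touches the data, but changes whenever the arrays are recreated
    return hashlib.sha1((lons.name + lats.name).encode()).hexdigest()


def key_from_content(lons, lats):
    # Stable across runs, but must read every value (i.e. load all lon/lat data)
    h = hashlib.sha1()
    h.update(lons.values.tobytes())
    h.update(lats.values.tobytes())
    return h.hexdigest()


run1 = load_swath()
run2 = load_swath()
print(key_from_name(*run1) == key_from_name(*run2))        # False: cache miss on every run
print(key_from_content(*run1) == key_from_content(*run2))  # True: cache hit, at the cost of reading the data
```

The name-based key is what keeps resampling lazy, which is why a rerun of the same script can never hit a cache entry written by a previous run.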

Jun 29 '20 18:06 djhoese