The function sampleRandom is not robust against sparse rasters

Open romulogoncalves opened this issue 4 years ago • 0 comments

When sampleRandom samples a raster layer with many NaNs (a sparse raster layer), it has a high probability of returning fewer samples than requested, even if the requested number of samples is smaller than the number of cells with non-NaN values.

I found this while reading the sampleRandom code. On line 128, when NaNs are to be omitted (na.rm), sampleRandom draws 4 times the requested sample size, and only on line 147 does it omit the NaNs. Then on line 151 it truncates the result if it holds more samples than requested. So if fewer than roughly a quarter of the cells are non-NaN, the draw is expected to come up short.

sampleRandom code snippet:

			if (na.rm) {
128				N <- 4 * size 
			} else {
				N <- size 
			}	
			
			N <- min(N, nc)
			rcells <- sampleInt(nc, N)
			
			if (!is.null(ext)) {
				XY <- xyFromCell(xx, rcells)
				rcells <- cellFromXY(r, XY)
			}
			
			x <- .cellValues(x, rcells)
			if (cells) {
				x <- cbind(cell=rcells, value=x)
			}
			
			if (na.rm) {
147				x <- stats::na.omit(x)
				if (is.matrix(x)) {
					d <- dim(x)
					x <- matrix(as.vector(x), d[1], d[2])
151					if (nrow(x) > size) {
						x <- x[1:size, ]
					}
				} else {
					x <- as.vector(x)
					if ( length(x) > size ) {
						x <- x[1:size]
					}
				}
			}	
		}
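
To illustrate the shortfall without the raster package, here is a minimal base R sketch that mimics the logic of the snippet on a plain numeric vector. The vector `vals`, the 5% fill rate, and the use of `sample.int` as a stand-in for the internal `sampleInt` are assumptions made for illustration only, not the package's actual code path.

```r
# Hypothetical stand-in for a sparse raster layer: 10,000 cells, 500 of them non-NA.
set.seed(42)
nc <- 10000
vals <- rep(NA_real_, nc)
vals[sample.int(nc, 500)] <- runif(500)

size <- 200  # requested sample size, well below the 500 non-NA cells

# Mimic the snippet: draw 4 * size cells, then drop NAs, then truncate.
N <- min(4 * size, nc)
rcells <- sample.int(nc, N)   # stand-in for sampleInt(nc, N)
x <- as.vector(stats::na.omit(vals[rcells]))
if (length(x) > size) {
  x <- x[1:size]
}

# Only about 4 * size * 0.05 = 40 values are expected to survive on average.
length(x)
```

With 5% of the cells filled, the 800 draws are expected to hit about 40 non-NA cells, so the function returns far fewer than the 200 requested values even though 500 are available.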

Would it not be better to first remove the NaNs and then do the sampling? Or was there a special reason, such as a better distribution of the samples, for the current approach? If I have misunderstood the code, my apologies, and please let me know.
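
A minimal sketch of the "remove NaNs first, then sample" idea, again in base R on a plain vector rather than a raster object. The helper `sample_non_na` and the test vector `vals` are hypothetical names introduced here; this is not how sampleRandom is implemented.

```r
# Hypothetical helper: sample `size` cells among the non-NA cells only.
# Returns a matrix with the cell index and its value, like cells = TRUE would.
sample_non_na <- function(vals, size) {
  ok <- which(!is.na(vals))             # indices of cells that hold a value
  n <- min(size, length(ok))            # cannot return more than what exists
  idx <- ok[sample.int(length(ok), n)]  # sample positions within `ok`
  cbind(cell = idx, value = vals[idx])
}

# Hypothetical sparse layer: 10,000 cells, 500 of them non-NA.
set.seed(42)
nc <- 10000
vals <- rep(NA_real_, nc)
vals[sample.int(nc, 500)] <- runif(500)

out <- sample_non_na(vals, 200)
nrow(out)  # 200, because 500 non-NA cells are available
```

This always returns the requested number of samples whenever enough non-NA cells exist. One possible cost is that it needs all cell values up front to locate the non-NA cells, which may be expensive for large rasters that are not held in memory.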

romulogoncalves, Jan 21 '22 14:01