blockCV icon indicating copy to clipboard operation
blockCV copied to clipboard

cv_spatial - double intersection

Open bfakos opened this issue 2 years ago • 3 comments

Dear Roozbeh, I tried to run cv_spatial(x), where nrow(x) == 37000000, and it does not finish within a day, even if iteration == 1 or selection == "systematic". I wanted to understand the source code, find the bottleneck, and modify the code in a way it run parallel in multiple cores, when I found this issue. The code of cv_spatial() calls sf::st_intersects() twice, even though st_intersects() is a really time-consuming function (and can be easily parallelized).

L254:  sub_blocks <- blocks[x, ]

Here, you run "[.sf", which call st_intersects() and create a logical mask from the results (please refer to L325 in sf.R in package "sf"), but drop the results of the predicate function

L277: sf::st_intersects(sf::st_geometry(x), sf::st_geometry(sub_blocks))

You call again the time-consuming st_intersects(), with almost the same inputs. I recommend calling st_intersects() once, then calculating the logical mask, and finally subsetting the blocks. E.g. something like this:

blocks_intersection <- sf::st_intersects(sf::st_geometry(x), sf::st_geometry(sub_blocks))
mask <- lengths(blocks_intersection) != 0 
sub_blocks <- blocks[mask, ]
blocks_len <- nrow(sub_blocks)
blocks_df <- as.data.frame(blocks_intersection)

Also I suggest parallelizing st_intersects() that could make cv_spatial() much faster. Please let me know if I could collaborate in this development. Have a nice week, Ákos

bfakos avatar Oct 10 '23 13:10 bfakos

Sorry, there were some mistakes, now corrected:

blocks_intersection <- sf::st_intersects(sf::st_geometry(x), sf::st_geometry(blocks))
mask <- lengths(blocks_intersection) != 0 
sub_blocks <- blocks[mask, ]
blocks_len <- nrow(sub_blocks)
blocks_df <- as.data.frame(blocks_intersection[mask])

bfakos avatar Oct 10 '23 13:10 bfakos

Thanks for the report @bfakos I'll check it and get back to you soon

rvalavi avatar Oct 11 '23 11:10 rvalavi

In the meanwhile, I realized that the two st_intersects() has different direction (i.e., x and y are swapped), but sgbp objects can be transposed by t(), which is really fast (implemented in C).

bfakos avatar Oct 11 '23 11:10 bfakos