sf icon indicating copy to clipboard operation
sf copied to clipboard

Speed up st_crs for SpatVector objects

Open atsyplenkov opened this issue 1 year ago • 2 comments

Hey, guys.

Consider updating the st_crs() function for the SpatVector objects. The trick is terra::crs() always returns a character vector, therefore we can skip some checks implemented in st_crs() and we can gain a 2x speed increase in CRS extraction. When there is no CRS in SpatVector object, then the transition happens 1.2 times faster. When it comes to st_as_sf(), then it works 10% faster.

See examples below:

library(sf)
#> Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.3.1; sf_use_s2() is TRUE
library(terra)
#> terra 1.7.71

f <- system.file("ex/lux.shp", package="terra")
v <- vect(f)


st_crs2 = function(x) {
  if (!requireNamespace("terra", quietly = TRUE))
    stop("package terra required, please install it first") # nocov
  ctr <- terra::crs(x)
  if (ctr == "") {
    NA_crs_
  } else {
    crs <- sf:::CPL_crs_from_input(ctr)
    crs$input <- crs$Name
    crs
  }
}

bench::mark(
  original = st_crs(v),
  PR = st_crs2(v),
  check = T,
  relative = T,
  max_iterations = 1000L
)
#> # A tibble: 2 × 6
#>   expression   min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <dbl>  <dbl>     <dbl>     <dbl>    <dbl>
#> 1 original    1.93   1.83      1         296.     1.00
#> 2 PR          1      1         1.86        1      1

crs(v) <- NA_character_

bench::mark(
  original = st_crs(v),
  PR = st_crs2(v),
  check = T,
  relative = T,
  max_iterations = 1000L
)
#> # A tibble: 2 × 6
#>   expression   min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <dbl>  <dbl>     <dbl>     <dbl>    <dbl>
#> 1 original    1.24   1.20      1          Inf      NaN
#> 2 PR          1      1         1.11       NaN      NaN

Created on 2024-05-09 with reprex v2.1.0

atsyplenkov avatar May 09 '24 23:05 atsyplenkov

All geographical objects should always declare their CRS. Missing CRS are objects with missing essential data, with no information on their coordinate measurement units or relative shape in relation to other data objects. So helping users abuse their data is not a good idea at all. See https://github.com/r-spatial/sf/blob/97a179a206c8ceacf1a6999b98e3489a5bdd45f4/R/read.R#L546-L549 for a specific case.

rsbivand avatar May 10 '24 05:05 rsbivand

Sure thing! However, I don't understand how my PR differs from the current behavior and philosophy of st_crs.

https://github.com/r-spatial/sf/blob/97a179a206c8ceacf1a6999b98e3489a5bdd45f4/R/crs.R#L73-L76

I am suggesting improving st_crs only for SpatVector objects, which rely on the terra::crs() function, which always returns a character vector.

atsyplenkov avatar May 12 '24 22:05 atsyplenkov

This may work for your tests, but it is unclear whether it will always work. Terra is not very clear what the character string it returns contains, and your else block misses a number of things that st_crs(x) would do.

edzer avatar Jul 23 '24 12:07 edzer