Calling tibble::rowid_to_column() on an object of class "sf" before dplyr::rename() causes "internal error: can't find `agr` columns"
I reported this on the sf repo, where Edzer remarked that the problem is rowid_to_column() messing up the agr attribute, so I am reporting it here as well.
I use this in my spatial workflow quite often, so I was wondering whether it could be fixed. There are workarounds, of course, but it would be great if they could be avoided!
Issue on the sf repo is here
library(sf)
#> Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
read_sf(system.file("shape/nc.shp", package = "sf")) |>
  tibble::rowid_to_column(var = "object_id") |>
  dplyr::rename(a_r_e_a = AREA)
#> Error in rename.sf(tibble::rowid_to_column(read_sf(system.file("shape/nc.shp", : internal error: can't find `agr` columns
read_sf(system.file("shape/nc.shp", package = "sf")) |>
  # tibble::rowid_to_column(var = "object_id") |>
  dplyr::rename(a_r_e_a = AREA)
#> Simple feature collection with 100 features and 14 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> Geodetic CRS: NAD27
#> # A tibble: 100 × 15
#> a_r_e_a PERIMETER CNTY_ CNTY_ID NAME FIPS FIPSNO CRESS_ID BIR74 SID74
#> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <int> <dbl> <dbl>
#> 1 0.114 1.44 1825 1825 Ashe 37009 37009 5 1091 1
#> 2 0.061 1.23 1827 1827 Alleghany 37005 37005 3 487 0
#> 3 0.143 1.63 1828 1828 Surry 37171 37171 86 3188 5
#> 4 0.07 2.97 1831 1831 Currituck 37053 37053 27 508 1
#> 5 0.153 2.21 1832 1832 Northampton 37131 37131 66 1421 9
#> 6 0.097 1.67 1833 1833 Hertford 37091 37091 46 1452 7
#> 7 0.062 1.55 1834 1834 Camden 37029 37029 15 286 0
#> 8 0.091 1.28 1835 1835 Gates 37073 37073 37 420 0
#> 9 0.118 1.42 1836 1836 Warren 37185 37185 93 968 4
#> 10 0.124 1.43 1837 1837 Stokes 37169 37169 85 1612 1
#> # ℹ 90 more rows
#> # ℹ 5 more variables: NWBIR74 <dbl>, BIR79 <dbl>, SID79 <dbl>, NWBIR79 <dbl>,
#> # geometry <MULTIPOLYGON [°]>
Created on 2023-08-15 with reprex v2.0.2
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.3.1 (2023-06-16)
#> os macOS Ventura 13.5
#> system aarch64, darwin20
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz Europe/Copenhagen
#> date 2023-08-15
#> pandoc 3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> class 7.3-22 2023-05-03 [1] CRAN (R 4.3.1)
#> classInt 0.4-9 2023-02-28 [1] CRAN (R 4.3.0)
#> cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.0)
#> DBI 1.1.3 2022-06-18 [1] CRAN (R 4.3.0)
#> digest 0.6.33 2023-07-07 [1] CRAN (R 4.3.0)
#> dplyr 1.1.2 2023-04-20 [1] CRAN (R 4.3.0)
#> e1071 1.7-13 2023-02-01 [1] CRAN (R 4.3.0)
#> evaluate 0.21 2023-05-05 [1] CRAN (R 4.3.0)
#> fansi 1.0.4 2023-01-22 [1] CRAN (R 4.3.0)
#> fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.0)
#> fs 1.6.3 2023-07-20 [1] CRAN (R 4.3.0)
#> generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.0)
#> glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.0)
#> htmltools 0.5.6 2023-08-10 [1] CRAN (R 4.3.0)
#> KernSmooth 2.23-22 2023-07-10 [1] CRAN (R 4.3.0)
#> knitr 1.43 2023-05-25 [1] CRAN (R 4.3.0)
#> lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.3.0)
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.0)
#> pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.0)
#> proxy 0.4-27 2022-06-09 [1] CRAN (R 4.3.0)
#> purrr 1.0.2 2023-08-10 [1] CRAN (R 4.3.0)
#> R.cache 0.16.0 2022-07-21 [1] CRAN (R 4.3.0)
#> R.methodsS3 1.8.2 2022-06-13 [1] CRAN (R 4.3.0)
#> R.oo 1.25.0 2022-06-12 [1] CRAN (R 4.3.0)
#> R.utils 2.12.2 2022-11-11 [1] CRAN (R 4.3.0)
#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.0)
#> Rcpp 1.0.11 2023-07-06 [1] CRAN (R 4.3.0)
#> reprex 2.0.2 2022-08-17 [1] CRAN (R 4.3.0)
#> rlang 1.1.1 2023-04-28 [1] CRAN (R 4.3.0)
#> rmarkdown 2.23 2023-07-01 [1] CRAN (R 4.3.0)
#> rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.0)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.0)
#> sf * 1.0-14 2023-07-11 [1] CRAN (R 4.3.0)
#> styler 1.10.1 2023-06-05 [1] CRAN (R 4.3.0)
#> tibble 3.2.1 2023-03-20 [1] CRAN (R 4.3.0)
#> tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.3.0)
#> units 0.8-3 2023-08-10 [1] CRAN (R 4.3.0)
#> utf8 1.2.3 2023-01-31 [1] CRAN (R 4.3.0)
#> vctrs 0.6.3 2023-06-14 [1] CRAN (R 4.3.0)
#> withr 2.5.0 2022-03-03 [1] CRAN (R 4.3.0)
#> xfun 0.40 2023-08-09 [1] CRAN (R 4.3.0)
#> yaml 2.3.7 2023-01-23 [1] CRAN (R 4.3.0)
#>
#> [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
This seems to be the underlying issue:
options(conflicts.policy = list(warn = FALSE))
library(sf)
#> Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
library(dplyr)
input <- read_sf(system.file("shape/nc.shp", package = "sf"))
x1 <-
  input |>
  dplyr::mutate(var = row_number(), .before = 1)
x2 <-
  input |>
  tibble::rowid_to_column("var")
waldo::compare(x1, x2)
#> `names(attr(old, 'agr'))[1:4]`: "var" "AREA" "PERIMETER" "CNTY_"
#> `names(attr(new, 'agr'))[1:3]`: "AREA" "PERIMETER" "CNTY_"
#>
#> `attr(old, 'agr')[12:15]`: "NA" "NA" "NA" "NA"
#> `attr(new, 'agr')[12:14]`: "NA" "NA" "NA"
Created on 2023-08-15 with reprex v2.0.2
I don't know the semantics of the "agr" attribute, and there doesn't seem to be much I can do here. The sf package could override [<- and [[<- for its class to ensure that x2 has the same shape as x1, without changing tibble's implementation.
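In the meantime, the comparison above suggests a possible workaround: create the row id with dplyr::mutate() rather than tibble::rowid_to_column(), since the mutate() variant (x1) is the one whose agr attribute includes the new column. The following is only a sketch under that assumption. It reuses the column names from the original report (object_id, a_r_e_a) and has not been checked against other sf or tibble versions.

```r
library(sf)
library(dplyr)

nc <- read_sf(system.file("shape/nc.shp", package = "sf"))

# Workaround sketch: add the row id via mutate() instead of
# tibble::rowid_to_column(). In the comparison above, the mutate()
# variant (x1) is the one whose "agr" attribute lists the new column.
out <- nc |>
  mutate(object_id = row_number(), .before = 1) |>
  rename(a_r_e_a = AREA)

# The new id column comes first and the rename has been applied
names(out)[1:2]
```

This does not address the underlying mismatch; it only avoids the code path that leaves the agr attribute out of sync with the columns.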
Thanks, but that alone is not enough: adding a [<-.sf method that takes care of the agr attribute ensures that here the attribute is set correctly in new_data and out, but here it is then set back to the previous, unmodified version.
Removing the call to vectbl_restore leads to the correct result.
Thanks. I agree that it's not this function's business to call vectbl_restore().
Over the last few years, several approaches were tried to preserve the "class" attribute after add_column(). To be fully consistent, we'd need to build this on top of mutate(). What if this function required dplyr for non-data-frame and non-tibble objects?