Geometries for some SA1s are very badly mangled
I've noticed for a handful of SA1s in the dataset sa12021 that the polygon in the geometry column is very wrong.
One example is SA1 80109110916. Below is a map showing this SA1 as it appears in the sa12021 dataset (it's the tiny red sliver in the middle of the plot).
If I instead use the SA1 shapefiles published by the ABS, the map looks like this:
The SA1s affected by this issue can be identified by comparing the area of the SA1 as computed by sf::st_area with the area given in the variable areasqkm_2021. Reprex:
library(absmapsdata)
library(sf)
#> Linking to GEOS 3.12.1, GDAL 3.8.1, PROJ 9.3.1; sf_use_s2() is TRUE
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(units)
#> udunits database from /opt/homebrew/lib/R/4.3/site-library/units/share/udunits/udunits2.xml
sa12021 |>
filter(sa1_code_2021 == "80109110916") |>
mutate(area_sf = set_units(st_area(geometry),"km^2")) |>
select(area_sf,areasqkm_2021)
#> Simple feature collection with 1 feature and 2 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: 149.0832 ymin: -35.33873 xmax: 149.0843 ymax: -35.33796
#> Geodetic CRS: WGS 84
#> area_sf areasqkm_2021 geometry
#> 1 0.0003975106 [km^2] 0.0099 MULTIPOLYGON (((149.0832 -3...
Created on 2024-03-31 with reprex v2.1.0
I'm guessing this is caused by how the geometries are simplified before being stored in this package, but is it possible to do this in a way that doesn't completely destroy some of the polygons?
thanks for raising this.
yes, this is an issue with the simplification level and its interactions with your example SA1 and its neighbouring SA1s.
if i had my time again, i would have better balanced the simplification level for SA1s, or not simplified them at all.
maybe a future update could use rmapshaper::ms_simplify(keep = 0.3) rather than rmapshaper::ms_simplify(keep = 0.1). this increases the file size but has with far better results, eg:
may also be worth hosting the data objects somewhere with greater storage capacity to allow for unsimplified sf objects to be hosted.
i don't have time to explore alternatives at the moment, but keen to hear if there are any better ideas out there.
thanks for raising this.
yes, this is an issue with the simplification level and its interactions with your example SA1 and its neighbouring SA1s.
if i had my time again, i would have better balanced the simplification level for SA1s, or not simplified them at all. maybe a future update could use
rmapshaper::ms_simplify(keep = 0.3)rather thanrmapshaper::ms_simplify(keep = 0.1). this increases the file size but has with far better results, eg:
may also be worth hosting the data objects somewhere with greater storage capacity to allow for unsimplified
sfobjects to be hosted.i don't have time to explore alternatives at the moment, but keen to hear if there are any better ideas out there.
I think finding a way to host unsimplified objects would be the best long-term solution (maybe as geoparquet files which allow for fast reading and reasonably compact storage)
agree. something i can explore down the line (and happy to take PRs from anyone reading this!)