Setting grouping criteria per group using spdep::skater()
@rsbivand, is it possible to set grouping criteria per group using spdep::skater()? So, suppose I want to cluster geometries into 3 groups such that the groups have the following population: [1000, 2000), [2000, 5000), and [5000, Inf). Is it doable with the crit argument? GeoDaCenter/rgeoda#44
Please provide a minimal reproducible code example using built-in data.
@rsbivand, here's my attempt:
## Borrowed from spdep package vignette
### loading data
bh <- sf::st_read(
system.file("etc/shapes/bhicv.shp", package = "spdep")[1], quiet = TRUE
)
sf::st_crs(bh) <- "OGC:CRS84"
### data standardized
dpad <- data.frame(scale(as.data.frame(bh)[, 5:8]))
### neighborhood list
bh.nb <- spdep::poly2nb(bh)
### calculating costs
lcosts <- spdep::nbcosts(bh.nb, dpad)
### making listw
nb.w <- spdep::nb2listw(bh.nb, lcosts, style = "B")
### find a minimum spanning tree
mst.bh <- spdep::mstree(nb.w, 5)
### three groups with no restriction
res1 <- spdep::skater(mst.bh[, 1:2], dpad, 2)
## Introducing constraints (my part)
### works fine
res_min_pop_50k <- spdep::skater(
mst.bh[, 1:2], dpad, 2,
crit = 50000, vec.crit = bh$Population
)
### throws error
res_pop_breaks <- spdep::skater(
mst.bh[, 1:2], dpad, 2,
crit = c(-Inf, 250000, 500000, Inf), # population: [min, 250k), [250k, 500k), and [500k, max]
vec.crit = bh$Population
)
My understanding from ?skater and http://www.dpi.inpe.br/gilberto/papers/assuncao_neves_camara_ijgis.pdf (cross-check @eliaskrainski ) is that crit is either a scalar or a two element vector https://github.com/r-spatial/spdep/blob/b815397c2ce0d58b3b4ca0c36bba7b5c3abbbcc2/R/skater.R#L21-L24 vec.crit will be a unit vector https://github.com/r-spatial/spdep/blob/b815397c2ce0d58b3b4ca0c36bba7b5c3abbbcc2/R/skater.R#L25-L29 if crit is a scalar or two element vector and crit is to be understood as a minimum or (min, max) count of observations in each group. If crit is scalar and vec.crit is not a unit vector but observation population counts, crit is the minimum group population.
https://github.com/r-spatial/spdep/blob/b815397c2ce0d58b3b4ca0c36bba7b5c3abbbcc2/R/skater.R#L65-L66 shows the counting by group of vec.crit in the iterations, if unit vector then counts of observations, or population counts. Using crit with a length > 1 when vec.crit is not the unit vector perhaps should error exit (I don't see any errors, only unhelpful output objects.
From the examples, it seems as though repeated application of skater to an object returned by the function may permit the updating of the object, but I fail to see what a vector of group population counts would mean in practice. Possibly we should add short comments in the documentation and perhaps a warning in the function code if length(crit) > 1L and vect.crit is not the unit vector.
You can combine a discrete variable with a few levels with the groups from the clustering output.
library(spdep)
## a map
map <- sf::st_read(system.file(
"shapes/sids.shp", package="spData")[1])
### neighboorhod list
nbl <- poly2nb(map)
### a variable: estimate of the rate
ni <- map$BIR74 + map$BIR79
yi <- map$SID74 + map$SID79
map$raw <- yi/ni
map$est <- EBlocal(
ri = yi,
ni = ni,
nb = nbl)$est
summary(map$est)
## standardize
dpad <- scale(map$est)
### calculating costs
lcosts <- nbcosts(nbl, dpad)
### making listw
nbw <- nb2listw(nbl, lcosts, style="B")
### find a minimum spanning tree
mst <- mstree(nbw, 5)
### cluster
sk <- skater(
edges = mst,
data = dpad,
ncuts = 3
)
table(sk$groups)
tapply(ni, sk$groups, sum)
### subdivide the groups by ni
summary(ni)
table(b3 <- findInterval(ni, c(3000, 7000)))
### combine the results
newc <- paste0("B", b3, "g", sk$groups)
table(newc, b3)
### add results to the map to plot
map$group1 <- paste0("g", sk$groups)
map$group2 <- newc
library(ggplot2)
library(ggpubr)
gg0 <- ggplot(data = map) + theme_minimal()
ggarrange(gg0 + geom_sf(aes(fill = raw)),
gg0 + geom_sf(aes(fill = est)),
gg0 + geom_sf(aes(fill = group1)),
gg0 + geom_sf(aes(fill = group2)))
Thanks, @rsbivand & @eliaskrainski! I will give it a shot!