spdep icon indicating copy to clipboard operation
spdep copied to clipboard

Setting grouping criteria per group using spdep::skater()

Open ashirwad opened this issue 2 years ago • 6 comments

@rsbivand, is it possible to set grouping criteria per group using spdep::skater()? So, suppose I want to cluster geometries into 3 groups such that the groups have the following population: [1000, 2000), [2000, 5000), and [5000, Inf). Is it doable with the crit argument? GeoDaCenter/rgeoda#44

ashirwad avatar Jan 12 '24 19:01 ashirwad

Please provide a minimal reproducible code example using built-in data.

rsbivand avatar Jan 12 '24 19:01 rsbivand

@rsbivand, here's my attempt:

## Borrowed from spdep package vignette
### loading data
bh <- sf::st_read(
  system.file("etc/shapes/bhicv.shp", package = "spdep")[1], quiet = TRUE
)
sf::st_crs(bh) <- "OGC:CRS84"

### data standardized
dpad <- data.frame(scale(as.data.frame(bh)[, 5:8]))

### neighborhood list
bh.nb <- spdep::poly2nb(bh)

### calculating costs
lcosts <- spdep::nbcosts(bh.nb, dpad)

### making listw
nb.w <- spdep::nb2listw(bh.nb, lcosts, style = "B")

### find a minimum spanning tree
mst.bh <- spdep::mstree(nb.w, 5)

### three groups with no restriction
res1 <- spdep::skater(mst.bh[, 1:2], dpad, 2)

## Introducing constraints (my part)
### works fine
res_min_pop_50k <- spdep::skater(
  mst.bh[, 1:2], dpad, 2,
  crit = 50000, vec.crit = bh$Population
)

### throws error
res_pop_breaks <- spdep::skater(
  mst.bh[, 1:2], dpad, 2,
  crit = c(-Inf, 250000, 500000, Inf), # population: [min, 250k), [250k, 500k), and [500k, max]
  vec.crit = bh$Population
)

ashirwad avatar Jan 12 '24 20:01 ashirwad

My understanding from ?skater and http://www.dpi.inpe.br/gilberto/papers/assuncao_neves_camara_ijgis.pdf (cross-check @eliaskrainski ) is that crit is either a scalar or a two element vector https://github.com/r-spatial/spdep/blob/b815397c2ce0d58b3b4ca0c36bba7b5c3abbbcc2/R/skater.R#L21-L24 vec.crit will be a unit vector https://github.com/r-spatial/spdep/blob/b815397c2ce0d58b3b4ca0c36bba7b5c3abbbcc2/R/skater.R#L25-L29 if crit is a scalar or two element vector and crit is to be understood as a minimum or (min, max) count of observations in each group. If crit is scalar and vec.crit is not a unit vector but observation population counts, crit is the minimum group population.

https://github.com/r-spatial/spdep/blob/b815397c2ce0d58b3b4ca0c36bba7b5c3abbbcc2/R/skater.R#L65-L66 shows the counting by group of vec.crit in the iterations, if unit vector then counts of observations, or population counts. Using crit with a length > 1 when vec.crit is not the unit vector perhaps should error exit (I don't see any errors, only unhelpful output objects.

From the examples, it seems as though repeated application of skater to an object returned by the function may permit the updating of the object, but I fail to see what a vector of group population counts would mean in practice. Possibly we should add short comments in the documentation and perhaps a warning in the function code if length(crit) > 1L and vect.crit is not the unit vector.

rsbivand avatar Jan 13 '24 10:01 rsbivand

You can combine a discrete variable with a few levels with the groups from the clustering output.

eliaskrainski avatar Jan 15 '24 11:01 eliaskrainski

library(spdep)

## a map
map <- sf::st_read(system.file(
               "shapes/sids.shp", package="spData")[1])

### neighboorhod list
nbl <- poly2nb(map)

### a variable: estimate of the rate
ni <- map$BIR74 + map$BIR79
yi <- map$SID74 + map$SID79
map$raw <- yi/ni
map$est <- EBlocal(
    ri = yi,
    ni = ni,
    nb = nbl)$est
summary(map$est)

## standardize
dpad <- scale(map$est)

### calculating costs
lcosts <- nbcosts(nbl, dpad)

### making listw
nbw <- nb2listw(nbl, lcosts, style="B")

### find a minimum spanning tree
mst <- mstree(nbw, 5)

### cluster
sk <- skater(
    edges = mst,
    data = dpad,
    ncuts = 3
)
table(sk$groups)
tapply(ni, sk$groups, sum)

### subdivide the groups by ni
summary(ni)
table(b3 <- findInterval(ni, c(3000, 7000)))

### combine the results
newc <- paste0("B", b3, "g", sk$groups)
table(newc, b3)

### add results to the map to plot
map$group1 <- paste0("g", sk$groups)
map$group2 <- newc

library(ggplot2)
library(ggpubr)

gg0 <- ggplot(data = map) + theme_minimal()

ggarrange(gg0 + geom_sf(aes(fill = raw)),
          gg0 + geom_sf(aes(fill = est)),
          gg0 + geom_sf(aes(fill = group1)),
          gg0 + geom_sf(aes(fill = group2)))

eliaskrainski avatar Jan 15 '24 11:01 eliaskrainski

Thanks, @rsbivand & @eliaskrainski! I will give it a shot!

ashirwad avatar Jan 17 '24 15:01 ashirwad