dodgr: dodgr_flows_si function

Current work on New York City pedestrian flows, via the moveability/calibration repo, uses spatial interaction (SI) models as input to dodgr_flows_aggregate() through its entirely arbitrary flows argument, which is a

Matrix of flows with nrow(flows)==length(from) and ncol(flows)==length(to).

This is simply too big for spatial interaction models of NYC when origins and destinations include centrality values and/or residential locations. There are more than 100,000 of these, and the resulting matrices are too big to be held in memory. However, it is also not necessary to calculate the entire matrices for SI models, because the flows from a given origin only need the corresponding row of the flow matrix, which specifies the strengths of spatial interaction from that origin.
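A quick back-of-envelope check of the memory argument, assuming (for illustration only) 100,000 origins and 100,000 destinations, and 8 bytes per double as in R's numeric type:

```r
# Assumption: n = 100,000 origins and destinations (the issue says "> 100,000").
n <- 1e5
# One dense n-by-n matrix of doubles, 8 bytes per entry:
gb_per_matrix <- n^2 * 8 / 1e9
gb_per_matrix
#> [1] 80
# One row of that matrix (all flows from a single origin):
row_mb <- n * 8 / 1e6
row_mb
#> [1] 0.8
```

So a single full matrix needs about 80 GB, whereas the row for one origin needs under 1 MB, which is why computing rows on the fly is feasible where the full matrices are not.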

The Plan

Write a new dodgr_flows_si() function which accepts a set of from and to points along with same-length vectors of corresponding density values, in lieu of the potentially huge matrices of pair-wise strengths of spatial interaction otherwise required. Create a new RcppParallel::Worker class object that does the calculation of SI strengths for each origin during the actual flow aggregation loop, for a given vector of exponential decay parameters, one for each origin.
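A minimal base-R sketch of the per-origin calculation, assuming the same exponential-decay form and row normalisation used in the explicit-matrix reprex in this thread. The helper si_row() is hypothetical and is not the C++/RcppParallel code inside dodgr_flows_si(); the distances are made up.

```r
# Hypothetical helper: SI flows from ONE origin to all destinations,
# using only that origin's row of distances (no nfrom-by-nto matrix).
si_row <- function (d_row, k_i, dens_from_i, dens_to) {
    f <- dens_to * exp (-d_row / k_i) # exponential distance decay
    f [is.na (f)] <- 0                # unreachable destinations get no flow
    s <- sum (f)
    if (s == 0) return (f)            # nothing reachable: all-zero row
    dens_from_i * f / s               # scale so the row sums to dens_from_i
}

# Toy example: made-up distances from one origin to 4 destinations
d_row <- c (100, 250, NA, 400)
dens_to <- c (10, 20, 30, 40)
flows_i <- si_row (d_row, k_i = 500, dens_from_i = 50, dens_to = dens_to)
flows_i
sum (flows_i) # equal (up to rounding) to dens_from_i
```

The explicit zero-sum guard is a design choice of this sketch: without it, an origin from which no destination is reachable would produce NaN values rather than zero flows.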

Ping @Robinlovelace - this is just a wee :wrench:-in-the-:gear: for the current calibration task. Aiming to have this new function up tomorrow (Thurs), which will enable the final (and biggest) layers to be calculated.

Oct 30 '19 14:10 mpadge

Sounds like a plan. Was thinking it may be worth splitting out the SI components into a sim package, but that's for another day and another project. Looking forward to seeing it in action!

Oct 30 '19 17:10 Robinlovelace

Reprex of the new function in action:

library(dodgr)
library(magrittr) # for the %>% pipe
# load the Accra street network from somewhere else, then ...
net <- weight_streetnet (accra) %>%
    dodgr_contract_graph ()
v <- dodgr_vertices (net)

set.seed (1)
nf <- 100
nt <- 1000
from <- sample (v$id, nf)
to <- sample (v$id, nt)

k <- 500 + 10 * rnorm (nf) # Vector of exponential decay coefficients, one for each origin
dens_from <- 100 * runif (nf) # Vector of origin densities
dens_to <- 100 * runif (nt) # Vector of destination densities

# the old way, by constructing an explicit spatial interaction matrix and aggregating:
system.time ({
    d <- dodgr_distances (net, from = from, to = to)
    d_from <- array (dens_from, dim = c (nf, nt))
    d_to <- t (array (dens_to, dim = c (nt, nf)))
    kmat <- array (k, dim = c (nf, nt))
    fmat <- d_to * exp (-d / kmat)
    fmat [is.na (fmat)] <- 0
    csmat <- array (rowSums (fmat), dim = c (nf, nt))
    fmat <- d_from * fmat / csmat
    netf <- dodgr_flows_aggregate (net, from = from, to = to, flows = fmat)
})
#>    user  system elapsed 
#>   7.790   0.144   1.851

# the new way
system.time ({
    netf_si <- dodgr_flows_si (net, from = from, to = to, k = k,
                             dens_from = dens_from, dens_to = dens_to)
})
#>    user  system elapsed 
#>   3.283   0.127   0.996
# ---> it's about twice as fast!

identical (netf$flow, netf_si$flow)
#> [1] FALSE
max (abs (netf$flow - netf_si$flow))
#> [1] 10.83691

cor (netf$flow, netf_si$flow)
#> [1] 0.9995376
plot (netf$flow, netf_si$flow)
lines (range (netf$flow), range (netf$flow), col = "red")

Created on 2019-10-31 by the reprex package (v0.3.0)

There are all sorts of rounding errors and whatnot going on here, so the two results are not identical, but they are virtually the same (correlation > 0.999). Best of all, the "old" way required explicit construction of several nfrom-by-nto matrices, whereas the dodgr_flows_si function uses no large matrices at all. The former could easily be made to fail by submitting a large enough job; the new one should be failsafe. The speed gain is then just an added bonus.
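A small base-R check, using made-up distances rather than a real street network, that the explicit nfrom-by-nto construction from the old-way reprex gives the same spatial interaction weights as computing one row per origin. It checks only the SI weights, not routing or flow aggregation, so it says nothing about where the small differences in the aggregated flows come from.

```r
set.seed (1)
nf <- 3
nt <- 5
d <- matrix (runif (nf * nt, 0, 1000), nf, nt) # made-up distances
d [2, 4] <- NA                                 # one unreachable pair
k <- c (400, 500, 600)
dens_from <- c (10, 20, 30)
dens_to <- 1:5

# "Old" way: several full nf-by-nt matrices
fmat <- t (array (dens_to, dim = c (nt, nf))) * exp (-d / array (k, dim = c (nf, nt)))
fmat [is.na (fmat)] <- 0
fmat <- array (dens_from, dim = c (nf, nt)) * fmat / rowSums (fmat)

# Row-at-a-time: only length-nt vectors exist at any one step
frow <- t (vapply (seq_len (nf), function (i) {
    f <- dens_to * exp (-d [i, ] / k [i])
    f [is.na (f)] <- 0
    dens_from [i] * f / sum (f)
}, numeric (nt)))

all.equal (fmat, frow)
```

The matrix here is only assembled to compare the two approaches; in the per-origin approach each row would be consumed by the flow aggregation for that origin and then discarded.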

Re-opening until this has been documented in a vignette somewhere.

Oct 31 '19 12:10 mpadge

Impressive. Happy to help out with the docs when I get my head round it. Can see this being very useful but need to understand it first!

Oct 31 '19 14:10 Robinlovelace

I had to write this function because I just couldn't calculate all the layers for our current work the old way: R kept violently crashing. It wasn't (directly) the fault of dodgr; the matrices needed to construct the spatial interaction terms were just so huge that they ate almost all of my memory. They kept getting slightly modified until the memory got eaten and R crashed. This new function is sooo much easier, and will now allow automatic calculation of all required flow layers.

Oct 31 '19 14:10 mpadge

TODO:

[ ] Document this function properly in the main dodgr_flows vignette

The issue can then be closed, as the functionality is fully developed.

Oct 10 '21 20:10 mpadge