geocompr icon indicating copy to clipboard operation
geocompr copied to clipboard

Some ideas for exercices on ch04 on spatial predicates

Open defuneste opened this issue 4 years ago • 12 comments

Spatial predicates are hard and I think adding them in exercices can help. This is some "raw" ideas that can be explored. I need to find a data set to display "st_crosses".

## Load package and pick some data 
library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
library(spData)

# One US state
Colorado = us_states[us_states$NAME == "Colorado",]

plot(us_states$geometry)
plot(Colorado$geometry, col = 2, add = TRUE)

# What are the neighbouring states of Colorado? 
sgbd1 = st_touches(
    Colorado, 
    us_states)
us_states$NAME[unlist(sgbd1)]
#> [1] "Arizona"    "Kansas"     "Oklahoma"   "Nebraska"   "New Mexico"
#> [6] "Utah"       "Wyoming"

# how many states do we get and why? 
us_states$NAME[unlist(st_intersects( Colorado, 
                                us_states))]
#> [1] "Arizona"    "Colorado"   "Kansas"     "Oklahoma"   "Nebraska"  
#> [6] "New Mexico" "Utah"       "Wyoming"

# to be sure !
us_states$NAME[unlist(st_equals( Colorado, 
                                us_states))]
#> [1] "Colorado"

# Try a pattern:
small_neighbour = "****0****"

us_states$NAME[unlist(st_relate(Colorado, 
                           us_states,
                           pattern = small_neighbour))]
#> although coordinates are longitude/latitude, st_relate_pattern assumes that they are planar
#> [1] "Arizona"

# in the pattern change 0 for 1, what happens? why?   
small_neighbour = "****1****"

us_states$NAME[unlist(st_relate(Colorado, 
                           us_states,
                           pattern = small_neighbour))]
#> although coordinates are longitude/latitude, st_relate_pattern assumes that they are planar
#> [1] "Colorado"   "Kansas"     "Oklahoma"   "Nebraska"   "New Mexico"
#> [6] "Utah"       "Wyoming"

# Bonus why do you have a warning with st_relate and not with the other spatiale predicates ? 

Created on 2022-03-06 by the reprex package (v2.0.1)

defuneste avatar Mar 06 '22 22:03 defuneste

This sounds really good Olivier. One idea on a dataset to illustrate crosses: a straight line from the centroid of one state to the centroid of another (e.g. Washington DC to Texas?) to find out which states it crosses?

Robinlovelace avatar Mar 07 '22 07:03 Robinlovelace

Good idea! I will try something like that later!

defuneste avatar Mar 07 '22 07:03 defuneste

Heads-up @defuneste I'm looking to implement this in this PR: https://github.com/Robinlovelace/geocompr/pull/770

Robinlovelace avatar Mar 18 '22 07:03 Robinlovelace

Great! I am still working on something with st_crosses. I have a bit of trouble casting multilinetsring to proper linestrings (OSM data can be mischievous). I will ty to do that quickly.

defuneste avatar Mar 18 '22 07:03 defuneste

Great. Feel free to build on my commits on that branch...

Robinlovelace avatar Mar 18 '22 08:03 Robinlovelace

There are many ways to subset based on spatial relations, the simplest being x[y, , op = st_*()].

I think we should present that simplest solution first but also demonstrate the others. Lots going on, well worthy of an exercise.

library(sf)
library(spData)
colorado = us_states[us_states$NAME == "Colorado",]

plot(us_states$geometry)
plot(colorado$geometry, col = 2, add = TRUE)

# Way 1:
touches_colorado1 = us_states[colorado, , op = st_touches]
plot(touches_colorado1)

# Way 2:
sgbd1 = st_touches(
  colorado, 
  us_states
  )
sgbd1
us_states$NAME[unlist(sgbd1)]
touches_colorado2 = us_states[unlist(sgbd1), ]
plot(touches_colorado2)

# Way 3
sel_intersects_colorado = st_touches(us_states, colorado)
sel_intersects_colorado_list = lengths(sel_intersects_colorado) > 0
touches_colorado3 = us_states[sel_intersects_colorado_list, ]
plot(touches_colorado3)
library(sf)
#> Linking to GEOS 3.10.1, GDAL 3.4.0, PROJ 8.2.0; sf_use_s2() is TRUE
library(spData)
colorado = us_states[us_states$NAME == "Colorado",]

plot(us_states$geometry)
plot(colorado$geometry, col = 2, add = TRUE)

# Way 1:
touches_colorado1 = us_states[colorado, , op = st_touches]
plot(touches_colorado1)

# Way 2:
sgbd1 = st_touches(
  colorado, 
  us_states
  )
sgbd1
#> Sparse geometry binary predicate list of length 1, where the predicate
#> was `touches'
#>  1: 2, 9, 19, 37, 39, 45, 49
us_states$NAME[unlist(sgbd1)]
#> [1] "Arizona"    "Kansas"     "Oklahoma"   "Nebraska"   "New Mexico"
#> [6] "Utah"       "Wyoming"
touches_colorado2 = us_states[unlist(sgbd1), ]
plot(touches_colorado2)

# Way 3
sel_intersects_colorado = st_touches(us_states, colorado)
sel_intersects_colorado_list = lengths(sel_intersects_colorado) > 0
touches_colorado3 = us_states[sel_intersects_colorado_list, ]
plot(touches_colorado3)

Created on 2022-03-18 by the reprex package (v2.0.1)

Robinlovelace avatar Mar 18 '22 10:03 Robinlovelace

Result! I think this issue can be closed with the commit above...

image

Robinlovelace avatar Mar 18 '22 10:03 Robinlovelace

I picked Colorado because it touches an other state with just one point (Arizona). So learner can explore various way (and I really like that you used other ways !!).

I wanted to do the same with st_crosses vs st_intersect. But I think we can close this issue and that we provided already good enough exercises. What is the "etiquette" you are closing it?

For the sake of it (because I spend some time on it :wink: ) this was were I was:

library(osmdata)
library(dplyr)
library(sf)
library(spData)
library(lwgeom)

id_track = 8440320 # used overpass to get that relation ID 

track = opq_osm_id (type = "relation", id = id_track) %>%
    opq_string () %>%
    osmdata_sf () 

# saveRDS(track, "chicago_SF") #avoiding a call in OSM

# I want some train, can be improved with some pipes  
slow_train_com = track$osm_lines$geometry
slow_train_com = st_combine(slow_train_com) 
# I want just one line
slow_train_com = st_line_merge(slow_train_com)
slow_train_com = st_transform(slow_train_com, st_crs(us_states))

# I want some train stations ! 
train_station = track$osm_points[!is.na(track$osm_points$name),c("name", "network")]
# remove train station that only Metra uses
train_station = train_station[grep(pattern = "Amtrak", train_station$network),]
train_station  = st_transform(train_station, st_crs((us_states)))

# testing st_split to produce sgment between train station
split_train = lwgeom::st_split(slow_train_com, train_station)
part_of_track = st_collection_extract(split_train[[1]], "LINESTRING")
part_of_track  = st_set_crs(part_of_track, st_crs(us_states))
# sf
random_value = data.frame(rv = 1:length(part_of_track))
part_of_track = st_sf(part_of_track, random_value) 

# st_crosses vs st_intersect 
# st_crosses should select which parts of the track (defined between two train stations) are crossing two states 
st_crosses(part_of_track, us_states)

st_intersects(part_of_track, us_states)

defuneste avatar Mar 18 '22 11:03 defuneste

Reopened as some unfinished business here, as discussed with @defuneste:

  • [ ] Other types of relation?
  • [ ] A question that demonstrates use of a DE-9IM string

Both of those things could potentially be done in a single exercise or additional bullet point in the exercises. The Amtrack example is awesome but, to avoid complexity of people getting the same dataset think it would need to be hosted somewhere, good candidate for spDataLarge? Heads-up @Nowosad on that...

Robinlovelace avatar Mar 18 '22 15:03 Robinlovelace

Yep -- spDataLarge should be fine.

Nowosad avatar Mar 21 '22 11:03 Nowosad

Hey @Robinlovelace, any chances you can take a look at this, and close it or resolve it (before the end of the year)?

Nowosad avatar Nov 28 '23 10:11 Nowosad

Hey @Robinlovelace, any chances you can take a look at this, and close it or resolve it (before the end of the year)?

Sure. I see this as a nice-to-have. Not essential for the 2nd edition. I've assigned myself and will aim to fix it by end of the year but if not we can close and/or mark as wontfix.

Robinlovelace avatar Nov 28 '23 13:11 Robinlovelace