something odd with WPA

Open ak47twq opened this issue 5 years ago • 4 comments

I compared the difference between consecutive plays' home_wp_post with WPA in the database. Is WPA supposed to be the difference between two consecutive plays' home_wp_post? Most numbers check out, but some don't make sense.

Why does the timeout play have a different home_wp_post?

Here is what I did:

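library(dplyr)
library(tictoc)  # provides tic() / toc() for timing

# pbp is assumed to be a remote play-by-play table; collect() pulls the rows into memory
tic()
test <- pbp %>%
  filter(game_id == "2009_18_GB_ARI", !is.na(home_wp_post)) %>%
  select(game_id, play_id, qtr, desc, total, spread_line, home_wp_post, wpa) %>%
  collect()
toc()

# Absolute WPA as stored in the database
tic()
test <- test %>%
  mutate(wp_diff1 = abs(wpa))
toc()

# Absolute change in home_wp_post between consecutive plays (first play set to 0)
tic()
test$wp_diff2 <- 0
for (i in 2:nrow(test)) {
  test$wp_diff2[i] <- abs(test$home_wp_post[i] - test$home_wp_post[i - 1])
}
toc()

# Plays where the two measures disagree
temp <- test %>% filter(wp_diff2 != wp_diff1)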

[Images: WPA1, WPA2]

ak47twq • Oct 26 '20 15:10

Here is some more efficient code to reproduce this:

pbp %>%
  filter(game_id == "2009_18_GB_ARI", !is.na(home_wp_post)) %>%
  select(game_id, play_id, play_type, desc, home_team, posteam, wp, home_wp, wpa, home_wp_post) %>%
  mutate(
    wp_diff1 = abs(wpa),
    wp_diff2 = abs(home_wp_post - lag(home_wp_post))
  ) %>%
  filter(wp_diff2 != wp_diff1)

Output:

# A tibble: 4 x 12
  game_id   play_id play_type desc                                home_team posteam    wp home_wp      wpa home_wp_post wp_diff1 wp_diff2
  <chr>       <dbl> <chr>     <chr>                               <chr>     <chr>   <dbl>   <dbl>    <dbl>        <dbl>    <dbl>    <dbl>
1 2009_18_~    1416 no_play   (7:42) J.Kuhn right tackle to ARI ~ ARI       GB      0.153   0.847  0.00151        0.847  0.00151  0      
2 2009_18_~    1437 run       (7:02) A.Rodgers up the middle for~ ARI       GB      0.155   0.845 -0.00730        0.852  0.00730  0.00580
3 2009_18_~    4108 no_play   Timeout #1 by ARI at 01:46.         ARI       GB      0.639   0.361  0              0.361  0        0.278  
4 2009_18_~    4125 pass      (1:46) (Shotgun) K.Warner pass sho~ ARI       ARI     0.639   0.639  0.0216         0.661  0.0216   0.300  
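A side note on the `wp_diff2 != wp_diff1` filter: exact comparison of doubles can flag rows that differ only by floating-point noise. It may not matter for this game, but a tolerance-based check is a safer default. The sketch below uses made-up toy values (not taken from the database), and the tolerance `tol` is an assumption.

```r
# Toy values for illustration only, NOT real play-by-play data.
wp_diff1 <- c(0.3, 0.0216, 0.0073)
wp_diff2 <- c(0.1 + 0.2, 0.0216, 0.0058)

# Exact comparison: also flags row 1, where the values differ only by rounding noise.
exact_mismatch <- wp_diff1 != wp_diff2

# Tolerance-based comparison: flags only the genuine difference in row 3.
tol <- 1e-8  # assumed tolerance; choose one that suits the precision of the data
tol_mismatch <- abs(wp_diff1 - wp_diff2) > tol

print(data.frame(wp_diff1, wp_diff2, exact_mismatch, tol_mismatch))
```

Inside a dplyr pipeline, `filter(!near(wp_diff2, wp_diff1))` would do the same job using dplyr's own `near()` helper.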

The home_wp_post of play 1416 is modified in this line: https://github.com/mrcaseb/nflfastR/blob/9ae4bb1951a5b4302bc0e3e83261f5bb4406af32/R/helper_add_ep_wp.R#L1011. There, home_wp_post is set to the previous play's value if both the current play and the previous play are "no_play"s.
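To make that rule concrete, here is a rough base-R sketch of what it does as described. It is not the actual nflfastR code: the data are toy values, `lag1()` is a hypothetical helper, and the sketch only looks one row back rather than cascading through longer runs of "no_play"s.

```r
# Toy data for illustration only, NOT real play-by-play values.
plays <- data.frame(
  play_type    = c("run", "no_play", "no_play", "pass"),
  home_wp_post = c(0.60, 0.62, 0.65, 0.70)
)

# Hypothetical helper: shift a vector down by one row (base-R stand-in for dplyr::lag).
lag1 <- function(x) c(NA, head(x, -1))

# TRUE where both the current and the previous play are "no_play".
both_no_play <- plays$play_type == "no_play" & lag1(plays$play_type) == "no_play"
both_no_play[is.na(both_no_play)] <- FALSE  # the first row has no previous play

# On those rows, carry the previous play's home_wp_post forward.
plays$home_wp_post_adj <- ifelse(both_no_play,
                                 lag1(plays$home_wp_post),
                                 plays$home_wp_post)
print(plays)
```

On a row adjusted this way, the change in home_wp_post from the previous play is zero by construction, which is consistent with wp_diff2 being 0 for play 1416 while its WPA is not.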

Play 4108 appears to have home_wp and away_wp switched.
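A quick arithmetic check of that suspicion, using only the rounded values printed in the output above. This is a sketch, not a diagnosis: it only shows that flipping the timeout row's home_wp_post to the other team's perspective makes the next play's numbers line up.

```r
# Values copied from the printed output (rounded as shown there).
home_wp_post_4108 <- 0.361   # timeout row
home_wp_post_4125 <- 0.661   # following pass play
wpa_4125          <- 0.0216  # stored WPA of the pass play

# Difference as stored: this is the wp_diff2 reported for play 4125.
diff_as_stored <- abs(home_wp_post_4125 - home_wp_post_4108)

# Difference if the timeout row is flipped to the other team's perspective (1 - value).
diff_if_flipped <- abs(home_wp_post_4125 - (1 - home_wp_post_4108))

round(diff_as_stored, 3)   # 0.3
round(diff_if_flipped, 3)  # 0.022
```

With the flip, the difference (0.022) agrees with the stored WPA of 0.0216 up to the rounding of the printed values.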

Any insights, @guga31bb?

mrcaseb • Oct 28 '20 14:10

This is the equivalent part in nflscrapR, and I guess we must have modified it at some point, though I can't remember why. I personally have never used home_wp_post or WPA, so I'm surprised we bothered to modify nflscrapR's version here. There must have been some bug addressed at some point?

guga31bb • Oct 28 '20 17:10

Finally found the commit, but it's not really informative lol: https://github.com/mrcaseb/fastscraper/commit/12a03f956b313bcf6b247159474100aa93ae7403#diff-0a766e08dadf2046e3cf5c64e0d680e7315073ec7f764627a0d55f68e13136c0

It's lines 766-769 in that commit.

mrcaseb • Oct 28 '20 19:10

That commit was mostly me just copying and pasting nflscrapR's part. But it's weird, because it doesn't look identical to nflscrapR in that section.

guga31bb • Oct 28 '20 19:10