MackChainLadder should return the same result if Triangle is passed through a pipe

Open msenn opened this issue 7 years ago • 7 comments

Problem

The value returned by MackChainLadder() depends on whether Triangle is passed directly (i.e. as a function argument) or using magrittr's pipe operator (%>%):

library(ChainLadder)
library(magrittr)

# Pass Triangle directly
mcl <- MackChainLadder(RAA)

# Pipe Triangle
mcl_piped <- RAA %>% 
  MackChainLadder()

identical(mcl, mcl_piped)         # Returns FALSE

Further information

Differences are in elements "call" and "Models":

idx.diff <- which(vapply(
  seq_along(mcl),
  function(i) !identical(mcl[[i]], mcl_piped[[i]]),
  logical(1))
)

names(mcl)[idx.diff]
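
The same check can be wrapped in a small helper that works by element name. This is only a sketch in base R: differing_elements() is a hypothetical name, and the two toy lists stand in for mcl and mcl_piped so that the example runs without ChainLadder.

```r
# Hypothetical helper: names of the elements that are not identical
# between two named lists
differing_elements <- function(a, b) {
  nms <- union(names(a), names(b))
  is_same <- vapply(nms, function(nm) identical(a[[nm]], b[[nm]]), logical(1))
  nms[!is_same]
}

# Toy stand-ins for two fitted objects that differ only in the stored call
a <- list(call = quote(f(x)), value = 1:3, note = "same")
b <- list(call = quote(f(.)), value = 1:3, note = "same")

differing_elements(a, b)
```

Applied to the real objects, the call would be differing_elements(mcl, mcl_piped).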

Arguably, the only difference is in the original name of the Triangle object. This difference may look minor and cosmetic. However, it will confuse anybody trying to verify that two pieces of code lead to the same outcome. Also, pipes are so prevalent these days that they shouldn't be ignored.

System info

I am using the current GitHub version of ChainLadder. Here's my sessionInfo():

R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.5 (Maipo)

Matrix products: default
BLAS: /opt/R/3.5.0/lib64/R/lib/libRblas.so
LAPACK: /opt/R/3.5.0/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] magrittr_1.5      ChainLadder_0.2.6

loaded via a namespace (and not attached):
 [1] biglm_0.9-1       statmod_1.4.30    zoo_1.8-2         tidyselect_0.2.4  purrr_0.2.5      
 [6] reshape2_1.4.3    splines_3.5.0     haven_1.1.1       lattice_0.20-35   carData_3.0-1    
[11] colorspace_1.3-2  stats4_3.5.0      yaml_2.1.19       rlang_0.2.1       pillar_1.2.3     
[16] foreign_0.8-70    glue_1.3.0        tweedie_2.3.2     readxl_1.1.0      bindrcpp_0.2.2   
[21] bindr_0.1.1       plyr_1.8.4        stringr_1.3.1     munsell_0.5.0     cplm_0.7-7       
[26] gtable_0.2.0      cellranger_1.1.0  zip_1.0.0         expint_0.1-4      coda_0.19-1      
[31] systemfit_1.1-22  rio_0.5.10        forcats_0.3.0     lmtest_0.9-36     curl_3.2         
[36] Rcpp_0.12.17      scales_0.5.0      abind_1.4-5       ggplot2_3.0.0     stringi_1.2.3    
[41] openxlsx_4.1.0    dplyr_0.7.6       grid_3.5.0        tools_3.5.0       sandwich_2.4-0   
[46] lazyeval_0.2.1    tibble_1.4.2      car_3.0-0         pkgconfig_2.0.1   MASS_7.3-50      
[51] Matrix_1.2-14     data.table_1.11.4 actuar_2.3-1      assertthat_0.2.0  minqa_1.2.4      
[56] R6_2.2.2          nlme_3.1-137      compiler_3.5.0   

msenn avatar Aug 10 '18 07:08 msenn

The same is true for other functions like lm:

m_piped <- data.frame(x=1:10,  y=1:10) %>% lm
m <- lm(y~x, data=data.frame(x=1:10,  y=1:10))
identical(m, m_piped)
#> [1] FALSE

How do you deal with those situations?

mages avatar Oct 03 '18 15:10 mages

From what I can tell the only differences are based on the call. I've modified the example to make it more clear. If you look at the differences in the original example they are all about the formula and call.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
m_piped <- data.frame(x=1:10,  y=1:10) %>% lm(formula = y ~ x)
m <- lm(y~x, data=data.frame(x=1:10,  y=1:10))
all.equal(m, m_piped)
#> [1] "Component \"call\": target, current do not match when deparsed"

Created on 2019-01-16 by the reprex package (v0.2.1)

ryanbthomas avatar Jan 16 '19 22:01 ryanbthomas

The two objects were created differently -- with different calls:

m_piped$call
#> lm(formula = y ~ x, data = .)
m$call
#> lm(formula = y ~ x, data = data.frame(x = 1:10, y = 1:10))

Is your concern the loss of information regarding the source of 'data' in m_piped? "data = ." is a common idiom in the tidyverse. If that source is important to you -- and I can see why it would be -- then I suggest avoiding piping. Otherwise, I am happy that is the only difference in the two objects. Maybe someone in a tidyverse list can help with the "lm(formula = y ~ x, data = .)" issue. Thank you for your interest in ChainLadder! Dan

trinostics avatar Jan 17 '19 14:01 trinostics

"From what I can tell the only differences are based on the call"

That's what I meant by "the only difference is in the original name of the Triangle object". Apologies for being unclear.

As for my concern: This behavior got me when I wrote unit tests for a function that uses MackChainLadder(). The tests would fail when using the pipe but not otherwise. The reason turned out to be the call object.
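
One possible workaround for such tests, sketched here with lm() so that it runs in base R alone: drop the stored call before comparing. drop_call() is a hypothetical helper and the data are made up. For MackChainLadder() output this may not be sufficient on its own, since the fitted models inside the result can differ as well.

```r
# Hypothetical helper: remove the stored call so that only the fitted
# content is compared
drop_call <- function(fit) {
  fit$call <- NULL
  fit
}

df  <- data.frame(x = 1:10, y = c(2, 4, 5, 4, 6, 7, 9, 8, 10, 12))
dat <- df  # the same data under a different name

m1 <- lm(y ~ x, data = df)
m2 <- lm(y ~ x, data = dat)

isTRUE(all.equal(m1, m2))                        # the calls differ
isTRUE(all.equal(drop_call(m1), drop_call(m2)))  # everything else matches
```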

I have no strong opinion on what to do about this. By having 'call' in the return value, MackChainLadder() is in line with lm() as demonstrated by @mages. However, its output varies slightly depending on whether the Triangle is passed directly or piped.

Obviously, there are wars fought over the merits and drawbacks of the pipe and we should probably not repeat this here. Therefore, feel free to close the issue if you conclude that consistency over time and with lm() weighs more heavily than consistency when piped.

msenn avatar Jan 22 '19 07:01 msenn

I believe this is a feature, not an issue, of the piping paradigm. E.g., if the formula had not been omitted in the toy example, the call of the result would have been different still:

# original toy example
m_piped <- data.frame(x=1:10, y=1:10) %>% lm
m_piped$call
#> lm(formula = .)

# toy example including formula
m_piped <- data.frame(x=1:10, y=1:10) %>% lm(y~x, data = .)
m_piped$call
#> lm(formula = y ~ x, data = .)

# toy example including formula and another default argument value
m_piped <- data.frame(x=1:10, y=1:10) %>% lm(x~y, data = ., model = TRUE)
m_piped$call
#> lm(formula = x ~ y, data = ., model = TRUE)

These example results are supported by the following technical note at the magrittr site (https://magrittr.tidyverse.org/reference/pipe.html):

“For most purposes, one can disregard the subtle aspects of magrittr's evaluation, but some functions may capture their calling environment, and thus using the operators will not be exactly equivalent to the "standard call" without pipe-operators.”
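
A rough base-R sketch of the mechanism, with no magrittr required. Both fit() and dot_pipe() are hypothetical helpers made up for illustration; dot_pipe() only mimics the idea of binding the left-hand side to `.` and is not how magrittr is actually implemented.

```r
# Hypothetical stand-in for a fitting function that stores its own call,
# in the way lm() and MackChainLadder() do
fit <- function(data) {
  list(call = match.call(), total = sum(data))
}

# Hypothetical, simplified pipe: bind the left-hand side to `.` in a fresh
# environment, then evaluate the right-hand call there
dot_pipe <- function(lhs, rhs_call) {
  env <- new.env(parent = parent.frame())
  assign(".", lhs, envir = env)
  eval(rhs_call, env)
}

claims <- c(10, 20, 30)
direct <- fit(claims)
piped  <- dot_pipe(claims, quote(fit(.)))

print(direct$call)                    # the stored call names `claims`
print(piped$call)                     # the stored call names `.` instead
identical(direct$total, piped$total)  # the computed values agree
identical(direct, piped)              # the objects differ because of the call
```

The function only ever sees the expression it was called with, so it has no way of recovering the original variable name once the pipe has substituted `.`.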

trinostics avatar Jan 22 '19 14:01 trinostics

I have two final comments:

  1. I think this topic belongs on a tidyverse or magrittr mailing list. I would encourage the OP to take it there, and please let this thread know so we can follow the discussion there. I say that because ...
  2. When I was working on R code for Mack (and extensions) before I met Markus, I got deep into lm's entrails. In particular, I went to great lengths to construct the function calls so that, in the end, it would be clear what data was being analyzed and what lm levers were being pulled at each step of the development process. Maybe the user's own variable names could be stored for later regurgitation as needed for clarification and communication. Alas, that was overly ambitious at that time. Ten years later, perhaps no longer so, and the magrittr approach may be flexible enough to implement such transparency.

Thanks for raising this issue, and thanks again for your interest in ChainLadder.

chiefmurph avatar Jan 23 '19 06:01 chiefmurph

Leaving this here in case it might be of help to someone else.

I use all.equal() (and testthat::expect_equal()) to check for "identity" of MackChainLadder() output.

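identical() is strict: it also compares any environments stored inside the objects, and two environments only count as identical if they are the very same environment. Each MackChainLadder() call appears to create fresh ones (see the reprex below), so identical() might not be the best way to check for equality of MCL output.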

suppressPackageStartupMessages(library(ChainLadder))

set.seed(1024)
mcl <- MackChainLadder(RAA)

set.seed(1024)
mcl2 <- MackChainLadder(RAA)

identical(mcl, mcl2)
#> [1] FALSE

# TL;DR:
# The difference is in the model terms attribute '.Environment'. I suppose
# that's the environment in which the calls are evaluated. Nothing to worry
# about, if you ask me.

# Explanation:

# which elements aren't identical:
for (nm in names(mcl)) {
  if (!identical(mcl[[nm]], mcl2[[nm]])) {
    print(nm)
  }
}
#> [1] "Models"

# The 'Models' are a bunch of calls and coefficients. Let's work with the first
# item in their list:
a <- mcl$Models[[1]]
b <- mcl2$Models[[1]]

# Which elements are different:
for (nm in names(a)) {
  if (!identical(a[[nm]], b[[nm]])) {
    print(nm)
  }
}
#> [1] "terms"
#> [1] "model"

a_terms <- a[['terms']]
b_terms <- b[['terms']]

# check which attributes aren't identical:
for (att in names(attributes(a_terms))) {
  if (!identical(attr(a_terms, which = att), attr(b_terms, which = att))) {
    print(att)
  }
}
#> [1] ".Environment"

# attr '.Environment'

a_model <- a[['model']]
b_model <- b[['model']]

# Again for the models, only the environment attribute is different since the 
# columns are identical:
for (nm in names(a_model)) {
  if (!identical(a_model[[nm]], b_model[[nm]])) {
    print(nm)
  }
}

# checking the attributes:
for (att in names(attributes(a_model))) {
  if (!identical(attr(a_model, which = att), attr(b_model, which = att))) {
    print(att)
  }
}
#> [1] "terms"

# The 'terms' are the same as we had seen before: 
identical(a[['terms']], attr(a_model, which = 'terms'))
#> [1] TRUE

# Meaning only the '.Environment' attribute is different.

Created on 2022-08-29 with reprex v2.0.2
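
The '.Environment' effect can be reproduced in base R without ChainLadder. This is a minimal sketch: make_formula() is a hypothetical function, and the assumption (not checked against the package source) is that MackChainLadder() builds its model formulas inside a function in a similar way.

```r
# Hypothetical function that builds a formula internally; the formula
# records the frame of the call that created it as its environment
make_formula <- function() {
  y ~ x
}

f1 <- make_formula()
f2 <- make_formula()

identical(f1, f2)                            # FALSE: each call has its own frame
identical(deparse(f1), deparse(f2))          # TRUE: the formula text is the same
identical(environment(f1), environment(f2))  # FALSE: two distinct environments

# Point both formulas at the same environment and they become identical
environment(f1) <- globalenv()
environment(f2) <- globalenv()
identical(f1, f2)                            # TRUE
```

Resetting environments by hand is rarely worth it in tests; it is shown here only to confirm that the environment is the sole source of the difference.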

kennedymwavu avatar Aug 29 '22 12:08 kennedymwavu