
Discrepancy in `t_to_d`

Open rvlenth opened this issue 5 years ago • 18 comments

Describe the bug

I was doing a comparison between the results of emmeans::eff_size() and effectsize::t_to_d(), and came up with very different results.

To Reproduce

require(effectsize)
#> Loading required package: effectsize
require(emmeans)
#> Loading required package: emmeans

warp.lm = lm(breaks ~ tension, data = warpbreaks)
emm = emmeans(warp.lm, "tension")

### Using emmeans::eff_size
eff_size(emm, sigma = sigma(warp.lm), edf = df.residual(warp.lm))

#>  contrast effect.size    SE df lower.CL upper.CL
#>  L - M          0.842 0.344 51    0.152     1.53
#>  L - H          1.239 0.355 51    0.526     1.95
#>  M - H          0.397 0.336 51   -0.276     1.07
#> 
#> sigma used for effect sizes: 11.88 
#> Confidence level used: 0.95

### Using t_to_d
t = summary(pairs(emm))$t.ratio
t_to_d(t, df = df.residual(warp.lm))

#>    d |        95% CI
#> --------------------
#> 0.71 | [ 0.14, 1.27]
#> 1.04 | [ 0.45, 1.62]
#> 0.33 | [-0.22, 0.89]

Note that the estimated effect sizes are somewhat smaller than those yielded by eff_size().

My own equivalent function

  • Since d = delta / sigma (where delta is the difference of means), and t = delta / (sigma * sqrt(2 / n)), we have that d = sqrt(2 / n) * t.
  • Meanwhile, df = k * (n - 1) where k is the number of groups, so that n = 1 + df / k.
  • Finally, to get k, the number of t ratios is k * (k - 1) / 2, so k = (1 + sqrt(1 + 8 * L)) / 2, where L is the number of t ratios.

Hence, my own version of t_to_d(), implementing these steps backwards:

### My own equivalent (just the point estimates)
rvl_t2d = function(t, df) {
    k = (1 + sqrt(1 + 8 * length(t))) / 2
    n = 1 + df / k
    sqrt(2 / n) * t
}

rvl_t2d(t, df.residual(warp.lm))

#> [1] 0.8417098 1.2391839 0.3974741

These results agree with the emmeans::eff_size() results.
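For readers outside R, the same arithmetic is easy to check in a few lines of Python (the t ratios below are rounded from the summary(pairs(emm)) output; everything else comes from the derivation above):

```python
import math

def rvl_t2d(t_ratios, df):
    # recover the number of groups k from the number of pairwise
    # t ratios, inverting L = k * (k - 1) / 2
    k = (1 + math.sqrt(1 + 8 * len(t_ratios))) / 2
    # balanced one-way design: df = k * (n - 1)  =>  n = 1 + df / k
    n = 1 + df / k
    return [math.sqrt(2 / n) * t for t in t_ratios]

# rounded t ratios for L-M, L-H, M-H; df.residual(warp.lm) = 51
print(rvl_t2d([2.525, 3.717, 1.192], 51))
# -> approximately [0.842, 1.239, 0.397], matching eff_size()
```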

Additional comment

My rvl_t2d() function depends on our having a balanced experiment with independent samples. Any such function that depends only on t ratios and df would require the same assumption (or something equally restrictive). The function emmeans::eff_size() does not require such a simplifying assumption, as it actually estimates each pairwise difference and divides it by the estimate of sigma. That's one reason I am quite sure that its results are correct. But I would think that effectsize::t_to_d() should yield the same results when those restrictive assumptions are satisfied, as they are in this illustration. I am not well-versed in the effect-size literature, so I cannot comment with any authority on the formulas given in the documentation for t_to_d().

My system

R.Version()
#> $platform
#> [1] "x86_64-w64-mingw32"
#> 
#> $arch
#> [1] "x86_64"
#> 
#> $os
#> [1] "mingw32"
#> 
#> $system
#> [1] "x86_64, mingw32"
#> 
#> $status
#> [1] ""
#> 
#> $major
#> [1] "4"
#> 
#> $minor
#> [1] "0.3"
#> 
#> $year
#> [1] "2020"
#> 
#> $month
#> [1] "10"
#> 
#> $day
#> [1] "10"
#> 
#> $`svn rev`
#> [1] "79318"
#> 
#> $language
#> [1] "R"
#> 
#> $version.string
#> [1] "R version 4.0.3 (2020-10-10)"
#> 
#> $nickname
#> [1] "Bunny-Wunnies Freak Out"

packageVersion("effectsize")
#> [1] '0.4.0'

packageVersion("emmeans")
#> [1] '1.5.2.10003'

Created on 2020-11-28 by the reprex package (v0.3.0)

rvlenth avatar Nov 28 '20 18:11 rvlenth

Looking at the documentation, I worked out that for the case of k = 2 (2 groups), we have

    2 * t / sqrt(df) = (delta / s) * sqrt(n / (n - 1))

which is approximately equal to Cohen's d = delta / s. So I think the error is in not accounting for the number of groups.
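That k = 2 identity is easy to verify numerically from two-group summary statistics; here is a quick Python sketch with made-up numbers (n, delta, and s are arbitrary):

```python
import math

# made-up balanced two-group summary statistics (k = 2)
n = 10        # per-group sample size
delta = 3.0   # difference of means
s = 4.0       # pooled SD

t = delta / (s * math.sqrt(2 / n))  # pooled two-sample t statistic
df = 2 * (n - 1)                    # error df for two equal groups

lhs = 2 * t / math.sqrt(df)                 # the t_to_d() conversion
rhs = (delta / s) * math.sqrt(n / (n - 1))  # d times sqrt(n / (n - 1))

print(lhs, rhs)  # the two sides agree; both slightly exceed d = delta / s
```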

rvlenth avatar Nov 28 '20 18:11 rvlenth

@rvlenth Indeed the form used here is d = 2 * t / sqrt(df), based on e.g. Rosenthal (1994) (should perhaps update the reference in the docs), by which:

[image: the d = 2 * t / sqrt(df) conversion formula]

This formula is used for meta-analysis purposes and is also used in some cases / fields as an approximation of Cohen's d in situations where it is not straightforward to compute directly, such as for LMMs (I should make this clearer in the docs / vignette, along with the assumed equal-sized groups).

I will also add a reference to emmeans::eff_size() and your comparisons vignette in the docs / vignette.

Perhaps I can add a k_adjust argument, for users to supply the number of original groups, to use for the adjustment, which if set would implement your rvl_t2d()? I'm surprised I haven't come across these adjustments - perhaps a silly question, but do you know of any reference to this adjustment method?


Rosenthal, R. (1994) Parametric measures of effect size. In H. Cooper and L.V. Hedges (Eds.). The handbook of research synthesis. New York: Russell Sage Foundation.

mattansb avatar Nov 29 '20 06:11 mattansb

Mattan,

Hmmm. No reference, as what I wrote there is just my derivation from basic principles. But it seems curious because, obviously, the more groups there are, the more df. Meanwhile, Cohen's d was originally conceptualized as the difference in SD units. And I note that the 'cohens_d' function in your package does require only two groups.

If I get a chance, I'll look at the Rosenthal reference. I'd want to make sure it doesn't say that that equation holds only when there are two groups.

Russ


rvlenth avatar Nov 29 '20 13:11 rvlenth

If I get a chance, I'll look at the Rosenthal reference. I'd want to make sure it doesn't say that that equation holds only when there are two groups.

It makes sense to me that you are right, even if this isn't explicitly stated, as Cohen's d is usually framed as a difference between two means.

I think I can reuse your example code without too many compatibility issues.

require(effectsize)
#> Loading required package: effectsize
require(emmeans)
#> Loading required package: emmeans
### t2d_rvl()
t2d_rvl <- function(t, df_error, paired = FALSE, ci = 0.95, k = 2) {
  .get_ncp_t <- effectsize:::.get_ncp_t
  
  if (paired) {
    res <- data.frame(d = t / sqrt(df_error + 1))
    
    if (is.numeric(ci)) {
      stopifnot(length(ci) == 1, ci < 1, ci > 0)
      res$CI <- ci
      
      ts <- t(mapply(.get_ncp_t,
                     t, df_error, ci))
      
      res$CI_low <- ts[,1] / sqrt(df_error + 1)
      res$CI_high <- ts[,2] / sqrt(df_error + 1)
    }
  } else {
    n = 1 + df_error / k
    
    res <- data.frame(d = sqrt(2 / n) * t)
    
    if (is.numeric(ci)) {
      stopifnot(length(ci) == 1, ci < 1, ci > 0)
      res$CI <- ci
      
      ts <- t(mapply(.get_ncp_t,
                     t, df_error, ci))
      
      res$CI_low <- sqrt(2 / n) * ts[,1]
      res$CI_high <- sqrt(2 / n) * ts[,2]
    }
  }
  
  # apply the print class and return from both branches
  class(res) <- c("effectsize_table", "see_effectsize_table", class(res))
  res
}
warp.lm = lm(breaks ~ tension, data = warpbreaks)
emm = emmeans(warp.lm, "tension")

eff_size(emm, sigma = sigma(warp.lm), edf = df.residual(warp.lm))
#>  contrast effect.size    SE df lower.CL upper.CL
#>  L - M          0.842 0.344 51    0.152     1.53
#>  L - H          1.239 0.355 51    0.526     1.95
#>  M - H          0.397 0.336 51   -0.276     1.07
#> 
#> sigma used for effect sizes: 11.88 
#> Confidence level used: 0.95

s <- summary(pairs(emm))
t2d_rvl(s$t.ratio, s$df, k = 3)
#>    d |        95% CI
#> --------------------
#> 0.84 | [ 0.16, 1.51]
#> 1.24 | [ 0.54, 1.93]
#> 0.40 | [-0.26, 1.05]

Created on 2020-11-29 by the reprex package (v0.3.0)

I do have a question - what about non-pairwise contrasts? Any idea how one might use these t-values to get d-like effect sizes?

Example:

contr <- contrast(emm, list(c = c(2,-1,-1)/2))
 
eff_size(contr, sigma = sigma(warp.lm), edf = df.residual(warp.lm), method = "identity")
#>  contrast effect.size    SE df lower.CL upper.CL
#>  c               1.04 0.307 51    0.425     1.66
#> 
#> sigma used for effect sizes: 11.88 
#> Confidence level used: 0.95 
 
s <- summary(contr)
 
# This?
t2d_rvl(s$t.ratio, s$df, k = 2)
#>    d |       95% CI
#> -------------------
#> 0.99 | [0.41, 1.56]
 
# Or this?
t2d_rvl(s$t.ratio, s$df, k = 3)
#>    d |       95% CI
#> -------------------
#> 1.20 | [0.50, 1.89]

mattansb avatar Nov 29 '20 15:11 mattansb

It does make sense to consider a contrast other than a pairwise one, with the interpretation that d is that contrast expressed in SD units. However, I don't think either of your answers is correct, because the formulas are based on the SE for a pairwise comparison. The "right answer" is:

> eff_size(contr, sigma = sigma(warp.lm), edf = 51, method = "identity")
 contrast effect.size    SE df lower.CL upper.CL
 c               1.04 0.307 51    0.425     1.66

sigma used for effect sizes: 11.88 
Confidence level used: 0.95 

rvlenth avatar Nov 30 '20 18:11 rvlenth

For the time being, I have updated the documentation and vignette. Will get back to this at some point in the near future - thanks Russ!

mattansb avatar Dec 01 '20 08:12 mattansb

@bwiernik might have something to contribute here (thoughts / ideas / solutions)?

mattansb avatar Jun 20 '21 13:06 mattansb

Can you give me a tl;dr on what the issue is?

bwiernik avatar Jun 20 '21 14:06 bwiernik

The t_to_d() conversion (for a conditional d) seems appropriate only for 2 level factors (with equal sample sizes, but that's kind of a given I think).

Then Russell and I had a back and forth discussing if the function should be expanded to a 3+ level factor, and how (to also account for non-pairwise contrasts, etc...).

mattansb avatar Jun 21 '21 05:06 mattansb

Ah, here are methods for d values for arbitrary contrasts https://psycnet.apa.org/journals/met/13/2/99/ and https://journals.sagepub.com/doi/abs/10.1111/1467-9280.00287

And for contrasts of medians https://psycnet.apa.org/doiLanding?doi=10.1037/1082-989X.7.3.370 and https://onlinelibrary.wiley.com/doi/abs/10.1111/bmsp.12171

Reasonable approaches here would be to (1) compute all pairwise contrast d values, (2) compute d contrasts versus the reference level of a factor, (3) compute d contrasts versus the grand mean, or (4) compute d contrasts for a user-specified contrast or correlation for ordered factors

bwiernik avatar Jun 21 '21 13:06 bwiernik

Of those options, I suggest (2) by default for non-ordered factors and (4) correlations for ordered factors. We can add the option for a user specified contrast matrix ala emmeans. We could also rely on the contrasts set for the variable, but most users don't work with that much.

bwiernik avatar Jun 21 '21 13:06 bwiernik

If you give somebody an effect size, then you are giving them something that is in SD units. That means it is vitally important to understand exactly what those units are -- especially the person who constructed the effect size and reports it. From that perspective, a function like t_to_d() is something of a disservice -- even if it produces the correct result -- because it bypasses the user's need to even think about those SD units.

If you have two independent samples of equal size and you think that a pooled t is appropriate, then t_to_d() gives you the value of d. But note all the qualifiers: two, independent, equal, pooled, t. If the samples are not independent, or if they are of unequal size, you are already out of luck. If the pooled t is not appropriate, it means you don't have equal population SDs, so now you are not only out of luck, but the whole concept of standardized effect size is ambiguous, if not completely out the window. And t is a qualifier because technically it requires normality, albeit there is some robustness there when the sample sizes are equal.

If you have more than two samples, then you can correct for that, but there are still all the other issues I just mentioned.

Anyone who really knows me knows I am not a strong proponent of standardized effect sizes, and I even have publications advising strongly against using them as target values in sample-size calculations. Accordingly, the function emmeans::eff_size() is pretty rudimentary. It uses estimates of pairwise differences from a fitted model, but it provides no default for the sigma argument or its degrees of freedom. This makes it clumsy to use from the perspective of a user who wants a quick answer. But from the above perspective, I'm pretty proud of that clunkiness, because it forces the user to think about what SD units should be used and where they come from.

Looking back at this thread, I wish I had said this earlier, instead of just addressing computational and technical concerns.

rvlenth avatar Jun 21 '21 19:06 rvlenth

I'm also not a big fan of standardized effect sizes.

That said, if a function exists that provides them, I think the possible choices users have for standardizing them, and the ramifications of each, should be made clear. From that perspective, I'm not sure the behavior of emmeans::eff_size() is the best. Because it doesn't have a default value for sigma, for example, many users will tend to look to the examples for where to get the sigma value. The first example shows using the sigma() function on the fitted model. That's a reasonable value for an lm with only the one group factor, but probably isn't reasonable if there are covariates in the model (cf. the recent discussion in https://github.com/easystats/effectsize/issues/351). My preference is for providing clear sets of choices to users and then doing the appropriate sets of calculations based on those choices.

bwiernik avatar Jun 21 '21 22:06 bwiernik

@bwiernik -- Easy to say, but seldom easy to do. The documentation for eff_size() says (among other things):

For models having a single random effect, such as those fitted using lm; in that case, the stats::sigma and stats::df.residual functions may be useful for specifying sigma and edf.

This pretty much describes the only situation where I could possibly present a usable choice, based on information that can be extracted from a model object.

It goes on to say:

For models with more than one random effect, sigma may be based on some combination of the random-effect variances.

Do you recommend that I take out the first example where I use the sigma() function because a user may not have read the documentation? What about the subsequent examples, where I do show something different? If I do somehow provide clear choices and an explanation of the ramifications of each choice, why would I think they'd read that, given that they didn't read the documentation?

What eff_size() does do that most other software does not is to take into account the error in estimating the SD when it constructs CIs for effect sizes. People should look at those intervals. Often, they'll find out they really have no idea what the actual effect size is.

rvlenth avatar Jun 22 '21 03:06 rvlenth

Originally, I came across the t-to-d conversion in a paper utilizing HLM, where it was used as a shortcut to obtain <some type of signal to noise ratio>. I agree that raw effect sizes are better, or at the very least standardizing by the correct / most relevant SD; however, given that (1) many people are not proficiently trained or experienced in such practices (something I do emphasize in my own teaching), and (2) readers (and often journals) expect these standardized effect sizes, we're just trying to strike the best balance here.

I think making the little t_to_d() function have a million options is not the easystats way to go (for that there is @rvlenth's S tier emmeans, which we do reference in the docs as the best option, or other meta-analysis packages).

So unless there is an easy "fix" to make the t_to_d() function less wrong or biased or misused (to some reasonable degree, I don't expect to have 100% coverage), I suggest better documentation instead - either in the function docs, or in the appropriate vignette.

mattansb avatar Jun 22 '21 05:06 mattansb

🚨 ALERT 🚨

[This is an automated message]

Someone has captured Mattan and is impersonating him. Fortunately, our defense algorithms managed to uncover the fraud. Probable reason yielded: "never would the real Mattan advocate for better documentation and easy usage over user's mandatory betterment"

DominiqueMakowski avatar Jun 22 '21 05:06 DominiqueMakowski

🤣🤣🤣🤣 The better documentation would lead to the user doing better!

mattansb avatar Jun 22 '21 06:06 mattansb

I definitely agree that standardized effects rarely make any sense in mixed effects models, no arguments from me there. My concern is mostly about single-level models.

For single-level models, sigma(model) is only really a reasonable choice if the model includes only a single categorical predictor with no covariates. Otherwise, the scale of the effect size will depend on the covariates included in the model.

Within the domain of single level generalized linear models, the meta-analysis literature has explored these issues pretty extensively and given guidelines that conform to what people tend to expect a "Cohen's d" value means. (e.g., See the formulas on page 229 and 230 here: https://books.google.be/books?hl=en&lr=&id=LUGd6B9eyc4C&oi=fnd&pg=PA229#v=onepage&q&f=false)

Currently, t_to_d() is only correct for a simple two-group t-test with equal-sized groups. If the groups vary in size, there are more groups in the model, or there are covariates, both the d value estimate and its confidence interval will be wrong.

The way to make t_to_d() fit expectations would be to use the approach here: https://github.com/easystats/effectsize/issues/351

That is, arguments should be added for n1, n2, R^2 for the covariates, and q (the number of covariates), with the appropriate conversion made based on those elements.
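In outline, that might look like the hedged Python sketch below. The formulas are the standard meta-analytic conversions (d = t * sqrt(1/n1 + 1/n2) for two independent groups, scaled by sqrt(1 - R^2) when t comes from a covariate-adjusted model); the function and argument names are illustrative, not a proposed effectsize API:

```python
import math

def t_to_d_sketch(t, n1, n2, r2=0.0):
    """Approximate Cohen's d from a t statistic. Illustrative only.

    t      : t statistic for the group contrast
    n1, n2 : group sizes (need not be equal)
    r2     : squared multiple correlation of the covariates with the
             outcome (0 for a plain two-group comparison)
    """
    d = t * math.sqrt(1 / n1 + 1 / n2)
    # a covariate-adjusted t is based on a residual SD of s * sqrt(1 - r2);
    # rescale so that d is expressed in raw (unadjusted) SD units
    return d * math.sqrt(1 - r2)

# equal groups of 10, no covariates: reduces to d = t * sqrt(2 / n)
print(t_to_d_sketch(2.0, 10, 10))
# unequal groups, where the current 2 * t / sqrt(df) rule would be off
print(t_to_d_sketch(2.0, 5, 20))
```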

bwiernik avatar Jun 23 '21 14:06 bwiernik