Accounting for "ties" in rank-biserial correlation in paired/one-sample cases
For paired samples (or one sample), the rank-biserial correlation gives a (seemingly) odd result when there are ties with the mu argument. For example (code below), if I use the sleep data set, there is 1 tie among the paired differences; therefore, in my opinion, the effect size, or probability of superiority, should not be -1.00 (or 0%). Is there any particular reason for this?
> # example of function
> rank_biserial(extra ~ group, data = sleep, paired = TRUE,
+ parametric = FALSE, mu = 0)
r (rank biserial) | 95% CI
----------------------------------
-1.00 | [-1.00, -1.00]
> # take paired differences
> z = subset(sleep, group == 1)$extra - subset(sleep, group == 2)$extra
> # calculate sum less than zero
> # not equal to 100% less!
> sum(z < 0) / length(z)
[1] 0.9
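If the single zero difference is given half credit to each side (a convention I'm assuming here, not something effectsize currently does), the plain directional proportions already land near -0.90 rather than -1.00:

```r
# sketch: split-zero proportions for the sleep paired differences; each zero
# ("tie" with mu = 0) is assumed to contribute half a "win" to each side
z <- subset(sleep, group == 1)$extra - subset(sleep, group == 2)$extra
p_sup <- (sum(z > 0) + 0.5 * sum(z == 0)) / length(z)  # favorable fraction
p_inf <- (sum(z < 0) + 0.5 * sum(z == 0)) / length(z)  # unfavorable fraction
p_sup - p_inf  # a crude, unranked analogue of r_rb
#> [1] -0.9
```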
My feeling is that the internal .r_rbs calculation should actually be the following for paired samples.
z <- na.omit((x - y) - mu)
# signed-rank transform; zeros are assumed to come back as NA
Ry <- effectsize:::.safe_ranktransform(z, sign = TRUE, verbose = verbose)
Ry0 <- ifelse(is.na(Ry), 1, 0)  # indicator for zeros ("ties" with mu)
Ry <- stats::na.omit(Ry)
n <- length(z)
S <- n * (n + 1) / 2
# zeros get half credit: added to U1, subtracted from U2
U1 <- sum(Ry[Ry > 0], na.rm = TRUE) + 0.5 * sum(Ry0)
U2 <- -sum(Ry[Ry < 0], na.rm = TRUE) - 0.5 * sum(Ry0)
u_ <- U1 / S
f_ <- U2 / S
u_ - f_
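For readers without access to the internal helper, here is a self-contained sketch of the same proposal (ranking only the non-zero differences, which is what I take .safe_ranktransform to do when zeros come back as NA):

```r
# standalone version of the proposed split-zero r_rb; assumes zeros would be
# NA'd by the internal rank transform, so only non-zero values are ranked
split_zero_rbs <- function(d) {
  n <- length(d)
  S <- n * (n + 1) / 2
  nz <- d[d != 0]
  Ry <- sign(nz) * rank(abs(nz))  # signed ranks of non-zero differences
  n0 <- sum(d == 0)               # zeros ("ties" with mu)
  U1 <- sum(Ry[Ry > 0]) + 0.5 * n0
  U2 <- -sum(Ry[Ry < 0]) - 0.5 * n0
  U1 / S - U2 / S
}
z <- subset(sleep, group == 1)$extra - subset(sleep, group == 2)$extra
split_zero_rbs(z)  # -0.8 rather than the current -1.00
#> [1] -0.8
```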
For this exact example, it provides the correct result (again, just my opinion).
I am unfamiliar with this correction - where is this from?
I have taken the idea from O'Brien and Castelloe and their odds calculation. To be honest, I am not sure this has ever been addressed in the literature since, aside from the Kerby (2014) citation, I rarely see the paired-samples case mentioned.
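The O'Brien and Castelloe idea, as I understand it (this is my paraphrase, not code from their paper), is to give ties half credit when computing a win probability:

```r
# a minimal sketch of the half-credit-for-ties win probability
# (my reading of O'Brien & Castelloe's WinP; the function name is made up)
win_prob <- function(d) {
  (sum(d > 0) + 0.5 * sum(d == 0)) / length(d)
}
win_prob(c(rep(0, 9), 1, 1))  # 2 wins plus half credit for 9 ties, out of 11
#> [1] 0.5909091
```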
My rationale for the adjustment is two-fold.
- The documentation for the package's rank_biserial states: "When tied values occur, they are each given the average of the ranks that would have been given had no ties occurred. This results in an effect size of reduced magnitude." This adjustment is explicitly applied in the two-sample case.
- The rank-biserial effect size will inherently mismatch the result of a Wilcoxon signed-rank test when there are many "ties". For example, I have created a fake dataset of 11 observations where 2 are greater than zero and the remaining 9 are zero ("ties"), and provided some output to review below.
Notice how divergent the interpretation from the wilcox.test output would be compared to the rank_biserial output. The p-values for my adjusted version do not exactly match the results of wilcox.test, but they are rather close!
> z = c(rep(0,9),1,1)
> alpha = .05
> # Run through my "ties" calculations
> Ry <- effectsize:::.safe_ranktransform(z, sign = TRUE, verbose = TRUE)
> Ry0 <- ifelse(is.na(Ry), 1, 0)
> Ry <- stats::na.omit(Ry)
>
> n <- length(z)
> S <- (n * (n + 1) / 2)
>
> U1 <- sum(Ry[Ry > 0], na.rm = TRUE) + 0.5 * sum(Ry0[Ry0 == 1], na.rm = TRUE)
> U2 <- -sum(Ry[Ry < 0], na.rm = TRUE) - 0.5 * sum(Ry0[Ry0 == 1], na.rm = TRUE)
>
> u_ <- U1 / S
> f_ <- U2 / S
> rho = u_ - f_
> rf = atanh(rho)
> maxw <- (n^2 + n) / 2
> rfSE <- sqrt((2 * n^3 + 3 * n^2 + n) / 6) / maxw
>
> # No correction
> # I am using the unadjusted Fisher r-to-z approach
> wilcox.test(x=z,exact = FALSE, correct = FALSE)$p.value
[1] 0.1572992
> 2*pnorm(-abs(rf), sd = 1/(n-3))
[1] 0.1413184
> # Correction
> # Using effectsize's adjusted approach
> wilcox.test(x=z,exact = FALSE, correct = TRUE)$p.value
[1] 0.3457786
> 2*pnorm(-abs(rf), sd = rfSE)
[1] 0.5895675
>
> # Compare to current functionality which implies 100% superiority!
> effectsize::rank_biserial(x=z)
r (rank biserial) | 95% CI
--------------------------------
1.00 | [1.00, 1.00]
> effectsize::effectsize(wilcox.test(x=z,exact = FALSE, correct = FALSE))
r (rank biserial) | 95% CI
--------------------------------
1.00 | [1.00, 1.00]
Again, just a thought (and an opinion), but it just seems wrong to me that 9 ties and 2 positive values would give r_{rb} = 1.
> it just seems wrong to me that 9 ties and 2 positive values would give r_{rb} = 1
Yes, I definitely agree with this. However, I am not familiar enough with rank-based measures. Tagging @strengejacke and @bwiernik (who I believe implemented the current CI methods).
From what I can see in stats:::wilcox.test.default, 0s are dropped for the calculation of the statistic, but n includes the 0s (I don't see any other indication that anything is done with the 0s).
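To illustrate the zero handling described above (a sketch of the relevant steps, not the full wilcox.test source):

```r
# zeros are dropped before ranking; the statistic is computed from what's left
z <- c(rep(0, 9), 1, 1)
x2 <- z[z != 0]                  # the zeros are discarded
V <- sum(rank(abs(x2))[x2 > 0])  # signed-rank statistic from the remainder
V
#> [1] 3
unname(wilcox.test(z, exact = FALSE, correct = FALSE)$statistic)
#> [1] 3
```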
Also, @arcaldwell49 something is off with your code I think?
aarons_rbs <- function(z) {
  # run through the proposed "ties" calculations
  Ry <- effectsize:::.safe_ranktransform(z, sign = TRUE, verbose = TRUE)
  Ry0 <- ifelse(is.na(Ry), 1, 0)  # indicator for zeros
  Ry <- stats::na.omit(Ry)
  n <- length(z)
  S <- n * (n + 1) / 2
  U1 <- sum(Ry[Ry > 0], na.rm = TRUE) + 0.5 * sum(Ry0)
  U2 <- -sum(Ry[Ry < 0], na.rm = TRUE) - 0.5 * sum(Ry0)
  u_ <- U1 / S
  f_ <- U2 / S
  u_ - f_
}
z <- c(-1, rep(0,9),1,1)
aarons_rbs(z)
#> [1] 0.1410256
aarons_rbs(-z) # should be the same, but negative
#> [1] 0.08974359
Created on 2022-09-29 by the reprex package (v2.0.1)
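One possible reading of the asymmetry (my own, offered tentatively): because the half-tie credit is added to U1 and subtracted from U2, the estimate collapses to (P - N + n0) / S, so the n0 zeros always enter with the same sign regardless of the data's direction:

```r
# sketch: with U1 = P + n0/2 and U2 = N - n0/2, the proposed estimate reduces
# to (P - N + n0) / S, where P/N are the positive/negative signed-rank sums
# and n0 is the number of zeros; n0 has a fixed sign, hence no antisymmetry
asym_check <- function(z) {
  nz <- z[z != 0]
  Ry <- sign(nz) * rank(abs(nz))  # signed ranks of the non-zero values
  P <- sum(Ry[Ry > 0])
  N <- -sum(Ry[Ry < 0])
  n0 <- sum(z == 0)
  S <- length(z) * (length(z) + 1) / 2
  (P - N + n0) / S
}
z <- c(-1, rep(0, 9), 1, 1)
c(asym_check(z), asym_check(-z))  # reproduces 0.1410256 and 0.08974359
#> [1] 0.14102564 0.08974359
```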
Yes, I am working on it. There may be a reason I haven't been able to find something in the literature on this...
However, the "non-exact" form of the Wilcoxon signed-rank test uses a normal approximation and therefore a z-statistic. This at least gives a wider confidence interval than the current implementation.
# functions
aarons_rbs2 <- function(z) {
  x2 <- z[z != 0]  # drop zeros, as wilcox.test does
  n <- as.double(length(x2))
  r <- rank(abs(x2))
  STATISTIC <- sum(r[x2 > 0])
  ZSTAT <- STATISTIC - n * (n + 1) / 4
  tanh(ZSTAT)
}
get_ties <- function(z) {
  x2 <- z[z != 0]
  r <- rank(abs(x2))
  table(r)  # counts of tied ranks, used for the variance correction
}
get_n <- function(z) {
  as.double(sum(z != 0))
}
z <- c(-1,rep(0,9),1,2)
# Wilcoxon test output
wilcox.test(z, exact = FALSE, correct = FALSE)$p.value
#> [1] 0.4142162
zstat = atanh(aarons_rbs2(z))
NTIES = get_ties(z)
n = get_n(z)
SIGMA <- sqrt(n * (n + 1) * (2 * n + 1) / 24
- sum(NTIES^3 - NTIES) / 48)
#CORRECTION = sign(zstat) * 0.5
#ZADJ = ZSTAT/SIGMA
# output
2*pnorm(-abs(zstat), sd = SIGMA)
#> [1] 0.4142162
aarons_rbs2(z)
#> [1] 0.9051483
aarons_rbs2(-z) # should be the same, but negative
#> [1] -0.9051483
# CI
interval <- (zstat) + c(-1, 1) * qnorm(1 - .05 / 2) * SIGMA
# rb ------
tanh(interval)
#> [1] -0.9704917 0.9999258
## Odds -----
stats::qlogis(((tanh(interval)+ 1) / 2))
#> [1] -4.201368 10.201368
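As an aside on the Odds line above: it simply maps r to a win probability p = (r + 1) / 2 and then to log-odds via qlogis. A standalone restatement (the function name here is made up, not an effectsize function):

```r
# r_rb -> win probability p = (r + 1) / 2 -> log-odds
rb_to_logodds <- function(r) stats::qlogis((r + 1) / 2)
rb_to_logodds(0)  # no effect -> even odds
#> [1] 0
rb_to_logodds(tanh(1.5))  # equals 2 * atanh(r) = 3, i.e. twice the z scale
```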
Okay, really going down the rabbit hole now. The Stata documentation is very helpful: https://www.stata.com/manuals/rsignrank.pdf
I will dig into it a bit further.
I can't remember having to do anything with this, but I can take a closer look
Okay, two possible solutions from Sal Mangiafico's rcompanion package: http://rcompanion.org/handbook/F_02.html
> I am not aware of any established effect size statistic for the one-sample Wilcoxon signed-rank test. However, a version of the rank biserial correlation coefficient (rc) can be used. This is my recommendation. It is included for the matched pairs case in King, Rosopa, and Minium (2000). As an alternative, using a statistic analogous to the r used in the Mann–Whitney test may make sense.
library(rcompanion)
z = c(rep(0,9),1,1,1,1)
# Account for ties by not dropping
wilcoxonOneSampleRC(x=z, mu =0, verbose=TRUE, zero.method="none")
#>
#> zero.method: none
#> n kept = 13
#> Ranks plus = 0
#> Ranks minus = 46
#> T value = 0
#> rc
#> 0.505
# Account for ties by using Z
wilcoxonOneSampleR(x=z, mu =0, verbose=TRUE, zero.method="none")
#> r
#> 0.555
# Compared to effectsize
effectsize::rank_biserial(z)
#> r (rank biserial) | 95% CI
#> --------------------------------
#> 1.00 | [1.00, 1.00]
So the adaptation for effectsize is quite easy with zero.method = "none". This way it will account for ties but won't affect results where there are no ties.
sal_rbs <- function(x, mu = 0) {
  # adapted from rcompanion's wilcoxonOneSampleRC with zero.method = "none"
  z <- x - mu
  RR <- -1 * rank(abs(z)) * sign(z)  # signed ranks (sign flipped)
  Rplus <- sum(RR[RR > 0])
  Rminus <- sum(abs(RR[RR < 0]))
  Tee <- min(Rplus, Rminus)
  n <- length(RR)
  if (Rplus >= Rminus) {
    rho <- -4 * abs((Tee - (Rplus + Rminus) / 2) / n / (n + 1))
  } else {
    rho <- 4 * abs((Tee - (Rplus + Rminus) / 2) / n / (n + 1))
  }
  rho
}
z = c(rep(0,9),1,1,1,1)
sal_rbs(z)
#> [1] 0.5054945
sal_rbs(-1*z)
#> [1] -0.5054945
set.seed(1678)
z2 = rnorm(30,mean=.5)
sal_rbs(z2)
#> [1] 0.6258065
sal_rbs(-1*z2)
#> [1] -0.6258065
library(testthat)
library(effectsize)
test_that("expect match with effectsize", {
  expect_equal(sal_rbs(z2), rank_biserial(z2)$r_rank_biserial)
  expect_equal(sal_rbs(-1 * z2), rank_biserial(-1 * z2)$r_rank_biserial)
})
#> Test passed 🥇
Thoughts?
This seems reasonable, but I do worry that this will make the effect size incompatible with the results from wilcox.test().
@bwiernik? Would this affect the current implementation of CIs?