Incorrectly Calculated P-Values
Hello Benjamin!
Thank you so much for this wonderful package and your incredible work. I have benefited a lot from your package and appreciate the time and effort you put into it. I've run into an issue: some of the p-values are not being calculated the way I expect. I am working on a project and ran the following code:
pvalue <- function(x, ...) {
    # Construct vectors of data y, and groups (strata) g
    y <- unlist(x)
    g <- factor(rep(1:length(x), times=sapply(x, length)))
    if (is.numeric(y)) {
        # For numeric variables, perform a standard 2-sample t-test
        # p <- t.test(y ~ g)$p.value
    } else {
        # For categorical variables, perform individual chi-squared tests for each category
        p <- sapply(levels(y), function(z) chisq.test(table(y==z, g))$p.value)
    }
    # Format the p-value, using an HTML entity for the less-than sign.
    # The initial empty string places the output on the line below the variable label.
    c("", sub("<", "&lt;", format.pval(p, digits=3, eps=0.001)))
}
# 2014/2015 Data
t <- table1(~ `Command` | CombinedYear*TextScore,
    data = s201415,
    extra.col = list(`P-value` = pvalue),
    digits = 5,
    overall = FALSE,
    topclass = "Rtable1-zebra Rtable1-shade Rtable1-times"
)
The t-test is commented out because all of the data in this project is categorical. I got the following result:
I am looking at evaluation results where we have 97 groups perform certain commands and they were marked as either "Not Done" or "Well Done" for each command. Since I am looking at evaluation results, I have 97 groups who were evaluated so each row adds up to 97 unless the data is missing (if it was missing, it was excluded for that row). When I manually calculate these p-values out, for example the first one "Command3", it should be significant. See example from basic chi-square calculator website online:
Source: https://www.socscistatistics.com/tests/chisquare2/default2.aspx
I have been troubleshooting for some time but I am unsure of what the issue is. I apologize if it is something that should be obvious, as I am not super experienced. I think it might be that instead of using 97 as the total, it is using the totals in the header (which is Not Done N=143 and Well Done N=831). Do you think you would be able to provide some insight as to why the p-values are being calculated this way? If so, how can I fix it?
Thank you so much in advance.
Warmest regards, Vaish
First, you have to ask yourself what hypotheses you are testing (a p-value is always associated with a hypothesis test). You need to formulate this clearly in order for the p-value to have the desired meaning.
Note that in the screenshot you posted from the website, I don't think it is formulated correctly because you have a 2x2 contingency table with a grand total of 194(!). I don't think this represents your situation, but again you need to formulate the hypotheses clearly first.
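To make the question about hypotheses concrete, here is a minimal sketch of the contingency table that the `pvalue` function above builds for a single category. It assumes, as the code in the question implies, that `x` is a list with one element per stratum. The data frame `d` and all of its counts are hypothetical and chosen only for illustration; they are not taken from `s201415`.

```r
# Hypothetical counts, for illustration only (not the real data).
counts <- data.frame(
  Command  = c("Command1", "Command2", "Command3"),
  NotDone  = c(2, 3, 5),
  WellDone = c(18, 17, 15)
)

# Expand to one row per evaluation, with Command as the variable
# and TextScore as the stratum.
d <- data.frame(
  Command   = factor(rep(rep(counts$Command, 2),
                         times = c(counts$NotDone, counts$WellDone))),
  TextScore = factor(rep(c("Not Done", "Well Done"),
                         times = c(sum(counts$NotDone), sum(counts$WellDone))))
)

# Mimic the assumed input: the variable's values split by stratum.
x <- split(d$Command, d$TextScore)
y <- unlist(x)
g <- factor(rep(seq_along(x), times = sapply(x, length)))

# The 2x2 table tested for one category, e.g. "Command3":
# rows are "is this row Command3?", columns are the strata.
z <- "Command3"
tab_fn <- table(is_z = y == z, stratum = g)
print(tab_fn)

# The column totals are the stratum sizes (the N shown in the table header),
# so the grand total is the number of rows in the data, not the number of groups.
print(colSums(tab_fn))
print(chisq.test(tab_fn)$p.value)
```

In this sketch the null hypothesis being tested is that the share of rows belonging to "Command3" is the same in every stratum. If the hypothesis you have in mind is different, for example one that compares outcomes within a single command, then the table passed to `chisq.test` has to be built differently.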