Potential discrepancies between linear model diagnostic tests
Hello, I noticed some discrepancies between the results of model diagnostic tests, specifically the tests for heteroskedasticity (non-constant variance). I have included the tests for normality and autocorrelation of residuals for completeness, but it is the heteroskedasticity test I am most concerned about, given the magnitude of the difference:
model <- lm(mpg ~ wt + cyl + gear + disp, data = mtcars)
performance::check_heteroskedasticity(model)
#> Warning: Heteroscedasticity (non-constant error variance) detected (p = 0.042).
lmtest::bptest(model) # studentized
#>
#> studentized Breusch-Pagan test
#>
#> data: model
#> BP = 6.4424, df = 4, p-value = 0.1685
lmtest::bptest(model, studentize = FALSE)
#>
#> Breusch-Pagan test
#>
#> data: model
#> BP = 7.9496, df = 4, p-value = 0.09344
shapiro.test(model$residuals)
#>
#> Shapiro-Wilk normality test
#>
#> data: model$residuals
#> W = 0.95546, p-value = 0.2056
performance::check_normality(model)
#> OK: residuals appear as normally distributed (p = 0.230).
performance::check_autocorrelation(model)
#> OK: Residuals appear to be independent and not autocorrelated (p = 0.262).
lmtest::dwtest(model, alternative = "two.sided")
#>
#> Durbin-Watson test
#>
#> data: model
#> DW = 1.7786, p-value = 0.2846
#> alternative hypothesis: true autocorrelation is not 0
Created on 2025-02-19 with reprex v2.1.1
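If it helps the investigation, one guess at the source of the gap (an assumption on my part; I have not checked the performance source): check_heteroskedasticity() may be testing the error variance against the fitted values only (1 df, as car::ncvTest() does by default), whereas bptest() by default uses all four regressors of the main model (4 df) and studentizes. Something along these lines would put the two packages on the same auxiliary specification:
# Breusch-Pagan variants with the variance modelled as a function of the
# fitted values only, for comparison with check_heteroskedasticity():
lmtest::bptest(model, varformula = ~ fitted(model))                      # studentized
lmtest::bptest(model, varformula = ~ fitted(model), studentize = FALSE)  # classical
car::ncvTest(model)  # non-constant variance score test against the fitted values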
We can look into it, but personally I would recommend against any of these tests and would instead use the graphical check provided by check_model().
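For reference, the graphical check is a one-liner; it draws a panel of diagnostic plots (including a homogeneity-of-variance panel) rather than returning p-values. Plotting requires the see package:
# Visual inspection of all assumptions at once
performance::check_model(model)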
This is a great package and I'm really enjoying using it. I'd like to ask, though: why do you suggest not using the tests if they are provided? From my perspective as a user, I wanted to use the check_model() function for its convenience, but I also need to check and compare a lot of models, and having a cutoff (like a p-value) makes that much easier. In my case (and possibly others), running the individual functions (as appropriate) to capture both the plot and the output is preferable to check_model(), which offers only the visualizations. So, I'm curious about the reasoning behind your suggestion.
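Concretely, the kind of batch check I have in mind is along these lines (assuming, as the printed message suggests, that check_heteroskedasticity() returns the p-value it reports; I have not verified the exact return value):
# Hypothetical batch workflow: collect the heteroskedasticity p-values for
# several candidate models in one table.
models <- list(
  small = lm(mpg ~ wt, data = mtcars),
  full  = lm(mpg ~ wt + cyl + gear + disp, data = mtcars)
)
p_het <- sapply(models, function(m) as.numeric(performance::check_heteroskedasticity(m)))
data.frame(model = names(p_het), p_heteroskedasticity = p_het)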
The recommendation to prioritize check_model() is based on the principle that visual inspection provides a richer, more robust, and less arbitrary assessment of model adequacy than formal hypothesis tests.
The real question is not "Are the assumptions perfectly met?" (the answer is always no), but rather "Are the violations of these assumptions severe enough to compromise the conclusions I want to draw from this model?" Knowing the nature of a violation is essential for fixing it: a plot guides you toward a solution (e.g., logging a variable, adding a polynomial term, or using a different model family), whereas a p-value leaves you in the dark.
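To make that concrete, a minimal sketch of the workflow (the log transformation here is purely illustrative, not a recommendation for this particular model):
# Inspect the spread of the residuals against the fitted values; a widening
# funnel suggests non-constant variance.
plot(fitted(model), residuals(model),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
# If a funnel shows up, a variance-stabilising transformation is one candidate
# fix; re-inspect the diagnostics afterwards.
model_log <- lm(log(mpg) ~ wt + cyl + gear + disp, data = mtcars)
performance::check_model(model_log)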
For a great discussion of this perspective, check out Applied Linear Regression by Sandy Weisberg (any edition).