GSTools

Error in coefficient of determination of variogram fit.

Open donhalmina opened this issue 4 years ago • 2 comments

Hello

As per the code, you can estimate the coefficient of determination (r2) to assess how well the theoretical covariance model fits the experimental semivariogram.

para, pcov, r2 = fit_model.fit_variogram(bin_center, gamma, return_r2=True)

However, this is wrong, as r2 is meant for linear regression and the covariance functions are not linear, which makes the r2 not credible. I suggest using a different goodness-of-fit criterion, such as the "standard error of the regression", instead of r2.

Thanks.

donhalmina avatar Aug 24 '21 10:08 donhalmina

Hi, thanks for pointing that out, you are completely right. Would you be willing to implement a better criterion and create a PR?

LSchueler avatar Aug 24 '21 11:08 LSchueler

Hi there! Thanks for pointing this out. Since I implemented this, please give me the chance to add my two cents:

A simple definition of the pseudo-R2 score is: pseudo-R2 = 1 - D(y, y_fit) / D(y, y_mean) (https://timeseriesreasoning.com/contents/r-squared-adjusted-r-squared-pseudo-r-squared/), where "D" stands for deviance, y_fit for the model prediction and y_mean for the mean of the data. The pseudo-R2 (used for non-linear regressions) is usually applied with Maximum Likelihood Estimation, where the deviance can be defined via the log-likelihood, resulting in the McFadden R2 score.

If we just define the deviance as the sum of squared deviations, we arrive at the formula for the classical R2. In this context, the R2 score tells us how much better the fitted model is compared to a pure nugget model set to the mean of the estimated variogram values. I would argue that this information is quite useful, but you are right that we don't provide any justification for it, although it is obviously not a linear regression. This could be a nice little research project @LSchueler
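To make that reading of R2 concrete, here is a minimal pure-Python sketch. It is not the GSTools implementation: the helper names `exponential_variogram` and `r2_score`, the parameter values and the data are all made up for illustration. It shows that a constant model set to the mean of the variogram values scores exactly 0, so R2 measures the improvement over that baseline.

```python
import math


def exponential_variogram(h, var, len_scale, nugget=0.0):
    """Illustrative exponential variogram (hypothetical helper, not GSTools code)."""
    return nugget + var * (1.0 - math.exp(-h / len_scale))


def r2_score(gamma, gamma_fit):
    """Classical R2 = 1 - SS_res / SS_tot, i.e. deviance = sum of squared deviations."""
    mean = sum(gamma) / len(gamma)
    ss_res = sum((g - f) ** 2 for g, f in zip(gamma, gamma_fit))
    ss_tot = sum((g - mean) ** 2 for g in gamma)
    return 1.0 - ss_res / ss_tot


# Made-up empirical variogram, for illustration only.
bin_center = [0.5, 1.5, 2.5, 3.5, 4.5]
gamma = [0.4, 0.8, 0.95, 1.0, 1.05]

# Assumed (not fitted) model parameters.
fitted = [exponential_variogram(h, var=1.0, len_scale=1.0) for h in bin_center]
# Baseline: a pure nugget model set to the mean of the variogram values.
pure_nugget = [sum(gamma) / len(gamma)] * len(gamma)

r2_fitted = r2_score(gamma, fitted)
r2_nugget = r2_score(gamma, pure_nugget)
print(f"R2 of exponential model: {r2_fitted:.3f}")
print(f"R2 of mean-valued nugget model: {r2_nugget:.3f}")
```

A value close to 1 therefore only says that the model explains the variogram values much better than a flat line at their mean; it carries no statement about linearity.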

Nonetheless, we could provide other scores. "Standard error of regression" is a good start. The standard error of regression is also very similar to the formula of the pseudo-R2 score shown above. The only difference is that the sum of squared deviations is divided by the number of data points (and not by the deviance from the mean), and you take the square root to get the same unit as the input data. This means that the results from our example on this (link) should stay the same :wink:
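As a rough sketch of that score in pure Python: the function name `standard_error_of_regression`, the `n_params` argument and the data are assumptions made for illustration and are not part of GSTools. With `n_params=0` it divides by the number of data points as described above; the common textbook variant divides by the degrees of freedom (data points minus fitted parameters) instead.

```python
import math


def standard_error_of_regression(gamma, gamma_fit, n_params=0):
    """Root of the sum of squared residuals divided by (n - n_params).

    n_params=0 divides by the number of data points; passing the number of
    fitted model parameters gives the degrees-of-freedom variant.
    """
    dof = len(gamma) - n_params
    if dof <= 0:
        raise ValueError("need more data points than fitted parameters")
    ss_res = sum((g - f) ** 2 for g, f in zip(gamma, gamma_fit))
    return math.sqrt(ss_res / dof)


# Made-up empirical variogram values and model predictions, for illustration only.
gamma = [0.4, 0.8, 0.95, 1.0, 1.05]
gamma_fit = [0.39, 0.78, 0.92, 0.97, 0.99]

se = standard_error_of_regression(gamma, gamma_fit)
se_dof = standard_error_of_regression(gamma, gamma_fit, n_params=2)
print(f"standard error (divide by n): {se:.4f}")
print(f"standard error (divide by n - 2): {se_dof:.4f}")
```

For a fixed set of variogram values, both this score (with the same divisor for every model) and R2 depend on the fit only through the sum of squared residuals, so ranking candidate models by either one gives the same order.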

Cheers, Sebastian

MuellerSeb avatar Aug 25 '21 16:08 MuellerSeb