gp_periodic_cov() unexpected output for fixed period
Good day,
I was experimenting with gp_periodic_cov() yesterday after getting some unexpected results in a model. I noticed that when I provide an integer for the period parameter I get different results than if I provide the same value as a real. For both Stan programs, I use the following R code to sample:
library(cmdstanr)
options(mc.cores = 4)
set.seed(1)
mod <- cmdstan_model("mod.stan")
N <- 20
x <- rlnorm(N)
fit <- mod$sample(list(N = N, x = x), seed = 1)
In the following Stan program, period is given as 4:
data {
int N;
array[N] real x;
}
parameters {
real<lower=0> sigma, ell;
}
transformed parameters {
matrix[N, N] K = gp_periodic_cov(x, sigma, ell, 4);
}
model {
sigma ~ exponential(1);
ell ~ exponential(1);
}
I get:
> fit$summary("K")
# A tibble: 400 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 K[1,1] 1.92 0.464 4.18 0.665 0.00239 8.97 1.00 1662. 1186.
2 K[2,1] 1.92 0.464 4.18 0.665 0.00239 8.97 1.00 1662. 1186.
3 K[3,1] 1.92 0.464 4.18 0.665 0.00239 8.97 1.00 1662. 1186.
4 K[4,1] 1.92 0.464 4.18 0.665 0.00239 8.97 1.00 1662. 1186.
5 K[5,1] 1.92 0.464 4.18 0.665 0.00239 8.97 1.00 1662. 1186.
6 K[6,1] 1.92 0.464 4.18 0.665 0.00239 8.97 1.00 1662. 1186.
7 K[7,1] 1.92 0.464 4.18 0.665 0.00239 8.97 1.00 1662. 1186.
8 K[8,1] 1.92 0.464 4.18 0.665 0.00239 8.97 1.00 1662. 1186.
9 K[9,1] 1.92 0.464 4.18 0.665 0.00239 8.97 1.00 1662. 1186.
10 K[10,1] 1.92 0.464 4.18 0.665 0.00239 8.97 1.00 1662. 1186.
When instead I provide period as 4.0:
data {
int N;
array[N] real x;
}
parameters {
real<lower=0> sigma, ell;
}
transformed parameters {
matrix[N, N] K = gp_periodic_cov(x, sigma, ell, 4.0);
}
model {
sigma ~ exponential(1);
ell ~ exponential(1);
}
I get:
> fit$summary("K")
# A tibble: 400 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 K[1,1] 1.92 0.464 4.18 0.665 2.39e- 3 8.97 1.00 1662. 1186.
2 K[2,1] 0.629 0.0250 2.43 0.0371 1.57e- 76 3.00 1.00 1368. 1647.
3 K[3,1] 0.720 0.0427 2.57 0.0634 2.67e- 56 3.46 1.00 1372. 1632.
4 K[4,1] 1.58 0.332 3.77 0.487 1.21e- 4 7.22 1.00 1586. 1539.
5 K[5,1] 1.39 0.250 3.50 0.370 6.89e- 7 6.27 1.00 1508. 1627.
6 K[6,1] 0.679 0.0342 2.51 0.0508 1.14e- 64 3.27 1.00 1367. 1647.
7 K[7,1] 1.01 0.115 2.98 0.171 1.24e- 21 4.77 1.00 1407. 1621.
8 K[8,1] 1.03 0.124 3.01 0.183 2.50e- 20 4.81 1.00 1409. 1621.
9 K[9,1] 0.530 0.0112 2.27 0.0166 5.69e-108 2.56 1.00 1374. 1647.
10 K[10,1] 1.49 0.287 3.63 0.422 1.31e- 5 6.79 1.00 1542. 1530.
I suspect it has to do with ints vs reals.
Cheers,
Matt
There is this in the code:
T_p pi_div_p = pi() / p;
If T_p is deduced as int, then this will also be an int and you'll get truncation. I think we've had at least one other bug like this but I can't remember exactly how we fixed it -- thoughts @SteveBronder?
I think making pi_div_p have type auto might be enough. I'm sure there are other places where this would be an issue, though, so maybe we should just make the compiler always generate int -> real promotions manually?
This is my fault, if no one's done this yet, I should fix it...
Users of the Stan language won't run into this starting in the next version, but we should probably still fix it for users of the C++ lib directly. A similar pattern can be found in a lot of cases, not just this gp function
Thanks!!
Hold on this isn't even a valid GP model. I'm looking into it, I'm still setting up. I'll get back with you.
@mhollanders I would simulate a sine wave, add noise, and then modify the following R and Stan code that: A) only simulates a sine wave with noise. B) only uses a periodic covariance function (not the additive model as seen in the example). I'll come back to this. Thank you.
@mhollanders this is the post I'm referring to: https://likelyllc.com/2022/08/03/additive-gaussian-process-time-series-regression-in-stan/
@mhollanders I'm monitoring this. Did you try simulating a periodic function and then fitting a valid GP model? Since the model isn't valid, I might try simulating a function (you can still use a periodic function) and using squared exponential. Less degrees of freedom. You can use the post I linked above. Feel free to ping me if you have any questions.
If there's any issue on my part, I'll fix it.
Hey, I've since started ensuring the period argument is a real and no more issues. I trust it'll be resolved in future versions of Stan.
Awesome, great to hear.