A Smarter Autocorrelation Time
Hi all,
So I have a set of problems with long (but highly variable) burn-in times that are not known a priori. Minimization during the burn-in phase can get you close to a burned-in state, but there is no reliable way to know whether the chain has actually burned in.
It appears that if the chain is not burned in and you run acor over its full length, failure is extremely likely unless c is very small (I've often found that only c = 1 works reliably). A simple way to avoid this is to compute acor only beyond some "burned-in" step, which I've done here in a modified get_autocorr_time (via a new min_step parameter):
```python
def get_autocorr_time(self, min_step=0, chain=None, **kwargs):
    """Return a matrix of autocorrelation lengths.

    Returns a matrix of autocorrelation lengths for each
    parameter in each temperature of shape ``(Ntemps, Ndim)``.
    Any arguments will be passed to :func:`autocorr.integrated_time`.
    """
    acors = np.zeros((self.ntemps, self.dim))
    for i in range(self.ntemps):
        if chain is not None:
            # Use the externally supplied chain, ignoring steps before min_step.
            x = np.mean(chain[i, :, min_step:, :], axis=0)
        else:
            x = np.mean(self._chain[i, :, min_step:, :], axis=0)
        acors[i, :] = emcee.autocorr.integrated_time(x, **kwargs)
    return acors
```
But what value should one choose for min_step? (If we knew, we'd know when we were burned in!) To solve this problem, it seems one would like to run acor with c set as large as possible, which could be achieved by setting min_step to some value greater than 0. If the chain were truly burned in at step 5000 and the chain is 10000 steps long, then the returned acor timescale would ideally cover only iterations 5000 through 10000.
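To make the failure mode concrete without depending on emcee internals, here is a toy sketch: a minimal self-consistent-window estimator (my own simplified stand-in for emcee.autocorr.integrated_time, not its actual code) applied to a synthetic chain with a long burn-in transient. Including the transient either inflates the estimate enormously or makes the windowing fail outright, while starting at min_step = 5000 recovers the true timescale:

```python
import numpy as np

def integrated_time(x, c=5):
    """Minimal integrated-autocorrelation-time estimate for a 1-D chain
    using the standard self-consistent window M >= c * tau(M).
    A sketch of the usual estimator, not emcee's exact implementation."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    x = x - x.mean()
    # Autocorrelation function via FFT (zero-padded to avoid wraparound).
    f = np.fft.rfft(x, n=2 * n)
    acf = np.fft.irfft(f * np.conjugate(f))[:n]
    acf /= acf[0]
    # tau(M) = 1 + 2 * sum_{k=1}^{M} rho(k)
    taus = 2.0 * np.cumsum(acf) - 1.0
    for m in range(1, n):
        if m >= c * taus[m]:
            return taus[m]
    raise RuntimeError("no self-consistent window found for c=%d" % c)

rng = np.random.default_rng(0)
n = 20000
# Stationary AR(1) noise with rho = 0.9, so the true tau = (1+rho)/(1-rho) = 19 ...
y = np.empty(n)
y[0] = 0.0
eps = rng.normal(size=n)
for i in range(1, n):
    y[i] = 0.9 * y[i - 1] + eps[i]
# ... plus a large transient that dies away over the first 5000 steps,
# mimicking a chain that has not burned in until then.
drift = 50.0 * np.clip(1.0 - np.arange(n) / 5000.0, 0.0, None)
x = y + drift

tau_trunc = integrated_time(x[5000:], c=5)  # i.e. min_step = 5000
try:
    tau_full = integrated_time(x, c=5)      # burn-in included
except RuntimeError:
    tau_full = None  # windowing failed outright
```

With min_step = 5000 the estimate comes out near the true value of 19, while the full chain either fails at c = 5 or returns a timescale set by the transient rather than the sampler.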
I think a "smarter" acor could achieve this by subdividing the chain in powers of 2, trying all c values up to some maximum, and then returning the acor time for the chain fragment that evaluated successfully with the largest c. In some code I'm writing, the logic currently looks like this:
```python
aacort = -1.0  # autocorrelation time of the best fragment so far
aa = 0         # largest c that evaluated successfully so far
ams = 0        # min_step of the best fragment so far
acorc = 10     # maximum window factor c to try
for bdenom in [2 ** x for x in range(0, 5)]:
    # Truncate the first 0, 1/2, 3/4, 7/8, 15/16 of the chain.
    ms = emi - round(float(emi) / bdenom)
    if ms >= emi - low:
        # Remaining fragment is too short to be useful, and later
        # truncations are even shorter.
        break
    for a in range(1, acorc + 1):
        try:
            acorts = sampler.get_autocorr_time(
                chain=cur_chain, low=low, c=a,
                min_step=ms, fast=True)
            acort = max([max(x) for x in acorts])
        except AutocorrError:
            # This c failed, so larger c will too; try the next fragment.
            break
        else:
            if a > aa:
                # New best: largest successful c so far.
                aa = a
                aacort = acort
                ams = ms
    if aa == acorc:
        # Already succeeded at the maximum c; no need to try
        # shorter fragments.
        break
```
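The same search could be packaged as a self-contained helper. In the sketch below, estimate_tau is a hypothetical stand-in for sampler.get_autocorr_time, and the stub estimator exists purely to demonstrate the powers-of-2 truncation scheme:

```python
def smart_autocorr_time(chain, estimate_tau, max_c=10, max_splits=5):
    """Sketch of the proposed "smarter" acor.

    Truncates the chain at 0, 1/2, 3/4, 7/8, ... of its length and, for
    each truncation, finds the largest window factor c the estimator
    accepts.  Returns (best_c, tau, min_step) for the fragment that
    supported the largest c.  `estimate_tau(fragment, c)` is a
    hypothetical callable that returns tau or raises ValueError.
    """
    n = len(chain)
    best = (0, None, 0)  # (largest successful c, tau, min_step)
    for bdenom in (2 ** k for k in range(max_splits)):
        min_step = n - round(n / bdenom)  # 0, n/2, 3n/4, ...
        for c in range(1, max_c + 1):
            try:
                tau = estimate_tau(chain[min_step:], c)
            except ValueError:
                break  # larger c will fail too; try the next truncation
            if c > best[0]:
                best = (c, tau, min_step)
        if best[0] == max_c:
            break  # already succeeded at the maximum c
    return best

# Stub estimator for demonstration only: pretend the estimate succeeds
# with c > 1 only once the first 5000 (burn-in) steps are excluded.
def stub(fragment, c):
    if c > 1 and len(fragment) > 5000:
        raise ValueError("chain too short for c=%d" % c)
    return 19.0

best = smart_autocorr_time([0.0] * 10000, stub)
# best == (10, 19.0, 5000): the half-chain fragment supports c = 10
```

The full chain only supports c = 1 under the stub, so the search settles on the fragment starting at step 5000, which is exactly the behavior wanted above.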
Thoughts on this proposal? As many of us acknowledge, very few chains are actually "burned in," but this would potentially yield a way of robustly determining when they are.
Hi,
I think a smarter acor is a good idea. I have several datasets, and when I run MCMC over them in a loop, for some I can compute the autocorrelation time with c = 10, for some only with c = 1, and for others with values in between.