Stochastic calculations inside emcee

Hi, I have a model for which I am using emcee to sample three parameters. I am having trouble getting a good acceptance ratio. I have increased the number of walkers to 2000 and still have the same problem. Oddly, the first 100 or so samples have an acceptance ratio of around 25-30%, but then it drops to 0.01%. I am initializing the walkers with random values. Is there any diagnostic, or anything else, that I can use to improve the sampling?

thanks in advance

Inti

Jul 09 '14 10:07 inti

Hi Inti! You can try initializing the walkers near the maximum-likelihood values of the parameters with emcee.utils.sample_ball. If the posterior density of your parameters has many modes, try the parallel-tempered ensemble sampler, emcee.PTSampler.
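A minimal sketch of that initialization idea, written in plain Python so it runs without emcee installed. The helper `gaussian_ball`, the starting point `p_best`, and the scatter `scales` are hypothetical names and values chosen for illustration; in practice the center would come from an optimizer run on your own lnprob.

```python
import random

def gaussian_ball(center, scales, nwalkers, seed=0):
    """Draw nwalkers starting points scattered tightly around center."""
    rng = random.Random(seed)
    return [
        [c + s * rng.gauss(0.0, 1.0) for c, s in zip(center, scales)]
        for _ in range(nwalkers)
    ]

# Hypothetical maximum-likelihood estimate for three parameters.
p_best = [0.1, 2.0, -1.5]
# Small scatter relative to the expected posterior width (an assumption).
scales = [1e-3, 1e-3, 1e-3]

# One starting position per walker, all close to p_best.
p0 = gaussian_ball(p_best, scales, nwalkers=100)
```

Starting every walker in a tight ball like this, rather than at widely scattered random values, keeps walkers out of regions of negligible probability where they can get stuck.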

Ilya

P.S. It is worth checking the code of your lnprob. Some errors (like a missing square in the normal pdf) can cause strange results :) P.P.S. If the problem doesn't go away, could you paste your code here? We'll try to figure out where the problem is.
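As a reference point for that check, here is a sketch of a normal log density with the square in place. The function names and the single `mu`/`sigma` parameterization are assumptions for illustration, not code from this thread.

```python
import math

def ln_normal(x, mu, sigma):
    """Log of the normal density N(x | mu, sigma^2)."""
    resid = (x - mu) / sigma
    # The residual must be squared; dropping the square is the kind of
    # slip that gives strange sampling results.
    return -0.5 * resid * resid - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def ln_likelihood(data, mu, sigma):
    """Sum of independent normal log densities; -inf for an invalid sigma."""
    if sigma <= 0.0:
        return -math.inf
    return sum(ln_normal(x, mu, sigma) for x in data)
```

A quick sanity test is that the value is symmetric in the sign of the residual and peaks where `mu` matches the data.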

Jul 09 '14 17:07 ipashchenko

Hey! Thanks, I'll try the initialisation to see if that improves things. Cheers, Inti

Jul 09 '14 18:07 inti

In principle, if the dimension of your problem is large (many parameters), you should reduce the step size in the ensemble move. It is the "a" parameter (not sure off the top of my head what the argument is).
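For context, the scale in question is the `a` of the Goodman and Weare stretch move that emcee's ensemble sampler is built on. The sketch below is plain Python, not emcee's own code, and the helper names are made up for illustration. It only shows how `a` bounds the stretch factor `z`, which is drawn from a density proportional to 1/sqrt(z) on [1/a, a].

```python
import random

def draw_stretch(a, rng):
    """Draw z from g(z) proportional to 1/sqrt(z) on [1/a, a]."""
    u = rng.random()
    return ((a - 1.0) * u + 1.0) ** 2 / a

def propose(walker, partner, a, rng):
    """Stretch-move proposal for one walker using one partner walker."""
    z = draw_stretch(a, rng)
    # Move the walker along the line joining it to the partner.
    proposal = [p + z * (w - p) for w, p in zip(walker, partner)]
    return proposal, z

rng = random.Random(42)
wide = [draw_stretch(2.0, rng) for _ in range(1000)]    # larger steps
narrow = [draw_stretch(1.2, rng) for _ in range(1000)]  # smaller steps
```

Values of `a` closer to 1 keep `z` closer to 1, so proposals land nearer the current position and are accepted more often, at the cost of slower exploration.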

Jul 13 '14 12:07 davidwhogg

Hi, I am only using emcee to sample three parameters, but the model has many more. Most are updated using the conditional expectations. When you refer to many parameters, you probably mean the ones sampled with emcee, don't you? Or do you mean ALL parameters in the model more generally?

I'll look for the "a" parameter. Cheers, Inti

Jul 13 '14 13:07 inti

If there are other parameters not being sampled by emcee, then I presume you are fixing them within the sampling to the same value for all walkers? Anything else would not be permitted.

Jul 14 '14 11:07 davidwhogg

Hi, thanks a lot for your comments. I'll try to explain a bit more. I have a large regression with many predictors, 10^5 or 10^6. I am using a spike-and-slab prior for the regression coefficients, and with emcee I am sampling a parameter that tells me what fraction of the predictors should be included in the model. The coefficients themselves are marginalised out. Some of the other parameters, like the variance of the error term, are updated to their conditional expectations (actually, drawn from the distribution defined by the conditional expectations).

So, when I sample the three parameter values, ALL other parameters are calculated, either exactly or by sampling using the conditional expectations, and ALL THESE parameters are used to calculate the likelihood.

The actual model I am implementing is described here http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3567190/ and the details of the parameters are here (BE AWARE: the link will download a PDF automatically) http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3567190/bin/pgen.1003264.s009.pdf The original paper used MH, and I was interested to see if I could implement it using emcee and then extend the model (with additional parameters).

I understand from your comment that this would not work with emcee? Did I explain myself?

Jul 14 '14 15:07 inti

There shouldn't be any problem doing this with emcee. Just be careful: if you're computing the internal expectations stochastically, that can lead to problems. You need to make sure that every time you come back to the same part of parameter space you get exactly the same probability value. I normally do this by setting a seed up front or assigning a list of random numbers at the beginning of the analysis and just reusing these every time. Otherwise, it sounds like there is no problem!
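A minimal sketch of the "assign a list of random numbers at the beginning" approach, in plain Python. Everything here is illustrative and assumed: the toy `DATA`, the seed value, the single parameter `mu`, and the log-normal nuisance scale stand in for whatever inner stochastic calculation your model performs. The point is only that the random draws are made once, outside lnprob, and reused on every call.

```python
import math
import random

# Draw the random numbers once, before sampling starts, and never redraw.
_rng = random.Random(12345)          # fixed seed (arbitrary choice)
FIXED_DRAWS = [_rng.gauss(0.0, 1.0) for _ in range(256)]

DATA = [0.8, 1.1, 0.9, 1.3, 1.0]     # toy data, purely illustrative

def ln_normal(x, mu, sigma):
    """Log of the normal density N(x | mu, sigma^2)."""
    r = (x - mu) / sigma
    return -0.5 * r * r - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def lnprob(theta):
    """Monte Carlo estimate of a marginal log likelihood.

    A nuisance scale sigma is averaged out over FIXED_DRAWS, so the
    same theta always returns exactly the same value.
    """
    (mu,) = theta
    log_terms = []
    for z in FIXED_DRAWS:
        sigma = math.exp(0.5 * z)    # hypothetical log-normal draw
        log_terms.append(sum(ln_normal(x, mu, sigma) for x in DATA))
    # log-mean-exp for numerical stability
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms) / len(log_terms))
```

A function like this would be handed to the sampler in the usual way. As far as I understand, the sampler's own random state (for example `rstate0`) only controls the proposals and would not by itself make lnprob repeatable; the repeatability has to come from inside lnprob, as above.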

Jul 15 '14 07:07 dfm

I get it, and I know what you mean by "You need to make sure that every time you come back to the same part of parameter space you get exactly the same probability value". I am obviously doing it wrong, then. I am just unsure how to implement it in the context of emcee and how you use the seed/random numbers for this. Could you give me more information on the random numbers part? Are you referring to the random state of the walkers, i.e., the rstate0 argument to the emcee.EnsembleSampler.run_mcmc function, or are you referring to assigning seed numbers to the walkers?

Do you have example code of this?

Thanks a lot!

Jul 15 '14 12:07 inti

Really, Dan, could you give an example to emcee users somewhere? That issue with stochastic calculations inside emcee is a bit confusing... Thanks!

Jul 24 '14 09:07 ipashchenko

Hi @dfm, do you have some code at hand with examples of setting the random number generator inside emcee to enable stochastic calculations? Thanks in advance.

May 09 '15 02:05 inti