Allow historical samples in inside_outside method
Previously truncate_prior wasn't actually being called!
if np.any(tree_sequence.tables.nodes.time[tree_sequence.samples()] != 0):
if False: # This is never called
priors = truncate_priors(...)
I have been adding extra functionality to this PR - I guess that's fine @awohns ?!
We also need to implement tests for build_prior with the parameters allow_historical_samples, truncate_priors, and node_var_override. Also a test for the internal _truncate_priors function. But once this is done, and if performance is not any worse than before, I wonder if this PR should be squashed and merged into main, and then we can figure out the issue in https://github.com/hyanwong/molecular-sample-dating/blob/main/notebooks/uniform_prior.ipynb
@hyanwong, one of the changes in the inside pass has dramatically slowed it down, I'm not sure which one
@hyanwong, one of the changes in the inside pass has dramatically slowed it down, I'm not sure which one
OK, useful to know. I'll try to track it down.
Hmm, after a bit of testing on simulated tree sequences I'm not seeing a difference between the current version of tsdate and this PR. I think the inferred tree sequence I'm working with might have some strange features that really slow the inside step down (never seen this before though) rather than any of the changes in this PR
Yep, I had a look, and can't see anything that should have slowed it down.
It might be worth having a perf suite like msprime (asv I think it's called), but I suspect it might be a hassle to set up and run.
I've rebased this and run it on a reasonable-sized tree sequence and it doesn't seem to change the runtimes. I reckon we should think about merging it, and get a review from someone. The ability to include ancient samples is a big win, and we should get it in the next release.
The API for build_grid is changed to include the following:
:param string prior_distribution: What distribution to use to approximate the
conditional coalescent prior. Can be "lognorm" for the lognormal distribution
(generally a better fit, but slightly slower to calculate) or "gamma" for the
gamma distribution (slightly faster, but a poorer fit for recent nodes).
Default: "lognorm"
:param bool allow_historical_samples: should we allow historical samples (i.e. at
times > 0). This invalidates the assumptions of the conditional coalescent, but
may be acceptable if the historical samples are recent or if there are many
contemporaneous samples. Default: ``False``
:param bool truncate_priors: If there are historical samples, should we truncate the
priors of all nodes which are their ancestors so that the probability of being
younger than the oldest descendant sample is zero. As long as historical
samples do not have ancestors that have been misassigned in the tree sequence
topology, this should give better results. Default: ``True``
:param dict nonfixed_sample_var: is a dict mapping sample node IDs to a variance
value. Any nodes listed here will be treated as non-fixed nodes whose prior is
not calculated from the conditional coalescent but instead are allocated a prior
whose mean is the node time in the tree sequence and whose variance is the
value in this dictionary. This allows sample nodes to be treated as nonfixed
nodes, and therefore dated. If ``None`` (default) then all sample nodes are
treated as occurring at a fixed time (as if this were an empty dict).