tsdate Allow historical samples in inside

Previously truncate_prior wasn't actually being called!

if np.any(tree_sequence.tables.nodes.time[tree_sequence.samples()] != 0):
        if False:  # This is never called
            priors = truncate_priors(...)

Aug 19 '22 09:08 hyanwong

I have been adding extra functionality to this PR - I guess that's fine @awohns ?!

Aug 19 '22 12:08 hyanwong

We also need to implement tests for build_prior with the parameters allow_historical_samples, truncate_priors, and node_var_override. Also a test for the internal _truncate_priors function. But once this is done, and if performance is not any worse than before, I wonder if this PR should be squashed and merged into main, and then we can figure out the issue in https://github.com/hyanwong/molecular-sample-dating/blob/main/notebooks/uniform_prior.ipynb

Sep 06 '22 10:09 hyanwong

@hyanwong, one of the changes in the inside pass has dramatically slowed it down, I'm not sure which one

Sep 15 '22 17:09 awohns

@hyanwong, one of the changes in the inside pass has dramatically slowed it down, I'm not sure which one

OK, useful to know. I'll try to track it down.

Sep 15 '22 17:09 hyanwong

Hmm, after a bit of testing on simulated tree sequences I'm not seeing a difference between the current version of tsdate and this PR. I think the inferred tree sequence I'm working with might have some strange features that really slow the inside step down (never seen this before though) rather than any of the changes in this PR

Sep 15 '22 18:09 awohns

Yep, I had a look, and can't see anything that should have slowed it down.

It might be worth having a perf suite like msprime (asv I think it's called), but I suspect it might be a hassle to set up and run.

Sep 15 '22 21:09 hyanwong

I've rebased this and run it on a reasonable-sized tree sequence and it doesn't seem to change the runtimes. I reckon we should think about merging it, and get a review from someone. The ability to include ancient samples is a big win, and we should get it in the next release.

Jan 09 '23 16:01 hyanwong

The API for build_grid is changed to include the following:

:param string prior_distribution: What distribution to use to approximate the
         conditional coalescent prior. Can be "lognorm" for the lognormal distribution
         (generally a better fit, but slightly slower to calculate) or "gamma" for the
         gamma distribution (slightly faster, but a poorer fit for recent nodes).
         Default: "lognorm"
     :param bool allow_historical_samples: should we allow historical samples (i.e. at
         times > 0). This invalidates the assumptions of the conditional coalescent, but
         may be acceptable if the historical samples are recent or if there are many
         contemporaneous samples. Default: ``False``
     :param bool truncate_priors: If there are historical samples, should we truncate the
         priors of all nodes which are their ancestors so that the probability of being
         younger than the oldest descendant sample is zero. As long as historical
         samples do not have ancestors that have been misassigned in the tree sequence
         topology, this should give better results. Default: ``True``
     :param dict nonfixed_sample_var: is a dict mapping sample node IDs to a variance
         value. Any nodes listed here will be treated as non-fixed nodes whose prior is
         not calculated from the conditional coalescent but instead are allocated a prior
         whose mean is the node time in the tree sequence and whose variance is the
         value in this dictionary. This allows sample nodes to be treated as nonfixed
         nodes, and therefore dated. If ``None`` (default) then all sample nodes are
         treated as occurring at a fixed time (as if this were an empty dict).

Jul 11 '23 21:07 hyanwong

Allow historical samples in inside_outside method