Unary nodes aren't handled well
Dating accuracy is severely decreased with inferring tree sequences with unary nodes. We're probably not handling something correctly.
Unittests with simple examples of unary nodes may help.
We should test whether this is because the unary nodes created by tsinfer are rubbish, or because tsdate copes badly even with sensible unary nodes. The easiest way to do this is to run a an msprime simulation with record_full_arg=True, which should result in a load of unary nodes in the correct place.
Discovered a simple test which, I think, breaks our handling of unary nodes. It at least breaks the verify_weights() test.
def two_tree_ts_with_unary_n3():
r"""
Simple case where we have n = 3 and node 5 is internal unary, then
the oldest node and unary.
6 . 5
/ \ . |
4 5 . 4
| | . / \
3 | . 3 \
/ \ | . / \ \
[0] [1] [2] . [0] [1] [2]
"""
nodes = io.StringIO("""\
id is_sample time
0 1 0
1 1 0
2 1 0
3 0 1
4 0 2
5 0 3
6 0 4
""")
edges = io.StringIO("""\
left right parent child
0 2 3 0,1
0 1 5 2
0 2 4 3
0 1 6 4,5
1 2 4 2
1 2 5 4
""")
return tskit.load_text(nodes=nodes, edges=edges, strict=False)
I've included it in PR #141 as a skipped test
Just discussed this with @hyanwong: at a marginal tree when there is an edge with unary nodes, we assign a mutation uniquely to one edge within the lineage containing unary node. So when we simplify we probably do ok because we squash unary nodes and don't pretend we know where the mutations go. Perhaps we can't do better?
We know why this is (too much constraint), so closing.