tsdate icon indicating copy to clipboard operation
tsdate copied to clipboard

Unary nodes aren't handled well

Open awohns opened this issue 5 years ago • 3 comments

Dating accuracy is severely decreased with inferring tree sequences with unary nodes. We're probably not handling something correctly.

Unittests with simple examples of unary nodes may help.

awohns avatar Feb 26 '20 18:02 awohns

We should test whether this is because the unary nodes created by tsinfer are rubbish, or because tsdate copes badly even with sensible unary nodes. The easiest way to do this is to run a an msprime simulation with record_full_arg=True, which should result in a load of unary nodes in the correct place.

hyanwong avatar Mar 04 '20 10:03 hyanwong

Discovered a simple test which, I think, breaks our handling of unary nodes. It at least breaks the verify_weights() test.

def two_tree_ts_with_unary_n3():
    r"""
    Simple case where we have n = 3 and node 5 is internal unary, then
    the oldest node and unary.
             6        .      5
           /   \      .      |
          4     5     .      4
          |     |     .     /  \
          3     |     .    3    \
         / \    |     .   / \    \
       [0] [1] [2]    . [0]  [1] [2]    

    """
    nodes = io.StringIO("""\
    id      is_sample   time
    0       1           0
    1       1           0
    2       1           0
    3       0           1
    4       0           2
    5       0           3
    6       0           4
    """)
    edges = io.StringIO("""\
    left    right   parent  child
    0       2       3       0,1
    0       1       5       2
    0       2       4       3
    0       1       6       4,5
    1       2       4       2
    1       2       5       4
    """)
    return tskit.load_text(nodes=nodes, edges=edges, strict=False)

I've included it in PR #141 as a skipped test

awohns avatar Nov 26 '20 05:11 awohns

Just discussed this with @hyanwong: at a marginal tree when there is an edge with unary nodes, we assign a mutation uniquely to one edge within the lineage containing unary node. So when we simplify we probably do ok because we squash unary nodes and don't pretend we know where the mutations go. Perhaps we can't do better?

awohns avatar Feb 12 '21 15:02 awohns

We know why this is (too much constraint), so closing.

hyanwong avatar Mar 29 '25 20:03 hyanwong