TreeKO comparison doesn't handle adjacent duplications

Open blasks opened this issue 9 months ago • 1 comments

Hello,

When calculating the TreeKO distance (the PhyloTree.compare() method with has_duplications=True), I found that the method raises an error if the duplicated leaves are both children of the same parent node, but not if they are separated by two or more nodes. This seems like a bug to me, since adjacent duplicated leaves are definitely possible, for example in a gene tree with orthologs.

import ete4

t1 = ete4.PhyloTree('((A,B),(A,C));')
t2 = ete4.PhyloTree('(A,(B,C));')
print(t1.compare(t2, has_duplications=True)) # works fine

t1 = ete4.PhyloTree('((A,A),(B,C));')
print(t1.compare(t2, has_duplications=True)) # raises 'TreeError: Duplicated items found in target tree.'
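
For anyone triaging this, here is a minimal sketch in plain Python that flags trees where the same leaf name appears twice under one parent, which is the case that fails above. It does not call ete4 at all: `parse_newick` and `sibling_duplicates` are hypothetical helper names, and the parser assumes names-only Newick strings (no branch lengths, no internal node labels).

```python
from collections import Counter


def parse_newick(newick):
    """Parse a names-only Newick string into nested lists of leaf names."""
    text = newick.strip().rstrip(';')
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == '(':
            pos += 1  # skip '('
            children = [parse_node()]
            while text[pos] == ',':
                pos += 1  # skip ','
                children.append(parse_node())
            pos += 1  # skip ')'
            return children
        # Otherwise this is a leaf: read its name up to the next delimiter.
        start = pos
        while pos < len(text) and text[pos] not in ',()':
            pos += 1
        return text[start:pos].strip()

    return parse_node()


def sibling_duplicates(tree):
    """Return the sorted leaf names that occur more than once
    among the direct leaf children of a single internal node."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            return  # leaves have no children to inspect
        counts = Counter(child for child in node if isinstance(child, str))
        found.update(name for name, n in counts.items() if n > 1)
        for child in node:
            visit(child)

    visit(tree)
    return sorted(found)


print(sibling_duplicates(parse_newick('((A,B),(A,C));')))  # []
print(sibling_duplicates(parse_newick('((A,A),(B,C));')))  # ['A']
```

This only detects the adjacent-duplication pattern in the input; it says nothing about why compare() rejects it.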

blasks avatar Apr 23 '25 00:04 blasks

Hi @blasks ,

Thanks for reporting this. I personally don't know about the inner workings of compare(), so I'm just going to nudge @jhcepas about it :)

jordibc avatar May 05 '25 23:05 jordibc