
PenguiN for non-metagenomic assembly

Open sav-che opened this issue 6 months ago • 2 comments

Hi! I liked the idea behind PenguiN, and out of interest I tested it on some clean and moderately contaminated genomes. The assemblies of the clean genomes seem to underperform: they show high redundancy and lack the longest expected sequences. Do you think PenguiN could be adapted to such a task?

sav-che, Jul 28 '25 12:07

I assume you tested it on prokaryotic genomes, not viral ones? Unfortunately, PenguiN cannot easily be adapted to prokaryotic genomes, because it cannot handle repeats that are too long to be bridged by a single read (so longer than ~200 bp), and such repeats are found all over prokaryotic genomes. We have started to work on a new assembler for metagenomic assembly of prokaryotic genomes, but that will take quite some time to finish.
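To make the repeat limitation concrete, here is a minimal sketch in Python. It is not PenguiN code and does not reflect how PenguiN works internally; the function name `unbridgeable_repeats` and the toy sequences are made up for illustration. It simply lists every substring of read length that occurs more than once in a sequence, under the assumption that a repeat at least as long as a read cannot be spanned by that read.

```python
from collections import defaultdict


def unbridgeable_repeats(genome: str, read_len: int) -> dict[str, list[int]]:
    """Return substrings of length read_len that occur more than once.

    Hypothetical helper for illustration only. A repeat at least as long as
    a read leaves no unique flanking sequence inside the read, so the read
    cannot tell the copies apart.
    """
    positions = defaultdict(list)
    # Slide a window of read_len over the genome and record where each
    # distinct window starts.
    for start in range(len(genome) - read_len + 1):
        positions[genome[start:start + read_len]].append(start)
    # Keep only the windows seen at two or more positions.
    return {seq: starts for seq, starts in positions.items() if len(starts) > 1}


# Toy genome: one 8-base repeat placed twice, with different flanks.
repeat = "ACGTTGCA"
genome = "GGG" + repeat + "TTTCC" + repeat + "AAA"

# Reads of length 8 cannot span the repeat: it shows up at two positions.
at_read_len_8 = unbridgeable_repeats(genome, read_len=8)
print(at_read_len_8[repeat])  # [3, 16]

# Reads of length 12 extend into the unique flanks, so nothing is ambiguous.
at_read_len_12 = unbridgeable_repeats(genome, read_len=12)
print(at_read_len_12)  # {}
```

With real data the same idea applies at a read length of roughly 200 bp: any repeat at or above that length leaves the assembly graph ambiguous unless longer reads or pairing information resolve it.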

soeding, Jul 28 '25 12:07

Thanks for the fast reply! Well, it's probably even worse - I tried it on eukaryotes (fungi). It is just that we are constantly searching for better ways to assemble genomes from dirty-ish herbarium specimens, which often need host-parasite-contaminant resolution and so may be considered metagenomic.

sav-che, Jul 28 '25 13:07