Advice: use only the chromosomes as the reference, or the entire assembly?
Hi!
I'm trying to use ntJoin to scaffold an input PacBio CLR genome assembly, using a chromosome-level assembly of a closely related species as the reference.
I would like to ask which is better: using the entire assembly as the reference, or only the "chromosomes", since they make up more than 90% of the genome size?
A different question.
- First, I tried running ntJoin with no_cut=True. This run yielded an assembly twice as large as expected.
- Then I tried no_cut=False, and it greatly improved the result, with only 94% of the target assembly assigned to the reference assembly. The size of the ntJoin assembly also matched the known size of the input genome quite well.
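One quick way to check whether a doubled assembly size comes from inserted gaps rather than extra sequence is to compare the total length and the N content of the scaffolded FASTA against the expected genome size. The sketch below is a minimal, dependency-free illustration; parse_fasta and assembly_stats are hypothetical helper names, not part of ntJoin.

```python
def parse_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines.

    Works on an open file handle or on a list of strings. The name is the
    first whitespace-separated word of the header line; sequence lines that
    appear before any header are ignored.
    """
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def assembly_stats(records, expected_size=None):
    """Summarize total length and N content of (name, sequence) records."""
    total_bp = 0
    n_bp = 0
    for _, seq in records:
        total_bp += len(seq)
        n_bp += seq.upper().count("N")  # counts both 'N' and soft-masked 'n'
    stats = {"total_bp": total_bp, "n_bp": n_bp, "non_n_bp": total_bp - n_bp}
    if expected_size:
        # A size_ratio near 2 with a non_n_ratio near 1 points to gap inflation.
        stats["size_ratio"] = total_bp / expected_size
        stats["non_n_ratio"] = (total_bp - n_bp) / expected_size
    return stats
```

For a real run, pass an open handle to the ntJoin output FASTA, e.g. assembly_stats(parse_fasta(open(path)), expected_size=...), where path and the expected size are your own values. If non_n_ratio stays close to 1 while size_ratio is near 2, the extra length is mostly Ns rather than duplicated sequence.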
Thanks in advance, and thanks for the software!
Hi @V-JJ,
When deciding what to supply as the reference, either option is totally fine (and probably depends a bit on what you're hoping to achieve). Assuming that the sequences other than the chromosomes are 'unassigned', it is generally safe to keep them in: they likely won't contribute much to the scaffolding, but they shouldn't be too detrimental either.
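If you do decide to pass only the chromosome-scale sequences, it is worth confirming how much of the reference they cover before running ntJoin. The sketch below assumes that chromosomes can be separated from unplaced scaffolds by a simple length threshold, which is an assumption; filtering on sequence names may suit a given reference better. select_chromosomes, fraction_kept and format_fasta are hypothetical helper names.

```python
def select_chromosomes(records, min_length):
    """Split (name, sequence) records into chromosome-scale and other sequences.

    min_length is a user-chosen threshold (hypothetical; pick it from the
    reference's own length distribution or its chromosome naming).
    """
    kept, dropped = [], []
    for name, seq in records:
        (kept if len(seq) >= min_length else dropped).append((name, seq))
    return kept, dropped


def fraction_kept(kept, dropped):
    """Fraction of the total reference length contained in the kept sequences."""
    kept_bp = sum(len(seq) for _, seq in kept)
    total_bp = kept_bp + sum(len(seq) for _, seq in dropped)
    return kept_bp / total_bp if total_bp else 0.0


def format_fasta(records, width=80):
    """Render (name, sequence) records as FASTA text with wrapped sequence lines."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n" if lines else ""
```

Records can come from any FASTA reader; with Biopython, for example, ((r.id, str(r.seq)) for r in SeqIO.parse("reference.fa", "fasta")) yields suitable pairs, where the file name is a placeholder. Writing format_fasta(kept) to a new file gives a chromosome-only reference to pass to ntJoin.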
When running no_cut=True, an inflated genome size can be caused by a large number of introduced N's. This can be offset by supplying the G parameter, which puts a maximum size on the introduced gaps. I give a longer explanation of how a large number of N's can be introduced, and why I implemented the G feature, in this previous issue: https://github.com/bcgsc/ntJoin/issues/115#issuecomment-2313102451
As you probably know, the difference is simply that no_cut=True will not break any of your existing contigs, whereas no_cut=False will make breaks in your input contigs to fit them to the reference. Which mode makes the most sense depends on how closely related the reference is, and on your knowledge of how similar the genomes are.
I hope that helps, and thank you for your interest in ntJoin!
Lauren
Hi @lcoombe !
Thanks for the clear and detailed explanation.
The two species diverged ~5 Mya, and the BUSCO scores are quite similar for no_cut=True and no_cut=False, although slightly higher with no_cut=True.
Thanks, Vadim