Output contig locations bed file
This is a feature request - as of v1.1, the order of the contigs in each pseudomolecule is given in a separate TSV file. I think it would be more useful if, instead (or in addition), the software output a BED file containing the contig locations on the pseudomolecules (thus also taking gap padding into account). In fact, two BED files could be created: one using coordinates on the generated pseudomolecules and the other using coordinates on the reference pseudomolecules.
Meanwhile - can you suggest a way to extract the reference location information? It is not always trivial to extract it from the PAF file...
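To make the request concrete, here is a minimal sketch of the kind of BED output I have in mind, on the generated pseudomolecules. It is not RaGOO code: the function name `contigs_to_bed`, the input tuples, and the `gap_size` default of 100 are all assumptions for illustration, and the contig order and lengths would have to come from the existing ordering files and the contig FASTA.

```python
def contigs_to_bed(placements, gap_size=100):
    """Turn an ordered list of contig placements into BED rows.

    placements: iterable of (pseudomolecule, contig, length, strand) in the
    order the contigs were concatenated (hypothetical input layout).
    gap_size: number of padding bases inserted between adjacent contigs
    (assumed value; use whatever padding the scaffolding run used).
    """
    next_start = {}  # pseudomolecule -> start of the next contig
    rows = []
    for chrom, contig, length, strand in placements:
        start = next_start.get(chrom, 0)
        end = start + length  # BED is 0-based, half-open
        rows.append((chrom, start, end, contig, 0, strand))
        # The next contig on this pseudomolecule begins after the gap padding.
        next_start[chrom] = end + gap_size
    return rows


def format_bed(rows):
    """Render BED rows as tab-separated text."""
    return "\n".join("\t".join(str(field) for field in row) for row in rows)


if __name__ == "__main__":
    example = [
        ("chr1", "ctgA", 1000, "+"),
        ("chr1", "ctgB", 500, "-"),
        ("chr2", "ctgC", 200, "+"),
    ]
    print(format_bed(contigs_to_bed(example)))
```

With a 100 bp gap, the second contig on a pseudomolecule starts 100 bases after the first one ends, so the padding is reflected directly in the coordinates.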
Hi there,
You are right - that would be a better intermediate output format and that is the plan for v2 of RaGOO. Thanks for the suggestion.
In the meantime, you can use the script ragoo_utilities/get_contig_borders.py. It's rough, but hopefully it will serve you until v2 comes out.
Thank you
Thanks. Can this script also extract mapping coordinates on the reference? If not, how can I extract them from the PAF? How should I treat cases of incomplete mapping and/or multiple mappings?
Hi there,
Do you mean all of the mapping coordinates between a query sequence and the reference? Or just those alignments that informed scaffolding somehow?
Is there a way to tell which alignments informed scaffolding?
Unfortunately, there is no automatic way to obtain such alignments from RaGOO. I would encourage you to read the paper to see which alignments are used for which steps. There are certain steps that rely on the longest alignment between a contig and its assigned reference sequence, and those might be somewhat straightforward to pull out yourself.
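As a rough starting point for pulling out the longest alignment per contig, something like the sketch below may help. It relies only on the standard PAF column layout (query name, length, start, end, strand, target name, length, start, end, matches, block length, mapping quality). The function name `longest_alignments` and the optional `assigned` mapping are hypothetical, and ranking by reference span is an assumption rather than a description of what RaGOO does internally.

```python
def longest_alignments(paf_lines, assigned=None):
    """Return one BED-like row per contig: its longest alignment on the reference.

    paf_lines: iterable of PAF text lines.
    assigned: optional dict mapping contig name -> reference sequence name;
    when given, alignments to any other reference sequence are ignored
    (hypothetical helper input, e.g. built from the ordering files).
    """
    best = {}  # contig name -> (reference span, BED-like row)
    for line in paf_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        f = line.split("\t")
        qname, strand, tname = f[0], f[4], f[5]
        tstart, tend, mapq = int(f[7]), int(f[8]), int(f[11])
        if assigned is not None and assigned.get(qname) != tname:
            continue
        span = tend - tstart  # alignment length on the reference (assumed criterion)
        if qname not in best or span > best[qname][0]:
            # PAF coordinates are 0-based with an open end, matching BED.
            best[qname] = (span, (tname, tstart, tend, qname, mapq, strand))
    return [row for _, row in best.values()]


if __name__ == "__main__":
    with open("contigs_against_ref.paf") as handle:  # placeholder file name
        for row in longest_alignments(handle):
            print("\t".join(str(field) for field in row))
```

Contigs with several alignments are reduced to the single longest one, and partially mapped contigs are reported only over the aligned reference interval, so the output is an approximation of the placement rather than a full account of every mapping.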
Thanks - I'll give it a try. It could be a nice addition to the output (as requested in my original post).