
How does patch handle sequence names?

Open • FGoettelmann opened this issue 3 years ago • 1 comment

Hi,

First thanks for this great tool.

I see that a lot of people have trouble understanding how patch works. I read the wiki and the FAQ about it, and issue #96 was very helpful, but I still have a problem.

I have a fragmented genome assembly and a chromosome-level one, and I would like to use the latter to get a chromosome-level version of the former. I already used scaffold to produce such an assembly and it looks great but I would like to polish it a bit more by filling the gaps with patch.

I ran patch with the scaffolded assembly as the target and the chromosome-level assembly as the query. Everything seems to have worked, but I lost all my sequence names. They used to be "chr1", "chr2", etc., or "chr1_RagTag", etc., and now they have generic names like "scf0000001" and are out of order. I can identify the chromosomes by their length, but doing the same for all the other scaffolds would be difficult and time-consuming. Am I missing an option that caused the names to be lost?

FGoettelmann • Jun 13 '22 09:06

I have the same question. It occurs to me that if you run ragtag patch with the --fill-only option, every sequence in the output should have the same length as the corresponding input scaffold. Assuming that's true, the names would only be ambiguous when two or more sequences share the same length. Better still would be an option to skip the renaming step. I'm guessing that's not forthcoming, though, given the date on the original issue.

adadiehl • Jul 23 '24 18:07
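The length-matching workaround from the comment above can be sketched as a short, standalone Python script. It is not a RagTag feature. It rests on the commenter's unverified assumption that --fill-only leaves sequence lengths unchanged, and gap filling may well change a scaffold's length. For that reason, the script reports output sequences with no length match, or with several, instead of guessing. The function names are hypothetical. The file paths are passed in by the caller.

```python
from collections import defaultdict


def fasta_lengths(lines):
    """Return {name: length} from an iterable of FASTA lines (multi-line records OK)."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The sequence name is the first whitespace-delimited token of the header.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def match_by_length(input_lengths, output_lengths):
    """For each output sequence, list the input sequences with exactly the same length.

    An empty list means no input has that length (e.g. gap filling changed it);
    more than one entry means the match is ambiguous.
    """
    names_by_length = defaultdict(list)
    for name, length in input_lengths.items():
        names_by_length[length].append(name)
    return {
        out_name: list(names_by_length.get(length, []))
        for out_name, length in output_lengths.items()
    }


def report(input_fasta, output_fasta):
    """Print a tab-separated new-name / old-name table for two FASTA files."""
    with open(input_fasta) as fh:
        before = fasta_lengths(fh)
    with open(output_fasta) as fh:
        after = fasta_lengths(fh)
    for new_name, candidates in match_by_length(before, after).items():
        if len(candidates) == 1:
            status = candidates[0]
        elif not candidates:
            status = "NO_LENGTH_MATCH"
        else:
            status = "AMBIGUOUS:" + ",".join(candidates)
        print(new_name, status, sep="\t")
```

For example, `report("ragtag.scaffold.fasta", "ragtag.patch.fasta")` would print one line per patched sequence. Those two file names are assumed default output names, so substitute your own paths. Matching on length alone is cheap but fragile. If many outputs come back as NO_LENGTH_MATCH, the lengths did change, and a sequence-based comparison (e.g. aligning the outputs back to the scaffolds) would be the more reliable route.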