
Question about the confidence scores

Open YPGG1234 opened this issue 6 years ago • 21 comments

Hello, I see you have confidence scores associated with the grouping, localization, and orientation of each contig, and I would like to know more about them. For example, I have a contig in the final fasta file with a location confidence score of 0.03104861142651071 and an orientation confidence score of 0.9638021314266446. (I think this contig should belong to chr Y, which the reference doesn't have, and should not be broken, but it is assigned to chr X and it is broken.) Are these scores good or bad? And how can I judge which scores are reliable? Here is my command: ragoo.py -R raw.corrected.fasta -C -m /bin/minimap2 -gff stringtie.generated.gff3 -T corr -t 28 assembly.fa ref.fna If you can help me, I will be very grateful.

YPGG1234 avatar Sep 29 '19 15:09 YPGG1234

Hi there,

I am happy to give a detailed explanation. First, can you tell me what the grouping confidence score is? That will be in the "groupings" folder.

Thanks

malonge avatar Sep 29 '19 21:09 malonge

Hi, this contig's grouping confidence score is 0.9568822757353695.

YPGG1234 avatar Sep 30 '19 01:09 YPGG1234

Thanks for sharing this. I would say that the grouping and orientation scores look pretty good. The location score is low, though that is perhaps the least descriptive since it is based on alignment coordinates with respect to the reference.

In general, it is not optimal to use a reference which is missing chromosomes (in your case, Y). In that case, as long as a contig has a >10kbp alignment anywhere else in the genome, that is where it will get placed. Is it possible to use a reference with the Y chromosome?

If not, perhaps the next best thing to do would be to increase the specificity by requiring a minimum alignment length that is much longer than 10kbp. Though I would like to add this functionality at some point in the future, it is not currently available.

However, RaGOO will not try to remake alignment files if they are already present in the output directories. So you can filter those alignment files (for example, only include alignments > 50kbp) and place them in the output directories. If they have the same names as they do now, RaGOO will not recreate them. If that doesn't make sense, I can give a more detailed example.
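The filtering step above could be sketched roughly as follows. This assumes the alignment file is standard PAF (tab-separated, with the alignment block length in column 11) and that block length is a reasonable stand-in for whatever length RaGOO measures internally. The function names and the 50 kbp default are illustrative only and are not part of RaGOO.

```python
def filter_paf_lines(lines, min_aln_len=50_000):
    """Keep only PAF records whose alignment block length is >= min_aln_len.

    Assumption: standard PAF, tab-separated, where column 11 (index 10)
    is the alignment block length. Blank lines are dropped.
    """
    kept = []
    for line in lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if int(fields[10]) >= min_aln_len:
            kept.append(line)
    return kept


def filter_paf_file(in_path, out_path, min_aln_len=50_000):
    """Write the filtered records from in_path to out_path; return the count kept."""
    with open(in_path) as src:
        kept = filter_paf_lines(src, min_aln_len)
    with open(out_path, "w") as dst:
        dst.writelines(kept)
    return len(kept)
```

One way to use it is to write the filtered records to a new file, keep a copy of the original, and then give the filtered file the original name inside the output directory so that RaGOO picks it up instead of realigning.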

malonge avatar Sep 30 '19 19:09 malonge

also, please see the preprint for a better description of the confidence scores:

https://www.biorxiv.org/content/10.1101/519637v1

malonge avatar Sep 30 '19 19:09 malonge

Thanks for your help, I will try it. This contig is longer than 5 Mb, but it was broken at position 236K. The first part is placed on chr13 and the second part is placed on chrX.

I have another question. You say that if a contig has a >10kbp alignment anywhere else in the genome, that is where it will get placed. Does this mean that a contig with lots of repeat content (such as one from a sex chromosome) might be wrongly assigned to another chromosome (or to the other sex chromosome)? And might it occur in many places in the final fasta file?

YPGG1234 avatar Oct 01 '19 03:10 YPGG1234

No, that is not what I mean. Allow me to clarify.

By default, each contig is placed exactly once, unaltered, in the final ragoo.fasta file. So the final file represents just an ordered and oriented version of the input contig set.

Beyond that, one can correct misassemblies as you have, but that just breaks contigs in certain places. So if a repetitive contig has many alignments, RaGOO will pick the "best" alignment to use. However, that is exactly the sort of thing that would make the confidence scores go down.

malonge avatar Oct 01 '19 13:10 malonge

Ok, I understand. Thanks for your answer !

YPGG1234 avatar Oct 01 '19 13:10 YPGG1234

No problem. I will respond again to this issue when I have made the alignment length a tunable parameter.

malonge avatar Oct 01 '19 13:10 malonge

Hi malonge,

Recently I ran into a new problem. When I used the assembly's scaffolds and the reference genome to draw a CIRCOS plot, it looked clean, but when I used the RaGOO assembly and the reference, it looked much messier.

image

Here is my RaGOO command: ragoo.py -R raw.corrected.fasta -m /bin/minimap2 -gff stringtie.generated.gff3 -T corr -t 28 -i 0.8 -j Y.candidate.txt assembly.fa ref.fna

For the first plot, I used lastal to generate link.txt; for the second, I used minimap2. I am not sure whether my RaGOO assembly has problems or my alignment tools do.

Can you help me? Thanks.

YPGG1234 avatar Oct 10 '19 07:10 YPGG1234

Hi there,

Can you tell me what exactly is in the link.txt file?

Personally, I think a dotplot would be the best visualization here. You can use mummerplot or assemblytics.

malonge avatar Oct 10 '19 21:10 malonge

OK, link.txt is an input file required by CIRCOS to draw the collinearity graph. It records the collinearity relationship between the assembly and the reference, and the format is as follows:

QueryChr/ScaffoldName QueryChr/ScaffoldStart QueryChr/ScaffoldEnd RefChr RefChrStart RefChrEnd
Scaffold_1 0 100000 Chr2 50000 150000

It can be generated from lastal, minimap2, and similar alignment tools.
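For the minimap2 case, a conversion from PAF to the six-column link.txt layout described above might look like the sketch below. It assumes standard PAF column order (query name, length, start, end, strand, target name, length, start, end) and space-separated output; the function names are hypothetical, and the strand column is simply ignored here.

```python
def paf_to_link_line(paf_line):
    """Convert one PAF record to a 'query start end ref start end' line."""
    fields = paf_line.rstrip("\n").split("\t")
    # PAF columns: 0=query name, 2=query start, 3=query end,
    #              5=target name, 7=target start, 8=target end
    qname, qstart, qend = fields[0], fields[2], fields[3]
    tname, tstart, tend = fields[5], fields[7], fields[8]
    return " ".join([qname, qstart, qend, tname, tstart, tend])


def paf_to_links(paf_lines):
    """Convert an iterable of PAF lines, skipping blank ones."""
    return [paf_to_link_line(line) for line in paf_lines if line.strip()]
```

Generating both link.txt files this way, from PAF produced by the same aligner and settings, would make the two CIRCOS plots directly comparable.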

YPGG1234 avatar Oct 11 '19 01:10 YPGG1234

It sounds like you used 2 different aligners to generate the plots. Can you show me what they look like if you use minimap2 for both of them? Also, what does your minimap2 command look like?

RaGOO scaffolds strictly based on minimap2 alignments, so it doesn't make sense that they would disagree that much.

malonge avatar Oct 11 '19 15:10 malonge

My contigs_against_ref.paf.log contains this minimap2 command: minimap2 -k19 -w19 -t24 ref.fa assembly.fa

So my minimap2 command is: minimap2 -k19 -w19 -t 24 --secondary=no -cx asm10 ref.fa assembly.fa

I think it is possible that I enabled the "assembly correction" option, which led to the scaffolds being broken. But when I ran RaGOO without any parameters, the plot still didn't change. My colleague told me lastal may be better than minimap2 in this case, so I will try it.

YPGG1234 avatar Oct 11 '19 15:10 YPGG1234

What organism is this? And what is the expected genome size/ploidy?

malonge avatar Oct 11 '19 16:10 malonge

The organism is sheep, and the expected genome size is 2.6-2.7G, just like the reference genome.

Peng-Y3 avatar Oct 11 '19 16:10 Peng-Y3

Well I am puzzled because those two minimap2 commands should give very similar results. And I don't see why minimap2 would not work just fine on this genome.

One thing you can do is replace the original contigs_against_ref.paf with the PAF file used to generate the circos plot. Let's say you have circos.paf. You can do the following.

cd ragoo_output
mv contigs_against_ref.paf contigs_against_ref.paf.old
cp /path/to/circos/circos.paf .
mv circos.paf contigs_against_ref.paf

Then, remove every other file/directory in ragoo_output except those paf files (you can keep the log file around too). Finally, rerun ragoo.

Ragoo will use your circos alignments for scaffolding instead of generating its own alignments.
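The cleanup step (removing everything in ragoo_output except the PAF files and the log) could be scripted along these lines. This is only a sketch: the function name is hypothetical, the set of suffixes to keep is an assumption based on the file names mentioned in this thread, and it defaults to a dry run so nothing is deleted until the list has been checked.

```python
import shutil
from pathlib import Path


def prune_output_dir(out_dir, keep_suffixes=(".paf", ".old", ".log"), dry_run=True):
    """List (and optionally delete) everything in out_dir except kept files.

    Files whose final suffix is in keep_suffixes are left alone, which covers
    contigs_against_ref.paf, contigs_against_ref.paf.old and
    contigs_against_ref.paf.log. Returns the names that were (or would be) removed.
    """
    removed = []
    for entry in sorted(Path(out_dir).iterdir()):
        if entry.is_file() and entry.suffix in keep_suffixes:
            continue
        removed.append(entry.name)
        if not dry_run:
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)  # real directory: remove recursively
            else:
                entry.unlink()  # regular file or symlink
    return removed
```

Calling `prune_output_dir("ragoo_output")` first prints nothing and deletes nothing; it just returns the names to review. Passing `dry_run=False` afterwards performs the deletion.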

malonge avatar Oct 11 '19 16:10 malonge

Of course, you would have to rerun minimap2 on the original scaffolds rather than the RaGOO pseudomolecules.

malonge avatar Oct 11 '19 17:10 malonge

I wonder if I can modify RaGOO's built-in minimap2 parameters. Where should I change them? For example, I want to change the built-in "minimap2 -k19 -w19 -t24 ref.fa assembly.fa" to "minimap2 -k19 -w19 -t 24 --secondary=no -cx asm10 ref.fa assembly.fa".

YPGG1234 avatar Oct 11 '19 17:10 YPGG1234

Well you can fork the repo and change it in the source code by all means, but I was just suggesting how to run whatever minimap2 command you want, save it to a paf file, then just plug that paf file into ragoo. Ragoo won't make a new paf file if it already sees one there.
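As a rough illustration of that workflow, the sketch below runs the minimap2 command quoted earlier in this thread and saves its output as a PAF file. It assumes minimap2 is on the PATH; the function names are hypothetical, and the output path is whatever file RaGOO should later find in its output directory.

```python
import subprocess


def build_minimap2_cmd(ref, contigs, threads=24):
    """Return the minimap2 argument list used in this thread."""
    return [
        "minimap2", "-k19", "-w19", "-t", str(threads),
        "--secondary=no", "-cx", "asm10", ref, contigs,
    ]


def write_paf(ref, contigs, paf_path, threads=24):
    """Run minimap2 and write its standard output (PAF) to paf_path."""
    with open(paf_path, "w") as out:
        subprocess.run(build_minimap2_cmd(ref, contigs, threads),
                       stdout=out, check=True)
```

For example, `write_paf("ref.fa", "assembly.fa", "contigs_against_ref.paf")` would produce a file with the name RaGOO looks for, which can then be placed in the output directory before rerunning.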

malonge avatar Oct 11 '19 17:10 malonge

Ok, I will try it, thank you.

YPGG1234 avatar Oct 11 '19 17:10 YPGG1234

Hi there,

RagTag, the successor to RaGOO, is now available here:

https://github.com/malonge/RagTag

This feature is implemented in RagTag, and will likely not ever be implemented in RaGOO, which will eventually be deprecated.

Thanks

malonge avatar Jun 09 '20 20:06 malonge