validate and filter the connections with databases
Hello,
In the manual it mentioned an optional step of validating proposed connections by mapping to a public protein database. I think this is really a great idea, but I don't seem to find the script for doing that in the repository.
Is this function currently implemented or do the users have to come up with their own solutions?
Best Regards, Rongfeng Cui Max-Planck Institute for the Biology of Ageing.
Hi Rongfeng,
The feature is kind of there. To do this, you need to run "rascaf" with "-cs" option to output the sequence supporting the connection of two contigs.
As a result, you will have a file look like "*_cs.fa". Then you can blast those sequences against refseq_rna database. If you don't have the blast database locally, you can use the "-remote" option. For example: "blastn -db refseq_rna -task dc-megablast -query rascaf_cs.fa -out blast.out -outfmt 6 -remote -max_target_seqs 20" This procedure is quite slow and assumes you have blastn on your machine.
After that you can run the "EvaluateBlastResults.pl" script as: "perl ./EvaluateBlastResults.pl blast.out rascaf_cs.fa". The output will show the validation for the sequence from rascaf_cs.fa file. 1: valided. 0: uncertain. -1: misorientation -2: misorder -3: chimeric -4: no match in the blast database.
You can then adjust the corresponding connection in the rascaf.out file based on the validation result.
Thanks, Li
Dear Li,
Thank you for the quick reply!
For the last step, is there a way to automate the process? I could write a script for it in case you don't already have one.
Another question is why not use tblastn to make the alignment at the amino acid level?
Thanks Ray On Nov 28, 2016 8:58 PM, "Li Song" [email protected] wrote:
Hi Rongfeng,
The feature is kind of there. To do this, you need to run "rascaf" with "-cs" option to output the sequence supporting the connection of two contigs.
As a result, you will have a file look like "*_cs.fa". Then you can blast those sequences against refseq_rna database. If you don't have the blast database locally, you can use the "-remote" option. For example: "blastn -db nt -task dc-megablast -query rascaf_cs.fa -out blast.out -outfmt 6 -remote -max_target_seqs 20" This procedure is quite slow and assumes you have blastn on your machine.
After that you can run the "EvaluateBlastResults.pl" script as: "perl ./EvaluateBlastResults.pl blast.out rascaf_cs.fa". The output will show the validation for the sequence from rascaf_cs.fa file. 1: valided. 0: uncertain. -1: misorientation -2: misorder -3: chimeric -4: no match in the blast database.
You can then adjust the corresponding connection in the rascaf.out file based on the validation result.
Thanks, Li
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/mourisl/Rascaf/issues/2#issuecomment-263377020, or mute the thread https://github.com/notifications/unsubscribe-auth/ABlYwTB6WLOeovbvvSrLm55e6blEQ6Ayks5rCzJ3gaJpZM4K9yGt .
Sorry I meant tblastx. On Nov 28, 2016 9:33 PM, "Ray Cui" [email protected] wrote:
Dear Li,
Thank you for the quick reply!
For the last step, is there a way to automate the process? I could write a script for it in case you don't already have one.
Another question is why not use tblastn to make the alignment at the amino acid level?
Thanks Ray On Nov 28, 2016 8:58 PM, "Li Song" [email protected] wrote:
Hi Rongfeng,
The feature is kind of there. To do this, you need to run "rascaf" with "-cs" option to output the sequence supporting the connection of two contigs.
As a result, you will have a file look like "*_cs.fa". Then you can blast those sequences against refseq_rna database. If you don't have the blast database locally, you can use the "-remote" option. For example: "blastn -db nt -task dc-megablast -query rascaf_cs.fa -out blast.out -outfmt 6 -remote -max_target_seqs 20" This procedure is quite slow and assumes you have blastn on your machine.
After that you can run the "EvaluateBlastResults.pl" script as: "perl ./EvaluateBlastResults.pl blast.out rascaf_cs.fa". The output will show the validation for the sequence from rascaf_cs.fa file. 1: valided. 0: uncertain. -1: misorientation -2: misorder -3: chimeric -4: no match in the blast database.
You can then adjust the corresponding connection in the rascaf.out file based on the validation result.
Thanks, Li
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/mourisl/Rascaf/issues/2#issuecomment-263377020, or mute the thread https://github.com/notifications/unsubscribe-auth/ABlYwTB6WLOeovbvvSrLm55e6blEQ6Ayks5rCzJ3gaJpZM4K9yGt .
For the last step, I'm sorry we don't have the script to remove the connections from rascaf.out now. If you can wait for 1 day, I can implement it now (this is on my todo list anyway).
I think tblastx would work as long as the output is the same as blastn (-outfmt 6).
Thanks, Li
Hi Li,
no problem, please take your time. Let me know when this script is
ready.
If I want to use a protein database, like taxonomic specific
uniprot, is it also possible (with blastx -outfmt 6)? Best Regards, Ray
On Mon, Nov 28, 2016 at 9:45 PM, Li Song [email protected] wrote:
For the last step, I'm sorry we don't have the script to remove the connections from rascaf.out now. If you can wait for 1 day, I can implement it now (this is on my todo list anyway).
I think tblastx would work as long as the output is the same as blastn (-outfmt 6).
Thanks, Li
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/mourisl/Rascaf/issues/2#issuecomment-263388955, or mute the thread https://github.com/notifications/unsubscribe-auth/ABlYwQmm2w5e4FakGKt0RJ55a1B9g0JYks5rCz2CgaJpZM4K9yGt .
I just tested blastx with one sequence and I believe the script will work fine with blastx.
Ok, thank you for testing!
Ray On Nov 28, 2016 10:26 PM, "Li Song" [email protected] wrote:
I just tested blastx with one sequence and I believe the script will work fine with blastx.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/mourisl/Rascaf/issues/2#issuecomment-263399295, or mute the thread https://github.com/notifications/unsubscribe-auth/ABlYwUSjKUN_UczTq1RcLu22woQf-WTDks5rC0cBgaJpZM4K9yGt .
I just uploaded the script "FilterRascafConnection.pl". You can use it after obtaining the validation results from blast. The commands is like: "perl FilterRascafConnection.pl blast_eval.out rascaf_cs.fa rascaf.out > rascaf_filter.out" .
Please let me know whether this works.
Thanks, Li
Dear Li,
thank you for your fast response. I will test this script.
In your paper you have compared Rascaf to two other RNA scaffolders.
I am aware of an unpublished scaffolder called BESST_RNA. Out of curiosity I'm comparing results between them. So far it seems that BESST_RNA yielded a somewhat higher N50 (input N50: 815,196, BESST_RNA N50: 910,428, Rascaf N50: 848,421), and a slightly better BUSCO performance: (input complete BUSCOs: 4380, BESST_RNA: 4410, Rascaf: 4397). BESST_RNA requires merging of all bam files into a single input. Other than that I set a minimum link number to 4 for both programs. I'm wondering if you would have time to perform some systematic comparisons between the two regarding accuracy and performance.
Best Regards, Ray
On Tue, Nov 29, 2016 at 4:25 AM, Li Song [email protected] wrote:
I just uploaded the script "FilterRascafConnection.pl". You can use it after obtaining the validation results from blast. The commands is like: "perl FilterRascafConnection.pl blast_eval.out rascaf_cs.fa rascaf.out > rascaf_filter.out" .
Please let me know whether this works.
Thanks, Li
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/mourisl/Rascaf/issues/2#issuecomment-263466183, or mute the thread https://github.com/notifications/unsubscribe-auth/ABlYwRQbMltGttxeXU1IsBA96NW-iom9ks5rC5s7gaJpZM4K9yGt .
Thanks for letting me know the comparison. I tried to run BESST_RNA, but it crashed. It is likely a problem about python on our sever. I will try it on a different computer later.
Dear Li,
thanks for looking at this.
Best Regards, Ray
On Tue, Nov 29, 2016 at 11:15 PM, Li Song [email protected] wrote:
Thanks for letting me know the comparison. I tried to run BESST_RNA, but it crashed. It is likely a problem about python on our sever. I will try it on a different computer later.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/mourisl/Rascaf/issues/2#issuecomment-263717325, or mute the thread https://github.com/notifications/unsubscribe-auth/ABlYwfEQgBgqBJKRiSYHENJhFFeXpGAWks5rDKP3gaJpZM4K9yGt .