Assembly review in Juicebox too slow - too many contigs in file 'final.assembly'?

Open ggstatgen opened this issue 7 years ago • 5 comments

Hi,

I've completed a run of 3d-dna using as input a canu raw assembly obtained from 25x WGS mouse data and corresponding Hi-C.

I am able to load the final.hic file into Juicebox without any issues. When I try loading the FINAL.assembly file, though, the program slows down considerably. It does show me the full map with the annotated yellow and blue scaffolds. However, the interface slows down to a crawl and I find it difficult to manually adjust the results following the methods you recommend in the cookbook.

I suspect this is due to a very fragmented initial assembly and the presence of several small scaffolds. You mention in the tutorial that 'tiny scaffolds are not processed further and are simply concatenated to the final output fasta without modification'.

If I look into my FINAL.fasta I count 5,242 scaffolds, whereas my raw Canu output fasta (the input to the 3d-dna pipeline) contains 8,196 contigs.
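For anyone wanting to reproduce counts like these, here is a minimal Python sketch that counts FASTA records by counting header lines. The in-memory example and its sequence names are made up for illustration; with real data you would pass an open file handle (for example one opened on FINAL.fasta) instead.

```python
import io


def count_fasta_records(handle):
    """Count sequences in a FASTA stream by counting lines that start with '>'."""
    return sum(1 for line in handle if line.startswith(">"))


# Hypothetical two-record FASTA held in memory, for illustration only.
example = io.StringIO(">scaffold_1\nACGT\nACGT\n>scaffold_2\nGGCC\n")
n_records = count_fasta_records(example)
print(n_records)
```

Running the same function on the input contigs and on the output scaffolds gives the two numbers being compared here.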

Could this be the reason why Juicebox is so slow? Is there any way to mitigate it? Any suggestions welcome.

ggstatgen avatar Feb 21 '19 11:02 ggstatgen

Hi ggstatgen,

5,242 does not sound very bad to me. Most likely this is not an issue with the .assembly file, but rather with the build.

If you load the test.assembly and test.hic from the tutorial (see links in comments here: https://www.youtube.com/watch?v=Nj7RhQZHM18), is it also slow?

If yes, check that you are using the latest version. (We know some people have experienced a slowdown due to the old Ant we used in one of the earlier JBAT builds.) Another option is to clone into IntelliJ or something like that and run from the dev environment.

A note on this, .final and .FINAL are actually different suffixes: the first one does not include gaps between individual original scaffolds, and the latter does. Just make sure you load .hic and .assembly with the same suffix.
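As a small illustration of the suffix check, the Python sketch below compares the two file names after dropping the last extension, case-sensitively. The file names are hypothetical, and the helper is just a convenience for this thread, not part of 3d-dna or Juicebox.

```python
from pathlib import Path


def same_prefix(hic_path, assembly_path):
    """Return True if both names match once the last extension is removed.

    The comparison is case-sensitive, so 'x.final' and 'x.FINAL' differ.
    """
    return Path(hic_path).stem == Path(assembly_path).stem


# Hypothetical file names, for illustration only.
matched = same_prefix("mouse.final.hic", "mouse.final.assembly")
mismatched = same_prefix("mouse.final.hic", "mouse.FINAL.assembly")
print(matched, mismatched)
```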

Best, Olga

dudcha avatar Feb 21 '19 19:02 dudcha

Hi Olga, thanks for this.

I will test using your sample data and report back.

Thanks for the advice regarding the final vs FINAL suffixes. One question I have is: the pipeline has not produced a FINAL.hic but only a final.hic. Did something go wrong?

ggstatgen avatar Feb 22 '19 11:02 ggstatgen

Hi ggstatgen,

It's not done by default, but there is a flag, --build-gapped-map, if you want to build a hic map with gaps. That said, you don't want to review the map with gaps: the recommended approach is to review the .final (=.rawchrom for haploid mode) and then run the review, which will add gaps to your polished assembly.

Best, Olga

dudcha avatar Feb 22 '19 18:02 dudcha

Hi again Olga,

So I tried opening the demo files as you suggested and the tool is still extremely slow. Definitely slower than what you show in the YouTube tutorial.

For reference, the hic file loads without any problems. However the assembly file takes ages to load. The demo assembly file I'm using contains the following

>pseudochr1_scaf1 1 105531482
>pseudochr1_scaf2 2 93243791
>pseudochr2 3 227100921
1 2
3

The above takes approx 3 min to load on the computer. Any ideas?
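To make the structure of the demo file explicit, here is a minimal Python sketch that parses it. It assumes the usual reading of the .assembly layout: lines starting with '>' give a fragment name, an integer index and a length in bp, and every other line lists the signed fragment indices that make up one scaffold. The function name is made up for illustration and is not part of 3d-dna or Juicebox.

```python
def parse_assembly(text):
    """Split .assembly text into (fragments, scaffolds).

    fragments: dict mapping index -> (name, length), from lines starting with '>'.
    scaffolds: list of lists of signed fragment indices, one list per line.
    """
    fragments = {}
    scaffolds = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name, index, length = line[1:].split()
            fragments[int(index)] = (name, int(length))
        else:
            scaffolds.append([int(tok) for tok in line.split()])
    return fragments, scaffolds


# The demo file contents quoted above.
demo = """>pseudochr1_scaf1 1 105531482
>pseudochr1_scaf2 2 93243791
>pseudochr2 3 227100921
1 2
3
"""
fragments, scaffolds = parse_assembly(demo)
total_bp = sum(length for _, length in fragments.values())
print(len(fragments), scaffolds, total_bp)
```

Read this way, the demo file has only three fragments arranged into two scaffolds, so the file itself is tiny compared with a real assembly.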

A bit of info to hopefully help diagnose the issue:

Computer: AMD Ryzen 5 PRO 1500 octa-core with 32GB RAM
Juicebox versions tested: Juicebox_1.8.8.jar and Juicebox_1.9.8.jar
Launch command: java -jar -Xmx15000m Juicebox_1.9.8.jar
Java version: 64bit "1.8.0_201"

Any suggestions appreciated!

ggstatgen avatar Mar 01 '19 14:03 ggstatgen

ggstatgen,

Please follow the suggestion from my original reply: set up IntelliJ or any other Java dev environment, clone the Juicebox git repository, and either build the app locally or run it from the dev environment.

Thanks, Olga

dudcha avatar Mar 01 '19 14:03 dudcha