Add new module vgan/haplocart
PR checklist
New Module vgan/Haplocart
Closes #2152
- [X] This comment contains a description of changes (with reason).
- [X] If you've fixed a bug or added code that should be tested, add tests!
- [X] If you've added a new tool - have you followed the module conventions in the contribution docs
- [X] If necessary, include test data in your PR.
- [X] Remove all TODO statements.
- [X] Emit the versions.yml file.
- [X] Follow the naming conventions.
- [X] Follow the parameters requirements.
- [X] Follow the input/output options guidelines.
- [X] Add a resource label
- [X] Use BioConda and BioContainers if possible to fulfil software requirements.
- Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
  - [ ] PROFILE=docker pytest --tag <MODULE> --symlink --keep-workflow-wd --git-aware
  - [X] PROFILE=singularity pytest --tag <MODULE> --symlink --keep-workflow-wd --git-aware
  - [ ] PROFILE=conda pytest --tag <MODULE> --symlink --keep-workflow-wd --git-aware
@jfy133 Requesting review
Did you create this module using the nf-core tooling? A lot of the template code is missing (especially the code that implements the CI tests here). :)
Ok thank you both for the help and suggestions! I will keep working on this.
@JoshuaDanielRubin just a reminder about this ;) we are almost ready in the eager3 development for haplocart to be integrated!
Hi @jfy133, sorry for the delay here, to be honest I got stuck with the tests failing. And I am a bit behind with PhD work unfortunately and cannot really devote much time to this, so perhaps it would be best to go on without me :) I presume we can always integrate HaploCart at the next eager release?
No worries at all! Yes we can bump to the next major release, however just shout if you want me to fix the tests for this PR (to make it less annoying;))
Hi @JoshuaDanielRubin! We are working on integrating haplocart in eager3 and I will be finishing up the module.
Hi @JoshuaDanielRubin, I think the module is all ready, but the tests are failing, and since I am not familiar with HaploCart I will need your input.
When I run the single-end test with the file rCRS_simulated_test.fq.gz, I get the following error:
╦ ╦┌─┐┌─┐┬ ┌─┐╔═╗┌─┐┬─┐┌┬┐
╠═╣├─┤├─┘│ │ │║ ├─┤├┬┘ │
╩ ╩┴ ┴┴ ┴─┘└─┘╚═╝┴ ┴┴└─ ┴
Predicting sample: rCRS_simulated_test.fq.gz
Using 2 threads
Processing sample 1 of 1
Mapping reads...
Reading GAM file XKz8KOQ
Done reading GAM file /tmp/XKz8KOQ
Found 0 reads.
terminate called after throwing an instance of 'std::runtime_error'
what(): [HaploCart] Error, no reads mapped
ERROR: Signal 6 occurred. VG has crashed. Visit https://github.com/vgteam/vg/issues/new/choose to report a bug.
Stack trace path: /tmp/vg_crash_NU38XH/stacktrace.txt
Please include the stack trace file in your bug report!
The command that was run seems correct to me:
#!/bin/bash -ue
vgan haplocart \
\
-t 2 \
-fq1 rCRS_simulated_test.fq.gz \
\
-o test.txt \
--hc-files hcfiles \
-pf test.posterior.txt
cat <<-END_VERSIONS > versions.yml
"test_vgan_haplocart_single_end:VGAN_HAPLOCART":
vgan: $(vgan version 2>&1 | sed -e "s/vgan version //g;s/ (Mela)//g")
END_VERSIONS
Do you have any suggestions on how to fix it?
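As a side check on the versions.yml step in the script above, the sed expression can be exercised on its own to rule it out as a source of trouble. This is only a sketch: the function name strip_vgan_version and the sample string in the usage note are hypothetical, and the real output of vgan version may be formatted differently.

```shell
# Strip the surrounding text from a version line read on stdin, using the
# same sed expression as the versions.yml step in the script above.
# The function name is hypothetical and only used for illustration.
strip_vgan_version() {
    sed -e "s/vgan version //g;s/ (Mela)//g"
}
```

Piping a made-up line such as "vgan version 1.0.1 (Mela)" through strip_vgan_version leaves only "1.0.1". In the module it would be fed from vgan version 2>&1, as in the script.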
Hi @aidaanva,
(Sorry for the late reply, I was on vacation.)
Thank you for working on this! I really appreciate the help. It looks like the reads are not being mapped to the reference graph. Are they reads from the mitochondria?
Hi @JoshuaDanielRubin,
I tried some libraries that have some reads mapping to the mitochondria, although not many.
I also tried the dataset that you included, "rCRS_simulated_test.fq.gz", but this also failed. Does that file contain mitochondrial reads?
I can look for other in-house files to test with. How many mitochondrial reads should I aim for?
Thank you for your time!
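Since the log reports "Found 0 reads", one quick sanity check before swapping inputs is to confirm that the FASTQ file itself is non-empty after staging. The sketch below assumes a gzipped FASTQ with the usual four lines per record; the function names are hypothetical, and it says nothing about whether the reads are mitochondrial.

```shell
# Count reads in a gzipped FASTQ, assuming four lines per record.
# Function names are hypothetical and only used for illustration.
count_fastq_reads() {
    gzip -cd -- "$1" | awk 'END { print int(NR / 4) }'
}

# Succeed only when the file holds at least one read.
fastq_has_reads() {
    [ "$(count_fastq_reads "$1")" -gt 0 ]
}
```

Running count_fastq_reads rCRS_simulated_test.fq.gz inside the task work directory would show whether the staged file actually contains reads, which separates an input problem from a mapping problem.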
@aidaanva
Yes, the rCRS is mitochondrial, so it is strange that not a single read is mapping.
May I ask what the contents of the stack trace file /tmp/vg_crash_NU38XH/stacktrace.txt are?
Also, just to be sure, the --hc-files argument is pointing to the directory with the HaploCart files?
@JoshuaDanielRubin the --hc-files argument points to a directory that contains:
ls hcfiles
children.txt graph.gbwt graph.giraffe.gbz graph.snarls graph_paths k31_w11.min parents.txt path_supports
graph.dist graph.gg graph.og graph.xg k17_w18.min mappability.tsv parsed_pangenome_mapping
I downloaded them based on the documentation in vgan.
I can't find the file /tmp/vg_crash_NU38XH/stacktrace.txt in my tmp directory. Do you know how to change the path where the program stores that file? Then I could send you the output.
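To rule out an incomplete download, the directory can be compared against the listing above. The sketch below is an assumption-laden check, not an official manifest: the names are copied from the ls output in this thread, the function name is hypothetical, and vgan may require more or fewer files than these.

```shell
# Names copied from the ls output in this thread. This is NOT an
# authoritative list of what vgan requires.
HC_FILES_SEEN="children.txt graph.gbwt graph.giraffe.gbz graph.snarls graph_paths k31_w11.min parents.txt path_supports graph.dist graph.gg graph.og graph.xg k17_w18.min mappability.tsv parsed_pangenome_mapping"

# Report every listed entry that is absent, and every regular file that is
# empty, in the given directory. Returns non-zero if anything was reported.
# The function name is hypothetical.
report_missing_hc_files() {
    local dir=$1
    local status=0
    local name
    for name in $HC_FILES_SEEN; do
        if [ ! -e "$dir/$name" ]; then
            echo "missing: $name"
            status=1
        elif [ -f "$dir/$name" ] && [ ! -s "$dir/$name" ]; then
            echo "empty: $name"
            status=1
        fi
    done
    return "$status"
}
```

Calling report_missing_hc_files hcfiles prints nothing and succeeds when every listed entry is present and non-empty. A truncated or zero-byte graph file would show up here before HaploCart is even started.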
@aidaanva
OK, this is strange :) I think the stack trace will not be very informative in any case. To isolate the issue, I can think of two things:
- Try an earlier conda version of Haplocart
- Try a different input file with human mtDNA reads
I've migrated to nf-test and upgraded to v3.0.0 of the tool. This version does not seem to have the hcfiles input, so the test now runs on the fastq files directly. It worked on conda in Gitpod, but crashed in Docker.
@JoshuaDanielRubin, @aidaanva, any progress on resolving this?