`vg giraffe` Split alignments cannot be converted to named-segment-space GAF
1. What were you trying to do?
Map reads to the graph with vg giraffe.
2. What did you want to happen? Get a GAF file that uses the original GFA node IDs.
3. What actually happened? Segmentation fault (core dumped)
4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here:
Preparing Indexes
Loading Minimizer Index
Loading GBZ
Loading Distance Index v2
Paging in Distance Index v2
Initializing MinimizerMapper
Loading and initialization: 277.516 seconds
Of which Distance Index v2 paging: 16.0807 seconds
Mapping reads to "-" (GAF)
--watchdog-timeout 10
--match 1
--mismatch 4
--gap-open 6
--gap-extend 1
--full-l-bonus 5
--max-multimaps 1
--hit-cap 10
--hard-hit-cap 500
--score-fraction 0.9
--max-min 500
--num-bp-per-min 1000
--distance-limit 200
--max-extensions 800
--max-alignments 8
--cluster-score 50
--pad-cluster-score 20
--cluster-coverage 0.3
--extension-score 1
--extension-set 20
--rescue-attempts 15
--max-fragment-length 2000
--paired-distance-limit 2
--rescue-subgraph-size 4
--rescue-seed-limit 100
--chaining-cluster-distance 100
--precluster-connection-coverage-threshold 0.3
--min-precluster-connections 10
--max-precluster-connections 50
--max-lookback-bases 100
--min-lookback-items 1
--lookback-item-hard-cap 15
--chain-score-threshold 100
--min-chains 1
--chain-min-score 100
--max-chain-connection 100
--max-tail-length 100
--max-dp-cells 16777216
--interleaved
--rescue-algorithm dozeu
Not counting CPU instructions because perf events are unavailable: No such file or directory
Using fragment length estimate: 267.913 +/- 27.5311
warning[vg::giraffe]: Refusing to perform too-large rescue alignment of 150 bp against 23722 bp dagified subgraph for read S0R178728/2 which would use more than 1572864 cells and might exhaust Dozeu's allocator; suppressing further warnings.
Unhandled exception: Split alignments cannot be converted to named-segment-space GAF
Exception context: S0R3928687/1, S0R3928687/2
━━━━━━━━━━━━━━━━━━━━
Crash report for vg v1.52.0 "Bozen"
Stack trace (most recent call last) in thread 86357:
#11 Object "", at 0xffffffffffffffff, in
#10 Object "/home/wenhai/application/vg/vg-1.52/vg", at 0x214dd33, in __clone
#9 Object "/home/wenhai/application/vg/vg-1.52/vg", at 0x20a73da, in start_thread
#8 Object "/home/wenhai/application/vg/vg-1.52/vg", at 0x2049cbd, in gomp_thread_start
#7 Object "/home/wenhai/application/vg/vg-1.52/vg", at 0x204c607, in gomp_team_barrier_wait_end
#6 Object "/home/wenhai/application/vg/vg-1.52/vg", at 0x2043f0a, in gomp_barrier_handle_tasks
#5 Object "/home/wenhai/application/vg/vg-1.52/vg", at 0xeb708a, in unsigned long vg::io::paired_for_each_parallel_after_wait<vg::Alignment>(std::function<bool (vg::Alignment&, vg::Alignment&)>, std::function<void (vg::Alignment&, vg::Alignment&)>, std::function<bool ()>, unsigned long) [clone ._omp_fn.1]
#4 Object "/home/wenhai/application/vg/vg-1.52/vg", at 0x518d4c, in std::_Function_handler<void (vg::Alignment&, vg::Alignment&), main_giraffe(int, char**)::{lambda()#1}::operator()() const::{lambda(vg::Alignment&, vg::Alignment&)#6}>::_M_invoke(std::_Any_data const&, vg::Alignment&, vg::Alignment&) [clone .cold]
#3 Object "/home/wenhai/application/vg/vg-1.52/vg", at 0x53f3fb, in vg::report_exception(std::exception const&) [clone .cold]
#2 Object "/home/wenhai/application/vg/vg-1.52/vg", at 0x5e590b, in abort
#1 Object "/home/wenhai/application/vg/vg-1.52/vg", at 0x207c1f5, in raise
#0 Object "/home/wenhai/application/vg/vg-1.52/vg", at 0x20a8bfc, in __pthread_kill
ERROR: Signal 6 occurred. VG has crashed. Visit https://github.com/vgteam/vg/issues/new/choose to report a bug.
Please include this entire error log in your bug report!
━━━━━━━━━━━━━━━━━━━━
[Interleaved, truncated crash reports from threads 86364, 86374, and 86353 follow; each repeats the header: Crash report for vg v1.52.0 "Bozen"]
Segmentation fault (core dumped)
Attached file: bug.txt. It may provide more detailed information.
5. What data and command can the vg dev team use to make the problem happen?
I get the error when I run vg giraffe -Z merge_index.giraffe.gbz -m merge_index.min -d merge_index.dist -i -f ../camisim_simulate/60_genome_simulate_result/2023.11.09_15.18.18_sample_0/reads/anonymous_reads.fq -t 32 --named-coordinates -o gaf -p > gfa_mapped_undel.gaf.
The problem seems to be caused by the read pair named in the exception context (S0R3928687/1 and S0R3928687/2). After I deleted those two reads, the command ran successfully, but I don't know why the error occurs.
Unhandled exception: Split alignments cannot be converted to named-segment-space GAF
Exception context: S0R3928687/1, S0R3928687/2
The read pair in question:
@S0R3928687/1
TGACTTCGTTCTCTACTATTTCTTTTAGAAGCTCAGATGCTCTGGATTCCTTAATACCAATAACTTCCATCACATCCGAACGCCCAAAAATTGTTTGTCCCGGAAATGCTTCACGGATTCTAAGAATATAACTCGCAGTTTTCGTTTGGA
+
KOORRPRRRM6TSIJUSVRVVUMUVKULVSNVVUUSUQVVRVUUVTVGPHUEQVRPV/PROPPLVVPPLNOPVT3PPRQP/PPPPOPOOGQPPOPPNJFONOPNNQEPPPPPPNM/P/N/EMPPPMJ/OMNPPNFELK/HHO/P/MPPN/
@S0R3928687/2
CGGACAATGCACATCAGCGGAAAATTCAAGGAGCCTGAAAAACCGGACATTGGAGTTGAAAAACCGGACATTGGAGCTGAAAAACCGGACATTGGAGCTGAAAAACCGGACATTGGAGCTGAAAAACCGGACATTGAGAAGAAGTTCCAA
+
OOORRPRRPRTTPVMUVUVFVVVVVVTVUVVTUSVVUUUVVTSLPSSP/NPUNAT/V/RPVPTPSPNMNT/NPPPPTT<TKOPOPOONOAPNPPPNLAPP/PPO<KJ/HM/NPPPLELPONOP//PLLP/PPPPKP/OPHP/PPPO//QO
If necessary, I can provide a small graph file and a small sample including the paired reads data, which can reproduce this error.
6. What does running vg version say?
vg version v1.52.0 "Bozen"
Compiled with g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 on Linux
Linked against libstd++ 20230528
Built by jeizenga@emerald
Thank you in advance!
The error message indicates that the alignment is invalid. At some point in the middle of the alignment, the next mapping (~edit) starts in the middle of a node at an unexpected offset. Either Giraffe is producing an invalid alignment or the GAF emitter is interpreting it wrong.
It would be helpful if you could provide the data needed to reproduce the error.
I have uploaded a .vg file (~460M) and a sample of the reads (~61M) here:
https://drive.google.com/drive/folders/1vY4N9O1XLXDB1ukAljy4S_i21vea5-dp?usp=sharing
The commands I used:
vg convert -fW test_merge.vg > test_merge.gfa
vg autoindex -p test_merge_index -w giraffe -g test_merge.gfa -t 32
vg giraffe -Z test_merge_index.giraffe.gbz -m test_merge_index.min -d test_merge_index.dist -i -f sub_reads.fq -t 32 --named-coordinates -o gaf > gfa_mapped.gaf
I managed to replicate your issue. However, if you remove option -W from the vg convert command, the haplotypes will be written as W-lines in the GFA file and vg autoindex will understand them correctly. Giraffe will then map the reads to the graph without any issues.
vg autoindex uses a simple heuristic with GFA files. If there are any W-lines in the file, the GFA is assumed to contain haplotypes, and the GBZ graph can be built directly from the GFA. Otherwise vg autoindex assumes that there are no haplotypes at all and proceeds to generate fully synthetic haplotypes with arbitrary combinations of variants. The resulting GBZ graph will be much worse than the one built directly from the GFA file.
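To check in advance which branch of that heuristic a given GFA file would fall into, one can scan it for W-lines. The following is only a minimal sketch of such a check, not the code vg autoindex runs; the function name gfa_has_walks and the tiny example records are made up for illustration.

```python
import io

def gfa_has_walks(lines):
    """Return True if any record in an iterable of GFA lines is a W-line (walk)."""
    for line in lines:
        # GFA records are tab-separated; the first field is the record type.
        if line.split("\t", 1)[0].strip() == "W":
            return True
    return False

# Tiny made-up GFA records, for illustration only.
with_walks = io.StringIO(
    "H\tVN:Z:1.1\n"
    "S\t1\tACGT\n"
    "S\t2\tTT\n"
    "L\t1\t+\t2\t+\t0M\n"
    "W\tsample\t1\tchr1\t0\t6\t>1>2\n"
)
without_walks = io.StringIO(
    "H\tVN:Z:1.1\n"
    "S\t1\tACGT\n"
    "S\t2\tTT\n"
    "L\t1\t+\t2\t+\t0M\n"
    "P\tchr1\t1+,2+\t*\n"
)

print(gfa_has_walks(with_walks))     # True
print(gfa_has_walks(without_walks))  # False
```

For a real file, passing an open file handle (for example open("test_merge.gfa")) works the same way, since the function only iterates over lines.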
I will continue investigating the reason for the crash.
@adamnovak Can you help me with this one?
The problem can be replicated in the graph available in the linked Google Drive with the given commands. It's enough to map the following reads with --fragment-mean 268.885 --fragment-stdev 25.9609:
@S0R3928687/1
TGACTTCGTTCTCTACTATTTCTTTTAGAAGCTCAGATGCTCTGGATTCCTTAATACCAATAACTTCCATCACATCCGAACGCCCAAAAATTGTTTGTCCCGGAAATGCTTCACGGATTCTAAGAATATAACTCGCAGTTTTCGTTTGGA
+
KOORRPRRRM6TSIJUSVRVVUMUVKULVSNVVUUSUQVVRVUUVTVGPHUEQVRPV/PROPPLVVPPLNOPVT3PPRQP/PPPPOPOOGQPPOPPNJFONOPNNQEPPPPPPNM/P/N/EMPPPMJ/OMNPPNFELK/HHO/P/MPPN/
@S0R3928687/2
CGGACAATGCACATCAGCGGAAAATTCAAGGAGCCTGAAAAACCGGACATTGGAGTTGAAAAACCGGACATTGGAGCTGAAAAACCGGACATTGGAGCTGAAAAACCGGACATTGGAGCTGAAAAACCGGACATTGAGAAGAAGTTCCAA
+
OOORRPRRPRTTPVMUVUVFVVVVVVTVUVVTUSVVUUUVVTSLPSSP/NPUNAT/V/RPVPTPSPNMNT/NPPPPTT<TKOPOPOONOAPNPPPNLAPP/PPO<KJ/HM/NPPPLELPONOP//PLLP/PPPPKP/OPHP/PPPO//QO
The alignment for the second read is:
[
{"edit": [{"sequence": "TTGGAACTTCTTCT", "to_length": 14}, {"from_length": 21, "to_length": 21}], "position": {"is_reverse": true, "node_id": "1845548", "offset": "21"}, "rank": "1"},
{"edit": [{"from_length": 21, "to_length": 21}], "position": {"is_reverse": true, "node_id": "1845548"}, "rank": "2"},
{"edit": [{"from_length": 18, "to_length": 18}], "position": {"is_reverse": true, "node_id": "1845546"}, "rank": "3"},
{"edit": [{"from_length": 3, "to_length": 3}], "position": {"is_reverse": true, "node_id": "1845547"}, "rank": "4"},
{"edit": [{"from_length": 17, "to_length": 17}, {"from_length": 1, "sequence": "A", "to_length": 1}], "position": {"is_reverse": true, "node_id": "1845546"}, "rank": "5"},
{"edit": [{"from_length": 3, "to_length": 3}], "position": {"is_reverse": true, "node_id": "1845547"}, "rank": "6"},
{"edit": [{"from_length": 18, "to_length": 18}], "position": {"is_reverse": true, "node_id": "1845546"}, "rank": "7"},
{"edit": [{"from_length": 26, "to_length": 26}], "position": {"is_reverse": true, "node_id": "1845500"}, "rank": "8"},
{"edit": [{"from_length": 8, "to_length": 8}], "position": {"is_reverse": true, "node_id": "1845499"}, "rank": "9"}
]
I think the offset should be 0 in the mapping of rank 1, because the length of that node is 21. That probably triggers the exception in alignment_to_gaf(): https://github.com/vgteam/libvgio/blob/45d8ada05ee1d1405ef44d93f2ac00a5a097dd09/src/alignment_io.cpp#L263-L276.
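The inconsistency can be checked mechanically. The sketch below is a hypothetical helper, not part of vg or libvgio, and it does not reproduce the check in alignment_to_gaf(). It only tests whether a mapping's starting offset plus the node bases consumed by its edits fits inside the node. The node length of 21 comes from the statement above; the length must be supplied by the caller.

```python
def check_mapping(mapping, node_length):
    """Return a list of problems found in one mapping of a vg alignment path.

    `mapping` follows the JSON layout shown above. `node_length` must be
    supplied by the caller (hypothetical helper, not a vg API).
    """
    pos = mapping["position"]
    # vg's JSON omits zero-valued fields, so a missing offset means 0.
    offset = int(pos.get("offset", 0))
    # Node bases consumed by this mapping's edits (insertions consume none).
    consumed = sum(int(e.get("from_length", 0)) for e in mapping["edit"])
    problems = []
    if offset + consumed > node_length:
        problems.append(
            f"rank {mapping['rank']}: offset {offset} + {consumed} bp "
            f"exceeds node length {node_length}"
        )
    return problems

# Mappings of rank 1 and rank 2, copied from the alignment above.
rank1 = {
    "edit": [{"sequence": "TTGGAACTTCTTCT", "to_length": 14},
             {"from_length": 21, "to_length": 21}],
    "position": {"is_reverse": True, "node_id": "1845548", "offset": "21"},
    "rank": "1",
}
rank2 = {
    "edit": [{"from_length": 21, "to_length": 21}],
    "position": {"is_reverse": True, "node_id": "1845548"},
    "rank": "2",
}

print(check_mapping(rank1, 21))  # ['rank 1: offset 21 + 21 bp exceeds node length 21']
print(check_mapping(rank2, 21))  # []
```

With an offset of 0, the rank 1 mapping would consume exactly the 21 bases of the node and pass the check.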
In GAF without named coordinates, the alignment gets reversed to:
S0R3928687/2 150 0 150 + >1845499>1845500>1845546>1845547>1845546>1845547>1845546>1845548>1845548 160 24 139 135 150 60 AS:i:136 bq:Z:OOORRPRRPRTTPVMUVUVFVVVVVVTVUVVTUSVVUUUVVTSLPSSP/NPUNAT/V/RPVPTPSPNMNT/NPPPPTT<TKOPOPOONOAPNPPPNLAPP/PPO<KJ/HM/NPPPLELPONOP//PLLP/PPPPKP/OPHP/PPPO//QO cs:Z::55*CT:59-:21+AGAAGAAGTTCCAA dv:f:0.1 fp:Z:S0R3928687/1 pd:b:1
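The relationship between the JSON mapping order and the GAF path can be illustrated with a small sketch. It only shows the orientation flip: a path traversed entirely on the reverse strand is equivalent to the reversed node order on the forward strand. The function names are made up for illustration, and this is not how vg implements the conversion.

```python
def reverse_complement_path(steps):
    """Reverse a list of (node_id, is_reverse) steps and flip each orientation."""
    return [(node_id, not is_reverse) for node_id, is_reverse in reversed(steps)]

def gaf_path(steps):
    """Format steps as a GAF path: '>' for forward, '<' for reverse."""
    return "".join(("<" if is_reverse else ">") + node_id
                   for node_id, is_reverse in steps)

# Node IDs in the order of the JSON alignment above; every mapping is reversed.
json_order = ["1845548", "1845548", "1845546", "1845547", "1845546",
              "1845547", "1845546", "1845500", "1845499"]
steps = [(node_id, True) for node_id in json_order]

print(gaf_path(reverse_complement_path(steps)))
# >1845499>1845500>1845546>1845547>1845546>1845547>1845546>1845548>1845548
```

The printed path matches the path column of the GAF record, so the mapping with the unexpected offset (rank 1 in the JSON) corresponds to the last step of the GAF path.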
The relevant subgraph is: nodes.pdf
Thank you. I'm sorry I didn't notice the update on the wiki. Mapping works without any issues when I use the GFA file with W-lines. I am now building the index for my actual data with vg autoindex. I'm not sure how this will affect my results, but I don't think they will be worse.
I have now built the index for my actual data, and vg giraffe performs better.
If you still need to investigate the exception in alignment_to_gaf(), I will leave this issue open for now.
@jltsiren I'll take a look at this. That subgraph looks terrible; it must be doing something Giraffe didn't expect.