
how to set the backbone reference genome when doing SV calling and genotyping

Open: biozzq opened this issue 4 years ago • 1 comment

Dear all,

While learning how to detect SVs with vg giraffe on a graph genome constructed by pggb, I found that the results look a bit strange. This may be due to the absence of a backbone reference genome when running vg. I wonder whether we can detect SVs on specified paths. Also, if we could detect SVs on specified paths, such as the full backbone genome, I think the SV type could also be annotated in the final VCF file. Thank you in advance.

vg autoindex -R XG -g pggb.prune.gfa -w giraffe -t 4 -T ./ -p pggb.prune
vg giraffe -Z pggb.prune.giraffe.gbz -m pggb.prune.min -d prune.dist -t 4 -f R1.fq.gz -f R2.fq.gz > map.gam
vg pack -x pggb.prune.xg -g map.gam -Q 10 -s 5 -o map.pack -t 4
vg call pggb.prune.xg -k map.pack -s demo -t 4 > demo.graph.vcf

Sincerely, Zheng zhuqing

biozzq, Apr 05 '22 07:04

vg call can use any path in the graph as a reference via the -p option. But cycles in the reference path (which PGGB can produce) will be collapsed in the VCF as well, which makes the affected records in vg call's output hard to interpret (i.e. effectively wrong).

For example, if there are two variants on path chr1:

chr1 10 A T
chr1 20 A G

but positions 10 and 20 map to the same node in the graph (i.e. the path cycles back through it), then vg call will just collapse them into a single record at the first position found:

chr1 10 A T,G
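
To make the collapse concrete, here is a toy Python model of it. This is only a sketch and not vg's actual implementation: the function name collapse_by_node, the tuple layout, and the node id 7 are all made up for illustration. It assumes each variant is tagged with the graph node it sits on, and merges every variant on the same path and node into one record at the first position seen.

```python
def collapse_by_node(records):
    """Toy model: merge variants that share a path and a graph node.

    records is a list of (path, pos, ref, alt, node_id) tuples.
    The layout and the node ids are hypothetical, not a vg data structure.
    """
    merged = {}  # (path, node_id) -> [path, first_pos, ref, [alts]]
    for path, pos, ref, alt, node_id in records:
        key = (path, node_id)
        if key not in merged:
            # First variant seen on this node: keep its position.
            merged[key] = [path, pos, ref, [alt]]
        elif alt not in merged[key][3]:
            # Later variant on the same node: only its ALT allele survives.
            merged[key][3].append(alt)
    return [(p, pos, ref, ",".join(alts)) for p, pos, ref, alts in merged.values()]


# Both variants sit on the same (hypothetical) node 7, as in the example above.
records = [
    ("chr1", 10, "A", "T", 7),
    ("chr1", 20, "A", "G", 7),
]

for rec in collapse_by_node(records):
    print(*rec)
```

The position 20 is lost entirely in the merged record, which is why such sites cannot be read back as two separate variants.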

I guess in theory, if you're calling with a GBWT (vg call -g), it would be possible to port over the unfolding logic from vg deconstruct to resolve cycles, but we haven't done that yet.

One thing you might have more luck with for now is odgi untangle.

glennhickey, Apr 05 '22 12:04