How to get all alignment results?

Open SZ-qing opened this issue 2 years ago • 2 comments

Hi, when I align my short reads to the human pangenome graph, the result for each read is only a path. 1. What I want is all the sequences that align to the read, both those with mismatches and those that align in full. 2. How can I tell from the results which annotations the aligned sequences fall in, such as exon, CDS, or intron regions?

Shell:
vg giraffe -Z hprc-v1.1-mc-grch38.gbz -p -f ./small_sim.fq -o json --max-multimaps 10 >small_sim_aln_M10.json

Results: image

SZ-qing avatar Nov 20 '23 01:11 SZ-qing

When I add --ref-paths, the results do not change:
vg giraffe -Z hprc-v1.1-mc-grch38.gbz -p -f ./small_sim.fq -o json --max-multimaps 10 --ref-paths ./all_graph_paths.txt >small_sim_aln_M10_allpaths.json

SZ-qing avatar Nov 20 '23 02:11 SZ-qing

I don't know of any tool that does exactly what you're describing. However, the --ref-paths option is only relevant for SAM/BAM output, so it's expected that it would not affect the GAM. If you want to annotate path positions in the GAM, you can use vg annotate -x hprc-v1.1-mc-grch38.gbz -p -a, but this method definitely has some failure cases. Also, it will only work to annotate positions on reference paths, so you will not get positions for other haplotypes.

jeizenga avatar Dec 01 '23 19:12 jeizenga
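
For anyone trying the vg annotate suggestion above, here is a minimal sketch of how it might be run. It assumes that -a takes the GAM file to annotate, and that the JSON written by the giraffe commands in this thread first needs converting back to GAM with vg view -JaG. The function names and the intermediate file names are placeholders of mine, not something from the thread, and the same failure cases mentioned above still apply.

```shell
#!/usr/bin/env bash
# Sketch only: function names and output file names are hypothetical.

# Convert giraffe's JSON output back to GAM, since `vg annotate -a`
# is assumed here to read GAM rather than JSON.
json_to_gam() {
    local json_in="$1" gam_out="$2"
    vg view -JaG "$json_in" > "$gam_out"
}

# Annotate each alignment with its position on reference paths (-p).
# Positions on non-reference haplotypes are not reported.
annotate_positions() {
    local graph="$1" gam_in="$2" gam_out="$3"
    vg annotate -x "$graph" -p -a "$gam_in" > "$gam_out"
}

# Example calls (uncomment once vg and the files are available):
# json_to_gam small_sim_aln_M10.json small_sim_aln_M10.gam
# annotate_positions hprc-v1.1-mc-grch38.gbz small_sim_aln_M10.gam small_sim_aln_M10.annotated.gam
# vg view -a small_sim_aln_M10.annotated.gam | head
```

The last commented line prints the annotated alignments as JSON again, which is a convenient way to check whether reference positions were actually added.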