vg Request for data and scripts for structural variant genotyping experiment

Dear VG team,

I recently read your paper titled "Genotyping structural variants in pangenome graphs using the vg toolkit", and I am very interested in reproducing the structural variant genotyping experiments you described.

I was wondering if the data and scripts used for that part of the study are publicly available. I am especially interested in the workflow involving vg map, as I would like to test this tool on similar structural variant data.

If possible, could you kindly share:

- The input files (reference/VCF/reads) or links to download them
- The specific vg commands and parameters used
- Any additional scripts or pipelines you used

Thank you very much for your time and for maintaining such a powerful toolkit.

Best regards

May 28 '25 07:05 pioneer-pi

Hi, I found the data in [sv-genotyping](https://github.com/vgteam/sv-genotyping-paper/tree/master). However, I can't find the specific instructions(parameters) that used for graph construction and map. I find there are all and non-repeat experiments in your essay. But I don't find how to create non-repeat sample. So can you give me some help about what I ask?

Thank you very much for your time and for maintaining such a powerful toolkit.

Jun 17 '25 14:06 pioneer-pi

That directory seems to have scripts. With READMEs too. For example, in the human/toil-scripts there is map.sh. What specifically else are you looking for?

Also, I'm not sure what "essay" is meant here, or what "non-repeat sample" means. Or what "I find there are all and non-repeat experiments" means in general - perhaps English difficulties? Maybe you could explain in more detail?

Jun 26 '25 16:06 faithokamoto

@faithokamoto Sorry. What I am looking for is the reference genome and mapping result. The essay(maybe paper is correct description) is "Genotyping structural variants in pangenome graphs using the vg toolkit". The non-repeat sample means in the simulation experiments of essay, the result contains two types: one is all region, another is non-repeat. So I wonder how you get the non-repeat region and is there original data?

The result of experiment in essay:

Jun 27 '25 13:06 pioneer-pi

I think this section from the Methods explains what "non-repeat" means in this context:

> We also explored the performance of vg and SMRT-SV v2 Genotyper in different sets of regions (Additional file 1: Figure S12 and Additional file 1: Table S5):
>
> - Non-repeat regions, i.e., excluding segmental duplications and tandem repeats (using the respective tracks from the UCSC Genome Browser).
> - Repeat regions defined as segmental duplications and tandem repeats.
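To make that definition concrete, here is a minimal Python sketch of how variants could be split into "repeat" and "non-repeat" sets. This is not the script used in the paper. It assumes the segmental duplication and tandem repeat tracks have already been loaded as half-open `(chrom, start, end)` intervals, and it assumes that any overlap with a repeat interval puts a variant in the "repeat" set; the paper may use a different overlap rule. The function names `merge_intervals`, `overlaps_repeat`, and `split_by_repeat` are made up for illustration.

```python
from bisect import bisect_left
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping half-open (chrom, start, end) intervals per chromosome.

    Returns {chrom: (starts, ends)} with sorted, disjoint spans.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))

    merged = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        out = [list(spans[0])]
        for start, end in spans[1:]:
            if start <= out[-1][1]:
                # Overlapping or touching: extend the current span.
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = ([s for s, _ in out], [e for _, e in out])
    return merged


def overlaps_repeat(merged, chrom, start, end):
    """True if [start, end) overlaps any merged repeat span on chrom."""
    if chrom not in merged:
        return False
    starts, ends = merged[chrom]
    # Last span that begins before the variant ends; spans are disjoint,
    # so it is the only candidate that can reach back to the variant start.
    idx = bisect_left(starts, end) - 1
    return idx >= 0 and ends[idx] > start


def split_by_repeat(variants, repeat_intervals):
    """Partition (chrom, start, end) variants into (non_repeat, repeat) lists.

    ASSUMPTION: any overlap counts as "repeat"; this rule is illustrative only.
    """
    merged = merge_intervals(repeat_intervals)
    non_repeat, repeat = [], []
    for variant in variants:
        target = repeat if overlaps_repeat(merged, *variant) else non_repeat
        target.append(variant)
    return non_repeat, repeat


# Toy coordinates, not real data.
repeats = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 60)]
variants = [("chr1", 10, 50), ("chr1", 290, 400), ("chr2", 60, 80), ("chr3", 5, 10)]
non_repeat, repeat = split_by_repeat(variants, repeats)
```

With the toy intervals above, only the `chr1` variant at 290-400 lands in `repeat`; the other three end up in `non_repeat`. In practice, the repeat intervals would come from the UCSC segmental duplication and tandem repeat tracks exported as BED files.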

Each column of that plot represents a different experiment, so the scripts won't all be in the same place. The scripts are in the repo, however, e.g. first two columns, GIAB. (Note I didn't work on this paper, so this is just me poking around a repository and reading the READMEs.)

As for English: yeah, definitely use "paper" for scientific papers. "Essay" suggests an argumentative paper, e.g. as a student would write for a class assignment. "Paper" is used for scientific manuscripts. In addition, put a space before open-parentheses. A lightly edited version of what I had difficulty understanding the first time:

> However, I can't find the specific instructions (parameters) that were used for graph construction and mapping. I saw there were "all" and "non-repeat" experiments in your paper. But I didn't find how to create the non-repeat sample.

(Hopefully the English help seems helpful - I don't want to sound patronizing, it's just that I enjoy copy-editing.)

Jun 27 '25 16:06 faithokamoto

whoops somehow closed this while I was trying to type my previous comment

Jun 27 '25 16:06 faithokamoto

Thanks for your time and assistance!

Jun 29 '25 09:06 pioneer-pi