Request for data and scripts for structural variant genotyping experiment
Dear VG team,
I recently read your paper titled "Genotyping structural variants in pangenome graphs using the vg toolkit", and I am very interested in reproducing the structural variant genotyping experiments you described.
I was wondering if the data and scripts used for that part of the study are publicly available. I am especially interested in the workflow involving vg map, as I would like to test this tool on similar structural variant data.
If possible, could you kindly share:
- The input files (reference/VCF/reads) or links to download them
- The specific vg commands and parameters used
- Any additional scripts or pipelines you used

Thank you very much for your time and for maintaining such a powerful toolkit.
Best regards
Hi, I found the data in [sv-genotyping](https://github.com/vgteam/sv-genotyping-paper/tree/master). However, I can't find the specific instructions(parameters) that used for graph construction and map. I find there are all and non-repeat experiments in your essay. But I don't find how to create non-repeat sample. So can you give me some help about what I ask?
Thank you very much for your time and for maintaining such a powerful toolkit.
That directory seems to have scripts. With READMEs too. For example, in the human/toil-scripts there is map.sh. What specifically else are you looking for?
Also, I'm not sure what "essay" is meant here, or what "non-repeat sample" means. Or what "I find there are all and non-repeat experiments" means in general - perhaps English difficulties? Maybe you could explain in more detail?
@faithokamoto Sorry. What I am looking for is the reference genome and the mapping results. The essay (maybe "paper" is the correct description) is "Genotyping structural variants in pangenome graphs using the vg toolkit". By "non-repeat sample" I mean this: in the simulation experiments of the essay, the results are reported for two types of region, one being all regions and the other being non-repeat regions. So I wonder how you got the non-repeat regions, and whether the original data is available?
The results of the experiment in the essay:
I think this section from the Methods explains what "non-repeat" means in this context:
We also explored the performance of vg and SMRT-SV v2 Genotyper in different sets of regions (Additional file 1: Figure S12 and Additional file 1: Table S5):
- Non-repeat regions, i.e., excluding segmental duplications and tandem repeats (using the respective tracks from the UCSC Genome Browser).
- Repeat regions defined as segmental duplications and tandem repeats.
Each column of that plot represents a different experiment, so the scripts won't all be in the same place. They are all in the repo, though; for example, the first two columns come from the GIAB experiment. (Note I didn't work on this paper, so this is just me poking around a repository and reading the READMEs.)
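If it helps, here is a minimal Python sketch of the kind of filtering the Methods quote describes. It splits SVs into "repeat" and "non-repeat" sets by overlap with segmental-duplication and tandem-repeat intervals. This is **not** the paper's script, and it rests on several assumptions:

- The repeat annotations have been exported from the UCSC tracks as BED files (0-based, half-open).
- The SVs have already been converted from VCF (1-based `POS`) to 0-based `(chrom, start, end)` tuples.
- An SV counts as "repeat" if it overlaps any repeat interval by at least one base.

The paper may well use a different overlap criterion, so check the repo's scripts before comparing numbers. The function names `load_bed`, `overlaps_any` and `split_by_repeats` are hypothetical.

```python
"""Hypothetical sketch: split SVs into repeat / non-repeat sets by interval overlap.

Assumptions (not from the paper): BED input is 0-based half-open, SVs are
(chrom, start, end) in the same coordinates, and any 1 bp overlap counts as "repeat".
"""
import bisect
from collections import defaultdict


def load_bed(lines):
    """Parse BED lines into {chrom: sorted, merged list of (start, end)}."""
    raw = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and BED header/comment lines
        chrom, start, end = line.split("\t")[:3]
        raw[chrom].append((int(start), int(end)))
    merged = {}
    for chrom, ivs in raw.items():
        ivs.sort()
        out = [list(ivs[0])]
        for s, e in ivs[1:]:
            if s <= out[-1][1]:  # overlapping or touching: extend the last interval
                out[-1][1] = max(out[-1][1], e)
            else:
                out.append([s, e])
        merged[chrom] = [tuple(iv) for iv in out]
    return merged


def overlaps_any(intervals, chrom, start, end):
    """True if the half-open SV [start, end) overlaps any merged interval on chrom."""
    ivs = intervals.get(chrom, [])
    # Index of the first interval starting at or after `end`; only earlier ones can overlap.
    i = bisect.bisect_left(ivs, (end,))
    # Merged intervals are disjoint and sorted, so the one just before i reaches furthest right.
    return i > 0 and ivs[i - 1][1] > start


def split_by_repeats(svs, repeats):
    """Partition (chrom, start, end) SVs into (repeat, non_repeat) lists."""
    repeat, non_repeat = [], []
    for sv in svs:
        (repeat if overlaps_any(repeats, *sv) else non_repeat).append(sv)
    return repeat, non_repeat


if __name__ == "__main__":
    # Toy stand-ins for segmental-duplication and tandem-repeat BED lines.
    segdups = ["chr1\t100\t200\n"]
    tandem = ["chr1\t150\t300\n", "chr2\t0\t50\n"]
    repeats = load_bed(segdups + tandem)
    svs = [("chr1", 250, 260), ("chr1", 300, 400), ("chr2", 60, 80)]
    rep, nonrep = split_by_repeats(svs, repeats)
    print(rep)     # [('chr1', 250, 260)]
    print(nonrep)  # [('chr1', 300, 400), ('chr2', 60, 80)]
```

Merging the intervals first keeps each overlap query to a single binary search. For whole-genome files, a dedicated tool such as `bedtools intersect` would do the same job.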
As for English: yeah, definitely use "paper" for scientific papers. "Essay" suggests an argumentative paper, e.g. as a student would write for a class assignment. "Paper" is used for scientific manuscripts. In addition, put a space before open-parentheses. A lightly edited version of what I had difficulty understanding the first time:
However, I can't find the specific instructions (parameters) that were used for graph construction and mapping. I saw there were "all" and "non-repeat" experiments in your paper. But I didn't find how to create the non-repeat sample.
(Hopefully the English help seems helpful - I don't want to sound patronizing, it's just that I enjoy copy-editing.)
Whoops, I somehow closed this while I was trying to type my previous comment.
Thanks for your time and assistance!