help interpreting output

Open RichardCorbett opened this issue 8 years ago • 7 comments

Hi Kevin, I was looking at some GIAB data this morning and found the link to your tool. I gave it a whirl with this command:

vgraph repmatch --include-regions GIAB/HG001_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_nosomaticdel.bed --reference /home/pubseq/genomes/Homo_sapiens/GRCh37/1000genomes/bwa_ind/genome/GRCh37-lite.fa GIAB/HG001_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_PGandRTGphasetransfer.vcf.gz gsc/GSC.vcf.gz > out.txt

in which the output file contained these match lines:

    107 MATCH== TYPE=H
      2 MATCH=. TYPE=N
3429981 MATCH== TYPE=T
 176428 MATCH=X TYPE=H

I think I can guess what the bottom two lines represent, but I was wondering if you could explain all 4 lines? If there is a better way to quantify a match I'd be happy to know that as well.

thanks, Richard

RichardCorbett avatar Jun 30 '17 20:06 RichardCorbett

Hi Richard,

Thanks for asking.

  • Type=T represents a trivial match, where the two superloci are identical in terms of genomic coordinates, alleles, and genotypes, so there is no need to invoke the full power of the haplotype matcher.
  • Type=H is where the haplotype matcher is needed.
  • Match="=" are superloci that match
  • Match="X" are superloci that don't match.
  • Match="." with Type=N are no-calls, typically due to out-of-spec VCF records that overlap, as are occasionally generated by GATK.
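
One rough way to quantify a match from these summary lines is to add up the counts per category. The sketch below is a minimal illustration, not part of vgraph: it assumes each summary line has the form `<count> MATCH=<status> TYPE=<type>` as in the output quoted above, and the helper names `tally` and `summarize` are hypothetical. The counts are superloci, not individual variants, so the resulting fraction is a superlocus-level concordance.

```python
import re
from collections import Counter

# Assumed line shape, taken from the output quoted in this thread:
#   "<count> MATCH=<status> TYPE=<type>"
LINE_RE = re.compile(r"^\s*(\d+)\s+MATCH=(\S)\s+TYPE=(\S)\s*$")


def tally(lines):
    """Sum the counts for each (match status, type) pair; skip other lines."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            n, status, kind = m.groups()
            counts[(status, kind)] += int(n)
    return counts


def summarize(counts):
    """Collapse the tallies into matched / mismatched / no-call totals."""
    matched = sum(n for (s, _), n in counts.items() if s == "=")
    mismatched = sum(n for (s, _), n in counts.items() if s == "X")
    nocall = sum(n for (s, _), n in counts.items() if s not in ("=", "X"))
    called = matched + mismatched
    return {
        "matched": matched,
        "mismatched": mismatched,
        "nocall": nocall,
        # Fraction of called superloci that match; None if nothing was called.
        "concordance": matched / called if called else None,
    }


example = [
    "    107 MATCH== TYPE=H",
    "      2 MATCH=. TYPE=N",
    "3429981 MATCH== TYPE=T",
    " 176428 MATCH=X TYPE=H",
]

summary = summarize(tally(example))
print(summary)
```

To run it on a real result, pass the lines of the output file instead, for example `summarize(tally(open("out.txt")))`; lines that do not fit the assumed pattern are ignored.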

bioinformed avatar Jun 30 '17 20:06 bioinformed

Perfect. Many thanks.

RichardCorbett avatar Jun 30 '17 20:06 RichardCorbett

One more question - How would you recommend counting the variants uniquely called in my set or in the GIAB set?

RichardCorbett avatar Jun 30 '17 21:06 RichardCorbett

I have been working on a wrapper around vgraph that does much more detailed accounting. I'll see if I can share it, as it was developed as part of my day job.

bioinformed avatar Jun 30 '17 21:06 bioinformed

Thanks. Any word on permission to share your code?

RichardCorbett avatar Jul 05 '17 21:07 RichardCorbett

I've asked and am waiting for an answer. I expect to hear back by the end of next week.

bioinformed avatar Jul 05 '17 21:07 bioinformed

Many thanks. I'm not up against a deadline or anything; I just wanted to try it out.

RichardCorbett avatar Jul 05 '17 21:07 RichardCorbett