
issue with retrained model

Open sophienguyen01 opened this issue 1 year ago • 1 comments

Hi,

Here is the set of additional parameters I used in the 'make_examples' step to create examples for retraining DeepVariant:
--min_base_quality 5 \
--min_mapping_quality 1 \
--vsc_min_fraction_snps 0.02 \
--p_error 0.1

After retraining DeepVariant and selecting the model with the best score, I compared the VCF outputs of the default DeepVariant model and the retrained model on the HG003 sample at chromosome 20. I noticed that the two models classify many variants differently. Here is an example of the differences:

default model:
chr20 61083 . C T 33.3 PASS . GT:GQ:DP:AD:VAF:PL 0/1:33:40:19,21:0.525:33,0,42
chr20 11479054 . A G 56.6 PASS . GT:GQ:DP:AD:VAF:PL 1/1:53:29:0,29:1:56,55,0
chr20 29356747 . A G 43.4 PASS . GT:GQ:DP:AD:VAF:PL 1/1:29:26:0,26:1:43,29,0
chr20 54889360 . G A 56.8 PASS . GT:GQ:DP:AD:VAF:PL 1/1:54:39:0,39:1:56,57,0

trained model:
chr20 61083 . C T 24.9 PASS . GT:GQ:DP:AD:VAF:PL 0/1:18:41:19,21:0.512195:24,0,18
chr20 11479054 . A G 0 RefCall . GT:GQ:DP:AD:VAF:PL 0/0:36:29:0,29:1:0,47,36
chr20 29356747 . A G 0 RefCall . GT:GQ:DP:AD:VAF:PL 0/0:33:28:0,28:1:0,47,33
chr20 54889360 . G A 0.5 RefCall . GT:GQ:DP:AD:VAF:PL ./.:10:39:0,39:1:0,29,9
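A comparison like the one above can be scripted instead of read by eye. The following is a minimal sketch, not a DeepVariant tool: the function names `parse_calls` and `discordant` are hypothetical, it assumes single-sample VCF records, and it splits on any whitespace so that it also works on the space-separated records pasted here (real VCF files are tab-separated).

```python
def parse_calls(vcf_text):
    """Map (chrom, pos, ref, alt) -> (FILTER, GT) for each record in the text."""
    calls = {}
    for line in vcf_text.splitlines():
        # Skip blank lines and VCF header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        chrom, pos, _id, ref, alt, _qual, filt, _info, fmt, sample = fields[:10]
        # Pair FORMAT keys with the sample values to look up GT by name.
        sample_values = dict(zip(fmt.split(":"), sample.split(":")))
        calls[(chrom, int(pos), ref, alt)] = (filt, sample_values["GT"])
    return calls


def discordant(default_text, retrained_text):
    """Return sites present in both call sets whose FILTER or GT differ."""
    a = parse_calls(default_text)
    b = parse_calls(retrained_text)
    return {k: (a[k], b[k]) for k in sorted(a.keys() & b.keys()) if a[k] != b[k]}


default_vcf = """\
chr20 61083 . C T 33.3 PASS . GT:GQ:DP:AD:VAF:PL 0/1:33:40:19,21:0.525:33,0,42
chr20 11479054 . A G 56.6 PASS . GT:GQ:DP:AD:VAF:PL 1/1:53:29:0,29:1:56,55,0
"""
retrained_vcf = """\
chr20 61083 . C T 24.9 PASS . GT:GQ:DP:AD:VAF:PL 0/1:18:41:19,21:0.512195:24,0,18
chr20 11479054 . A G 0 RefCall . GT:GQ:DP:AD:VAF:PL 0/0:36:29:0,29:1:0,47,36
"""

diffs = discordant(default_vcf, retrained_vcf)
for site, (default_call, retrained_call) in diffs.items():
    print(site, "default:", default_call, "retrained:", retrained_call)
```

Only sites whose filter status or genotype changed are reported, so chr20:61083 (PASS 0/1 in both call sets) is skipped even though its quality values differ.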

Is there an output from DeepVariant (e.g. intermediate files) that would help me understand how DV decides how to classify a variant? My goal is to train DeepVariant so that it keeps the 'PASS' variants called by the default model and also detects more variants in the dataset. Please advise how I can do that.

Thank you

sophienguyen01, Oct 17 '24 17:10

Hi,

You can pass the --intermediate_results_dir flag to run_deepvariant, which will save all the intermediate outputs (from make_examples and call_variants).

If you are interested in looking at the pileup images of specific examples, you can use the show_examples tool.

If you are interested in increasing recall, you can adjust the various vsc_* parameters to make_examples, which will either increase or decrease the number of candidates generated.
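To illustrate why lowering a vsc_* threshold produces more candidates, here is a deliberately simplified sketch. It is not DeepVariant's actual candidate-generation code: the function `is_candidate`, the read counts, the count threshold of 2, and the stricter comparison fraction of 0.12 are all illustrative assumptions. Only the 0.02 value comes from the --vsc_min_fraction_snps setting quoted in this thread.

```python
def is_candidate(alt_count, depth, min_count, min_fraction):
    """Simplified rule: keep an allele if enough reads, and a large enough
    fraction of reads, support it. Illustrative only."""
    if depth == 0:
        return False
    return alt_count >= min_count and alt_count / depth >= min_fraction


# Hypothetical site: 3 of 40 reads support the alternate allele (fraction 0.075).
alt_count, depth = 3, 40

# With the permissive fraction from this thread, the site becomes a candidate.
permissive = is_candidate(alt_count, depth, min_count=2, min_fraction=0.02)

# With a stricter, illustrative fraction, the same site is dropped.
strict = is_candidate(alt_count, depth, min_count=2, min_fraction=0.12)

print("permissive:", permissive, "strict:", strict)
```

A lower fraction lets more low-support sites through to the classifier, which can raise recall but also gives the model more weak candidates to evaluate.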

Hopefully that helps!

lucasbrambrink, Oct 18 '24 17:10

Closing this because of no activity. Please feel free to reopen if you have further questions.

kishwarshafin, Oct 24 '24 19:10

Hi, I looked at the images for some loci using the show_examples tool. Interestingly, the image of a locus is the same for the default and retrained models, but the variant interpretation is different. I also used the vsc_* parameters in make_examples when I trained my model. Attached is an image from the retrained examples at locus 11479054. From the image, I expect the variant to be heterozygous, but the retrained model classifies it as homozygous reference. I have also extracted the lines for this locus from the VCFs below.

I would appreciate any ideas about why the retrained model misinterprets variants like this. Thank you

Image

default: chr20 11479054 . A G 56.6 PASS . GT:GQ:DP:AD:VAF:PL 1/1:53:29:0,29:1:56,55,0
retrain: chr20 11479054 . A G 0 RefCall . GT:GQ:DP:AD:VAF:PL 0/0:36:29:0,29:1:0,47,36
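The PL field in these two records shows how strongly each model prefers its genotype. The sketch below converts the Phred-scaled PL values into relative weights for 0/0, 0/1 and 1/1, using the standard VCF ordering for a biallelic site. The helper names are hypothetical, and normalizing the likelihoods to sum to 1 is an assumption made only for readability.

```python
GENOTYPES = ["0/0", "0/1", "1/1"]  # PL order for a biallelic site


def pl_to_weights(pl):
    """Convert Phred-scaled likelihoods to weights that sum to 1."""
    likelihoods = [10 ** (-p / 10) for p in pl]
    total = sum(likelihoods)
    return [x / total for x in likelihoods]


def most_likely(pl_field):
    """Return the best-supported genotype and its normalized weight."""
    weights = pl_to_weights([int(x) for x in pl_field.split(",")])
    best = max(range(len(weights)), key=weights.__getitem__)
    return GENOTYPES[best], weights[best]


default_best = most_likely("56,55,0")    # PL from the default model record
retrained_best = most_likely("0,47,36")  # PL from the retrained model record

print("default:", default_best)
print("retrained:", retrained_best)
```

Both records report AD 0,29, meaning 0 reads support the reference allele and 29 support the alternate allele, yet the retrained model puts nearly all of its weight on 0/0. The pileup evidence is identical, so the difference comes from the model weights rather than from the examples.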

sophienguyen01, Oct 25 '24 19:10