About Constraint violation score
Hi, I ran into some problems when predicting mutation effects with ExPecto. My questions are as follows:
- In my input *.vcf file, should I restrict mutations to transcriptional regulatory regions near the TSS, or can I use all called mutations to predict mutation effects?
- To calculate the 'constraint violation score', I learned that it is computed as the product of the 'predicted mutation effect' and the 'variation potential directionality score'. The 'predicted mutation effect' of each mutation in different tissues can be obtained directly from ExPecto. The latter, however, is described in the paper as the sum of predicted log(fold change) values for all mutations per gene. Should I obtain it by summing the predicted mutation effects of all mutations on the target gene in my *.vcf file, or should I use the corresponding value in the file 'variation_potential.directionality_scores.txt' provided in Supplementary_Data.2 of the paper?
- The paper defines the 'constraint violation score' as follows: 'The constraint violation score was computed as the product of the predicted variant effect of the prioritized LD variant and the variation potential directionality score of the nearest TSS'. How should I understand 'the variation potential directionality score of the nearest TSS'? I hope to get your help, thank you!
Hi,
Hope this helps:
1. You can use all mutations, but for computational efficiency I recommend focusing on variants within 10kb or 20kb of the TSS. Mutations that are further away usually get very small predicted effects.
2. The 'variation potential directionality score' can be obtained from 'variation_potential.directionality_scores.txt'. It was calculated based on all potential single nucleotide mutations within 1kb of the TSS.
3. The constraint violation score is computed as the product of the predicted expression effect (log fold change) and the variation potential directionality score. Both scores should be computed with respect to the same gene (TSS); the latter is already computed and can be obtained as in 2. In the examples we showed in the paper, we used the nearest TSS as the TSS of interest.
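The points above can be sketched in Python as follows. This is only a minimal illustration, not ExPecto's own code: the function names, the tuple layout, and all numbers are hypothetical. It assumes you have already obtained the predicted log fold change of each variant from ExPecto and looked up the directionality score of the nearest TSS in 'variation_potential.directionality_scores.txt'.

```python
def within_window(variant_pos, tss_pos, window=20_000):
    """Return True if the variant lies within `window` bp of the TSS."""
    return abs(variant_pos - tss_pos) <= window


def constraint_violation_score(predicted_effect, directionality_score):
    """Product of the predicted log fold change and the directionality score.

    Both values must refer to the same gene (TSS).
    """
    return predicted_effect * directionality_score


# Hypothetical inputs: (variant position, predicted log fold change).
tss_pos = 1_000_000
directionality = -2.5  # made-up directionality score for the nearest TSS
variants = [(995_000, 0.04), (1_000_300, -0.10), (1_250_000, 0.001)]

# Keep only variants near the TSS, then score each one.
scores = [
    (pos, constraint_violation_score(effect, directionality))
    for pos, effect in variants
    if within_window(pos, tss_pos)
]
```

Here the third variant is 250kb from the TSS and is dropped by the distance filter; the remaining two each get one score.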
Best, Jian
Thanks for your timely reply.
I still have a question. We know that the constraint violation score for each mutation on a gene can be calculated according to the formula in the paper. Could we then directly sum the scores of all mutations on a gene to represent the combined impact of all mutations on that gene? If not, what might the sum mean? Thank you.
I think you are asking about the variation potential directionality score which is the sum of predicted mutation effects of all potential mutations - right? The sum is used to measure the bias of the distribution of predicted mutation effects - whether the distribution is biased toward positive effect mutations or negative effect mutations. Maybe it is more intuitive to think about the mean of predicted mutation effects, which differs from the sum only by a constant factor in this case.
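To make the sum-versus-mean point concrete, here is a small Python sketch. The effect values and function names are made up for illustration and are not taken from ExPecto or the paper. The sign of either quantity indicates which direction the distribution of predicted effects leans, and the two differ only by the number of mutations considered, which is a constant if the same set of potential mutations is used for every gene (an assumption of this sketch).

```python
def directionality_sum(effects):
    """Sum of predicted effects over all potential mutations."""
    return sum(effects)


def directionality_mean(effects):
    """Mean of predicted effects; equals the sum divided by the count."""
    return sum(effects) / len(effects)


# Made-up predicted log fold changes for potential mutations near one TSS.
effects = [0.02, -0.15, -0.08, 0.01, -0.05]

total = directionality_sum(effects)
mean = directionality_mean(effects)

# Both have the same sign; here the distribution leans negative.
biased_negative = total < 0 and mean < 0
```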
Hi Jian, thanks for your timely reply. My question is indeed about the "constraint violation score", the sum of which was mentioned in the paper. I just want to know whether, supposing we calculated the sum of the "constraint violation scores" for all mutations on a gene, that sum would be meaningful. In short, for a gene, can we sum the scores of all mutations on it? Thank you!
Correction: in the sentence "the sum of which was mentioned in the paper", "was mentioned" should read "was not mentioned".
I see, that is an interesting question. That would be equivalent to the square of the variation potential directionality score; it can probably be interpreted as the size of the variation potential directionality.
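The equivalence can be written out in a short derivation. The notation is introduced here for illustration only: e_i is the predicted effect of mutation i with respect to the gene's TSS, S is the set of all potential single nucleotide mutations used to define the directionality score, and D is that score.

```latex
D = \sum_{i \in S} e_i,
\qquad
\sum_{i \in S} \left( e_i \cdot D \right)
  = D \sum_{i \in S} e_i
  = D^2 .
```

Note that this identity relies on summing over the same set S that defines D. If the sum runs only over a subset V of observed mutations, the same algebra gives D times the sum of e_i over V, which is not in general equal to D squared.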
Hi Jian, an error occurred when I ran ExPecto with the command line "python chromatin.py xx.vcf". The output is appended below:
Number of variants with reference allele matched with reference genome:
704
Number of input variants:
704
Traceback (most recent call last):
File "/work1/xuelab/project/guokm/software/ExPecto/chromatin.py", line 154, in
How should I solve it? Thank you!
Did you try running git pull to get the newest code? I just made a commit to fix a bug that may cause this.
The error reported above was solved after updating the code. Thank you!