ExPecto

About Constraint violation score

Open kmtiny opened this issue 7 years ago • 10 comments

Hi, I ran into some questions while predicting mutation effects with ExPecto:

  1. In my input *.vcf file, should I restrict the mutations to those within transcriptional regulatory regions near a TSS, or can I use all called mutations to predict mutation effects?
  2. For the 'constraint violation score', I learned that it is computed as the product of the 'predicted mutation effect' and the 'variation potential directionality score'. The 'predicted mutation effect' of each mutation in different tissues can be obtained directly from ExPecto. The latter, however, is described in the paper as the sum of predicted log(fold change) values for all mutations per gene. Should I obtain it by summing the predicted mutation effects of all mutations on the target gene in my *.vcf file, or should I use the associated value in the file 'variation_potential.directionality_scores.txt' provided in Supplementary_Data.2 of the paper?
  3. The paper defines the 'constraint violation score' as follows: 'The constraint violation score was computed as the product of the predicted variant effect of the prioritized LD variant and the variation potential directionality score of the nearest TSS'. How should I understand 'the variation potential directionality score of the nearest TSS'?

I hope you can help, thank you!

kmtiny • Aug 17 '18 02:08

Hi,

Hope this helps:

  1. You can use all mutations, but for computational efficiency I recommend focusing on variants within 10kb or 20kb of the TSS. Mutations that are further away usually get very small predicted effects.

  2. The 'variation potential directionality score' can be obtained from 'variation_potential.directionality_scores.txt'. It was calculated based on all potential single nucleotide mutations within 1kb of the TSS.

  3. The constraint violation score is computed as the product of the predicted expression effect (log fold change) and the variation potential directionality score. Both scores should be computed with respect to the same gene (TSS); the latter is already computed and can be obtained as in 2. For the examples shown in the paper, we used the nearest TSS as the TSS of interest.

Best, Jian

jzthree • Aug 17 '18 02:08
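
As a rough illustration of the answer above, the following Python sketch keeps only variants within 20kb of the nearest TSS and multiplies each predicted effect by the directionality score of that TSS. Everything here is an assumption made for illustration: the gene names, positions, effects, and scores are toy numbers, `nearest_tss` and `constraint_violation` are hypothetical helpers that are not part of ExPecto, and the real layout of variation_potential.directionality_scores.txt may differ from the dict used below.

```python
from bisect import bisect_left

# Toy values for illustration only, not real ExPecto output.
# Hypothetical TSS table for one chromosome: (position, gene), sorted by position.
tss_table = [(10_000, "GENE_A"), (55_000, "GENE_B"), (120_000, "GENE_C")]

# Hypothetical per-gene directionality scores, as if already looked up from
# variation_potential.directionality_scores.txt (actual file layout may differ).
directionality = {"GENE_A": -12.5, "GENE_B": 8.0, "GENE_C": 0.5}


def nearest_tss(pos, table):
    """Return (distance, gene) for the TSS closest to pos."""
    positions = [p for p, _ in table]
    i = bisect_left(positions, pos)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = table[max(i - 1, 0):i + 1]
    tss_pos, gene = min(candidates, key=lambda t: abs(t[0] - pos))
    return abs(tss_pos - pos), gene


def constraint_violation(variants, table, scores, max_dist=20_000):
    """variants: iterable of (position, predicted log fold change)."""
    results = []
    for pos, effect in variants:
        dist, gene = nearest_tss(pos, table)
        if dist > max_dist:
            continue  # far from any TSS: predicted effects are usually very small
        # Both factors refer to the same gene (the nearest TSS).
        results.append((pos, gene, effect * scores[gene]))
    return results


variants = [(9_500, 0.02), (56_000, -0.01), (90_000, 0.001)]
for row in constraint_violation(variants, tss_table, directionality):
    print(row)
```

The third toy variant lies 30kb from its nearest TSS, so it is skipped by the distance filter; the 20kb cutoff is the upper end of the range suggested in the reply and can be changed through `max_dist`.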

Thanks for your timely reply.

kmtiny • Aug 17 '18 08:08

I still have a question. We know that the constraint violation score of each mutation on a gene can be calculated according to the formula in the paper. Could we then directly sum the scores of all mutations on a gene to represent their combined impact on that gene? If not, what might the sum mean? Thank you.

kmtiny • Aug 20 '18 03:08

I think you are asking about the variation potential directionality score which is the sum of predicted mutation effects of all potential mutations - right? The sum is used to measure the bias of the distribution of predicted mutation effects - whether the distribution is biased toward positive effect mutations or negative effect mutations. Maybe it is more intuitive to think about the mean of predicted mutation effects, which differs from the sum only by a constant factor in this case.

jzthree • Aug 20 '18 04:08
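
To make the sum-versus-mean remark concrete, here is a minimal Python sketch. The effect values are toy numbers and `directionality_summary` is a hypothetical helper, not an ExPecto function. It only shows that the sum and the mean share the same sign and differ by the factor N, the number of potential mutations considered.

```python
# Toy predicted effects (log fold change) for all potential mutations near one TSS.
# These numbers are invented for illustration.
effects = [0.5, -0.25, 0.125, -1.0, 0.25, -0.5]


def directionality_summary(effects):
    """Return (sum, mean) of predicted effects; both carry the same sign."""
    total = sum(effects)
    return total, total / len(effects)


total, average = directionality_summary(effects)

# A negative value means the distribution leans toward expression-decreasing
# mutations; a positive value means it leans toward expression-increasing ones.
bias = "negative" if total < 0 else "positive" if total > 0 else "none"
print(total, average, bias)

# The sum is the mean scaled by the constant N = len(effects).
assert abs(total - average * len(effects)) < 1e-12
```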

Hi Jian, thanks for your timely reply. My question is indeed about the "constraint violation score", the sum of which was mentioned in the paper. Suppose we calculate the sum of the "constraint violation score" over all mutations on a gene: would that sum be meaningful? In short, for a given gene, can we sum the scores of all mutations on it? Thank you!

kmtiny • Aug 20 '18 05:08

Correction: in the phrase "the sum of which was mentioned in the paper", "was mentioned" should read "was not mentioned".

kmtiny • Aug 20 '18 05:08

I see, that is an interesting question. Summed over all potential mutations, that would be equivalent to the square of the variation potential directionality score. It can probably be interpreted as the size of the variation potential directionality.

jzthree • Aug 20 '18 16:08
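
The equivalence follows from the definitions given earlier in the thread: if e_i is the predicted effect of mutation i and D is the sum of e_i over all potential mutations, then each constraint violation score is e_i * D, and their sum is D * (sum of e_i) = D^2. The Python sketch below checks this on toy numbers (invented for illustration). It also shows that the identity relies on summing over the same full set of potential mutations used to define D; for a subset, such as the variants observed in a VCF file, the sum is D times the sum of the observed effects instead.

```python
import math

# Toy predicted effects for all potential mutations near one TSS (invented values).
effects = [0.5, -0.25, 0.125, -1.0, 0.25, -0.5]

# Variation potential directionality score: sum over all potential mutations.
D = sum(effects)

# Constraint violation score of each potential mutation.
cvs_all = [e * D for e in effects]

# Summing over the full set gives the square of the directionality score.
assert math.isclose(sum(cvs_all), D ** 2)

# For a subset (here, pretend only the first two mutations were observed),
# the sum is D times the sum of the observed effects, not D squared.
observed = effects[:2]
cvs_observed = sum(e * D for e in observed)
assert math.isclose(cvs_observed, D * sum(observed))

print(sum(cvs_all), cvs_observed)
```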

Hi Jian, an error occurred when I ran ExPecto with the command line "python chromatin.py xx.vcf". The output is appended below:

Number of variants with reference allele matched with reference genome: 704
Number of input variants: 704
Traceback (most recent call last):
  File "/work1/xuelab/project/guokm/software/ExPecto/chromatin.py", line 154, in <module>
    input = torch.from_numpy(ref_encoded[int(i*batchSize):int((i+1)*batchSize),:,:]).unsqueeze(3)
RuntimeError: dimension out of range (expected to be in range of [-1, 0], but got 3)

How should I solve it? Thank you!

kmtiny • Aug 21 '18 07:08

Did you try running git pull to get the newest code? I just made a commit to fix a bug that may cause this.

jzthree • Aug 21 '18 14:08

The error reported above was resolved after updating the code. Thank you!

kmtiny • Aug 24 '18 01:08