PRS-Tutorial icon indicating copy to clipboard operation
PRS-Tutorial copied to clipboard

Look into PRS components of each individual sample in PLINK --score?

Open fzel1994 opened this issue 2 years ago • 7 comments

Is there any possible way to check which SNPs is included in counting the score of the scoring profile? any flag to put in the PLINK command line?

fzel1994 avatar Mar 28 '23 12:03 fzel1994

have you found any way to check which SNps is included in the scoring profile? I used the tutorial to calculate the PRS, besides, I also select the SNPs first, and then use --score directly, I found the results are different, I would like to know the reason for this.

yu-zhang-oYo avatar Jan 25 '25 18:01 yu-zhang-oYo

Use —print-snp

Sam

On Sat, Jan 25, 2025 at 1:21 PM Yu_Zhang @.***> wrote:

have you found any way to check which SNps is included in the scoring profile? I used the tutorial to calculate the PRS, besides, I also select the SNPs first, and then use --score directly, I found the results are different, I would like to know the reason for this.

— Reply to this email directly, view it on GitHub https://github.com/choishingwan/PRS-Tutorial/issues/48#issuecomment-2614055106, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJTRYVLOUT2YNJDT72RVJL2MPI2VAVCNFSM6AAAAABV3SSIF6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMMJUGA2TKMJQGY . You are receiving this because you are subscribed to this thread.Message ID: @.***>

choishingwan avatar Jan 25 '25 21:01 choishingwan

Use —print-snp

Sam

Thanks Sam, do you mean use --print-snp in PLINK, there is no such flag in PLINK as followed. Thanks

 plink --bfile EUR.final_snps_for_score \
> --score base.QC.tsv 3 4 8 header sum \
> --print-snp
PLINK v1.90b7.2 64-bit (11 Dec 2023)           www.cog-genomics.org/plink/1.9/
(C) 2005-2023 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to plink.log.
Options in effect:
  --bfile EUR.final_snps_for_score
  --print-snp
  --score base.QC.tsv 3 4 8 header sum

Error: Unrecognized flag ('--print-snp').
For more information, try "plink --help <flag name>" or "plink --help | more".

yu-zhang-oYo avatar Jan 26 '25 17:01 yu-zhang-oYo

Oh, that’s for PRSice. Plink uses all SNPs, if it doesn’t, it should output them in the log

Sam

On Sun, Jan 26, 2025 at 12:09 PM Yu_Zhang @.***> wrote:

Use —print-snp

Sam … <#m_-4526959832429802359_>

Thanks Sam, do you mean use --print-snp in PLINK, there is no such flag in PLINK

` plink --bfile EUR.final_snps_for_score \

--score base.QC.tsv 3 4 8 header sum --print-snp PLINK v1.90b7.2 64-bit (11 Dec 2023) www.cog-genomics.org/plink/1.9/ (C) 2005-2023 Shaun Purcell, Christopher Chang GNU General Public License v3 Logging to plink.log. Options in effect: --bfile EUR.final_snps_for_score --print-snp --score base.QC.tsv 3 4 8 header sum

Error: Unrecognized flag ('--print-snp'). For more information, try "plink --help " or "plink --help | more". `

— Reply to this email directly, view it on GitHub https://github.com/choishingwan/PRS-Tutorial/issues/48#issuecomment-2614508857, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJTRYTUQH3U2KLNFP45KB32MUJCXAVCNFSM6AAAAABV3SSIF6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMMJUGUYDQOBVG4 . You are receiving this because you commented.Message ID: @.***>

choishingwan avatar Jan 26 '25 17:01 choishingwan

it is weird that I cannot find the SNPs that is selected for the PRS in the code from tutorial (see the code in the first way below). I also tried to extract those SNPs first (as the code in the 2nd way) and then use --score in PLINK. It has different results as the following code. I have no idea the reason for the difference. I would appreciate it so much if you could help me out.

the first way is using the similar code as in the tutorial

yzh10@login2:/N/slate/yzh10/numom2b/GDM> gunzip -c base.QC.gz | awk '{print $3,$8}' > SNP.pvalue
yzh10@login2:/N/slate/yzh10/numom2b/GDM> echo "0.000001 0 0.000001" > range_list
yzh10@login2:/N/slate/yzh10/numom2b/GDM> gunzip -c base.QC.gz > base.QC.tsv
yzh10@login2:/N/slate/yzh10/numom2b/GDM> plink --bfile EUR.QC \
> --score base.QC.tsv 3 4 8 header \
> --write-snplist \
> --q-score-range range_list SNP.pvalue \
> --extract EUR.valid.snp \
> --out EUR.PRS
PLINK v1.90b7.2 64-bit (11 Dec 2023)           www.cog-genomics.org/plink/1.9/
(C) 2005-2023 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to EUR.PRS.log.
Options in effect:
  --bfile EUR.QC
  --extract EUR.valid.snp
  --out EUR.PRS
  --q-score-range range_list SNP.pvalue
  --score base.QC.tsv 3 4 8 header
  --write-snplist

257366 MB RAM detected; reserving 128683 MB for main workspace.
5419600 variants loaded from .bim file.
5830 people (0 males, 5830 females) loaded from .fam.
--extract: 191265 variants remaining.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 5830 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is in [0.9999995, 1).
191265 variants and 5830 people pass filters and QC.
Note: No phenotypes present.
List of variant IDs written to EUR.PRS.snplist .
Warning: 7241950 lines skipped in --score file (7241950 due to variant ID
mismatch, 0 due to allele code mismatch); see EUR.PRS.nopred for details.
--score: 191265 valid predictors loaded.
Warning: 7241951 lines skipped in --q-score-range data file.
--score: 1 range processed.
Results written to EUR.PRS.*.profile. 

the results is like this

yzh10@login2:/N/slate/yzh10/numom2b/GDM> head EUR.PRS.0.000001.profile 
                  FID                   IID  PHENO    CNT   CNT2    SCORE
  202946750069_R03C01   202946750069_R03C01     -9      2      1 1.485e-07
  202947050020_R05C01   202947050020_R05C01     -9      2      1 1.485e-07
  202947050133_R04C01   202947050133_R04C01     -9      2      1 1.485e-07
  203010920076_R01C01   203010920076_R01C01     -9      2      0        0
  203010920076_R02C01   203010920076_R02C01     -9      2      0        0
  203010920076_R04C01   203010920076_R04C01     -9      2      1 1.485e-07
  203010920076_R06C01   203010920076_R06C01     -9      2      1 1.485e-07
  203010920076_R07C01   203010920076_R07C01     -9      2      2 2.97e-07
  203010920090_R01C01   203010920090_R01C01     -9      2      1 1.485e-07 

another one is to select SNPs first and then use PLINK --score

yzh10@login2:/N/slate/yzh10/numom2b/GDM> gunzip -c base.QC.gz | awk '$7 < 1e-6 {print $3}' > significant_snps.list
yzh10@login2:/N/slate/yzh10/numom2b/GDM> comm -12 <(sort significant_snps.list) <(sort EUR.valid.snp) > final_snps_for_score.list
yzh10@login2:/N/slate/yzh10/numom2b/GDM> plink --bfile EUR.QC \
> --score base.QC.tsv 3 4 8 header \
> --extract final_snps_for_score.list \
> --out EUR.PRS
PLINK v1.90b7.2 64-bit (11 Dec 2023)           www.cog-genomics.org/plink/1.9/
(C) 2005-2023 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to EUR.PRS.log.
Options in effect:
  --bfile EUR.QC
  --extract final_snps_for_score.list
  --out EUR.PRS
  --score base.QC.tsv 3 4 8 header

257366 MB RAM detected; reserving 128683 MB for main workspace.
5419600 variants loaded from .bim file.
5830 people (0 males, 5830 females) loaded from .fam.
--extract: 47 variants remaining.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 5830 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is in [0.9999995, 1).
47 variants and 5830 people pass filters and QC.
Note: No phenotypes present.
Warning: 7433168 lines skipped in --score file (7433168 due to variant ID
mismatch, 0 due to allele code mismatch); see EUR.PRS.nopred for details.
--score: 47 valid predictors loaded.
--score: Results written to EUR.PRS.profile .
yzh10@login2:/N/slate/yzh10/numom2b/GDM> head  EUR.PRS.profile 
                  FID                   IID  PHENO    CNT   CNT2    SCORE
  202946750069_R03C01   202946750069_R03C01     -9     94     28 0.0111638
  202947050020_R05C01   202947050020_R05C01     -9     94     19 -0.000518085
  202947050133_R04C01   202947050133_R04C01     -9     94     22 -0.00160957
  203010920076_R01C01   203010920076_R01C01     -9     94     23 -1.80851e-05
  203010920076_R02C01   203010920076_R02C01     -9     94     17  -0.0035
  203010920076_R04C01   203010920076_R04C01     -9     94     26 0.0123585
  203010920076_R06C01   203010920076_R06C01     -9     94     26 0.00277128
  203010920076_R07C01   203010920076_R07C01     -9     94     23  0.01115
  203010920090_R01C01   203010920090_R01C01     -9     94     24 0.00867234

We can see that the results are different in two ways, but they are supposed to be the same.

yu-zhang-oYo avatar Jan 26 '25 17:01 yu-zhang-oYo

Could you please provide information as to how the score is different? And what’s the plink log?

Sam

On Sun, Jan 26, 2025 at 12:33 PM Yu_Zhang @.***> wrote:

it is weird that I cannot find the SNPs that is selected for the PRS in the code from tutorial (see the code below). I also tried to extract those SNPs below the P-values threshold first and then use --score in PLINK. It has different results as the following code. I have no idea the reason for the difference. I would appreciate it so much if you could help me out.

plink
--bfile EUR.QC
--score Height.QC.Transformed 3 4 12 header
--q-score-range range_list SNP.pvalue
--extract EUR.valid.snp
--out EUR

— Reply to this email directly, view it on GitHub https://github.com/choishingwan/PRS-Tutorial/issues/48#issuecomment-2614518365, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJTRYSNHY55HL66GZLJQTD2MUL5NAVCNFSM6AAAAABV3SSIF6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMMJUGUYTQMZWGU . You are receiving this because you commented.Message ID: @.***>

choishingwan avatar Jan 27 '25 15:01 choishingwan

Could you please provide information as to how the score is different? And what’s the plink log?

Sam

I have found the bug, thanks for your information.

yu-zhang-oYo avatar Jan 27 '25 18:01 yu-zhang-oYo