MTAG for a very large number of phenotypes

Open AurinaBMH opened this issue 4 years ago • 6 comments

Hey, it's not an issue, but rather a prospective question, hope someone would be able to answer it. I'm planning to run several thousand individual GWASs on a set of brain imaging measures within the same cohort. At this point, I do not know how large the genetic correlations between the phenotypes are, but they're likely to be non-zero. So here are a couple of questions:

  • Would MTAG scale to a very large number of phenotypes (several thousand)? If I understand correctly, in this case the result would be a separate MTAG-based GWAS for each trait analyzed, right? According to the original paper, the run time for 3 traits and 6 million SNPs was estimated to be 28 min. How does the run time scale with the number of phenotypes analyzed?
  • Would it be possible to meta-analyze several thousand individual GWASs (the original GWAS for each measure) using MTAG to derive a single set of GWAS summary statistics for a set of measures (instead of a separate GWAS for each individual measure)? The goal would then be to use that single meta-analysis GWAS result (derived from multiple phenotypes) to generate polygenic risk scores, which would be tested for associations with other phenotypes.

Hope this makes sense and thank you very much for the response.

AurinaBMH avatar Oct 20 '21 02:10 AurinaBMH

Hi Aurina,

That is an interesting question. It's possible that this is a great use case, but a couple things make me nervous about it.

  1. We may have myopically coded up the software such that things start breaking if you have more than 10 files (a limitation of numbering the files with single digits). Other users have asked about this, and we tried fixing it, but it turned out to be nontrivial, and we just haven't had the bandwidth to take care of it. If you are planning to do a true meta-analysis (with the meta-analysis option of the software), you will get only one set of summary statistics, corresponding to the average effect across all traits, but it might skip the problematic step in the software. If you want to make a single PGS, maybe this is what you prefer anyway.
  2. MTAG doesn't take noise in the estimates of Omega and Sigma into account when combining sets of summary statistics. If you use the meta-analysis option, the noise in Omega doesn't matter, since you would be assuming it is a matrix of ones (times a constant), but noise in Sigma may still cause you problems. We did simulations and showed that the type I error rate gets large when more than 20 traits are used. If you are just making a PGS, though, maybe it doesn't matter if your SEs are not quite right. Also, if you are looking at several highly (phenotypically) correlated phenotypes in a perfectly overlapping sample, it's possible that the error will be very small in your case. We didn't test that scenario in our simulations.
  3. Re runtime: I actually have no idea. The first step of MTAG (the LDSC step) is quadratic in the number of traits, and the second step is linear in the number of SNPs. Generally it was the second step that dominated run time, IIRC. If you have a ton of traits, though, you essentially need to run LDSC for each pair of them. I think it is fast enough that you could run your own experiments on sets of 20-100 traits and get a good sense of how well it scales. I'd be interested in hearing how it goes if you do this!
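
To put a rough number on the quadratic growth of the LDSC step: a minimal sketch that just counts distinct trait pairs. The function name `ldsc_pair_count` is made up for illustration and is not part of MTAG; the sketch says nothing about how long each pair takes to run.

```python
from math import comb


def ldsc_pair_count(n_traits: int) -> int:
    """Number of distinct trait pairs, i.e. cross-trait LDSC runs needed.

    Hypothetical helper for illustration only; not an MTAG function.
    """
    return comb(n_traits, 2)


# Pair counts for a few trait-set sizes, from the paper's 3-trait example
# up to the "several thousand" range discussed in this thread.
for n_traits in (3, 20, 100, 2000):
    print(n_traits, ldsc_pair_count(n_traits))
```

Timing a few sets of 20-100 traits and comparing against these pair counts would be one way to check whether the LDSC step really scales as expected before committing to thousands of traits.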

Let me know if you have any other questions.

Best, Patrick

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/JonJala/mtag/issues/144, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFBUB5PVNTLVJ3JANTJCA7DUHYVS5ANCNFSM5GKRYJWQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

paturley avatar Oct 21 '21 16:10 paturley

Hi Patrick,

Thank you very much for such a comprehensive response, that is very useful to know. I'm still in the planning stages of the project and I'll keep those things in mind. If the meta-analysis option of the software could handle a large number of phenotypes, that would address a big part of what we're planning to do.

Just one additional question: to get around potential problems with run time scaling in the meta-analysis, would it make sense to do the meta-analysis in several steps: 1) meta-analyze sets of ~100-200 traits, and 2) do a second-level meta-analysis of the outputs from the first step? Would there be any negative implications of meta-analyzing the results of a meta-analysis? Is that a viable option at all?

Thank you very much for all your help!

Best, Aurina

AurinaBMH avatar Oct 21 '21 22:10 AurinaBMH

Hi Aurina,

I can't think of why you couldn't do the meta-analysis in several steps like that. Seems like a reasonable strategy.
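
As a sanity check on the stepwise idea, here is a minimal sketch using plain fixed-effect inverse-variance weighting, which is not MTAG's code. It assumes the per-trait estimates have independent errors; that will not hold for traits measured in the same cohort, where the error covariance (Sigma) would also have to be carried through each step. The `ivw` helper and the simulated numbers are purely illustrative.

```python
import numpy as np


def ivw(beta, se):
    """Fixed-effect inverse-variance-weighted meta-analysis.

    Assumes the input estimates have independent errors (an assumption of
    this sketch, not something MTAG requires).
    """
    beta = np.asarray(beta, dtype=float)
    weights = 1.0 / np.asarray(se, dtype=float) ** 2
    beta_meta = np.sum(weights * beta) / np.sum(weights)
    se_meta = np.sqrt(1.0 / np.sum(weights))
    return beta_meta, se_meta


# Simulated effect estimates for one SNP across 12 traits (made-up data).
rng = np.random.default_rng(0)
beta = rng.normal(0.0, 0.05, size=12)
se = rng.uniform(0.01, 0.03, size=12)

# One-step meta-analysis over all 12 traits.
one_step = ivw(beta, se)

# Two-step: meta-analyze 3 groups of 4 traits, then combine the group results.
stage1 = [ivw(beta[i:i + 4], se[i:i + 4]) for i in range(0, 12, 4)]
two_step = ivw([b for b, _ in stage1], [s for _, s in stage1])

print(one_step, two_step)
```

Under the independence assumption the two routes give the same estimate and standard error, because each group's second-stage weight is the sum of its members' first-stage weights.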

Best, Patrick

paturley avatar Oct 25 '21 15:10 paturley

Hi Patrick,

Thank you very much for your help!

Best, Aurina

AurinaBMH avatar Oct 26 '21 06:10 AurinaBMH

Hi Patrick, has there been any update regarding this? Best, Divya

divbru avatar Mar 09 '23 15:03 divbru

Hi Divya,

No updates here, though we are working on implementing MTAG in the MAMA software, which should be more robust to these sorts of issues, so hopefully there will be a solution soon.

Best, Patrick

paturley avatar Mar 09 '23 22:03 paturley