pheweb

Support input in gwas-vcf format

Open • jielab opened this issue 4 years ago • 4 comments

Hi, Guys:

These days, GWAS files can have up to 20 million rows, which makes them very inefficient to query and process when they are stored as plain TXT files.

I think the GWAS-VCF format is a really good idea, as explained here: https://github.com/MRCIEU/gwas2vcf.

Is there a way for Pheweb to support the VCF format?

Best regards, Jie

jielab • Sep 08 '21 15:09

Do you mean that internally pheweb should store everything in tabixed bgzipped GWAS-VCF instead of the current tabixed bgzipped tsv files? Why? How would that make queries more efficient?

Or do you just want to use GWAS-VCF as input to create a pheweb? It should be easy to write a script that converts GWAS-VCF into the input format pheweb requires. Do you have one file per phenotype, or many phenotypes in a single file?

pjvandehaar • Sep 08 '21 23:09
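The conversion script suggested above could look roughly like the following. This is a minimal sketch, not part of pheweb or gwas2vcf. It assumes the MRCIEU GWAS-VCF layout, where each study is a sample column with FORMAT fields `ES` (effect size), `SE` (its standard error), `LP` (-log10 p-value) and `AF` (allele frequency). It also assumes pheweb accepts a tab-separated file with the columns `chrom`, `pos`, `ref`, `alt`, `pval`, `beta`, `sebeta`, `af`. The function names `gwas_vcf_to_rows` and `write_pheweb_tsv` are hypothetical. Because one GWAS-VCF file can hold several studies, `sample_index` picks which study column to export, so a multi-phenotype file would be converted once per phenotype.

```python
import csv
import gzip
import sys
from typing import Dict, Iterable, Iterator, Optional, TextIO

# Assumed pheweb input column names (check pheweb's docs before relying on them).
PHEWEB_COLUMNS = ["chrom", "pos", "ref", "alt", "pval", "beta", "sebeta", "af"]


def _field(sample: Dict[str, str], key: str) -> Optional[str]:
    """Return a FORMAT value, or None if it is absent or marked missing ('.')."""
    value = sample.get(key)
    return None if value in (None, "", ".") else value


def gwas_vcf_to_rows(lines: Iterable[str], sample_index: int = 0) -> Iterator[Dict[str, str]]:
    """Hypothetical helper: yield one pheweb-style row per variant for one study column."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information (##) and header (#CHROM) lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _variant_id, ref, alt = fields[:5]
        if "," in alt:
            continue  # this sketch only handles one ALT allele per record
        keys = fields[8].split(":")                    # FORMAT column, e.g. ES:SE:LP:AF
        values = fields[9 + sample_index].split(":")   # the chosen study's values
        sample = dict(zip(keys, values))
        lp = _field(sample, "LP")
        if lp is None:
            continue  # a p-value is required, so drop records without LP
        yield {
            "chrom": chrom,
            "pos": pos,
            "ref": ref,
            "alt": alt,
            # LP stores -log10(p); convert it back to a plain p-value.
            "pval": "{:.6g}".format(10 ** (-float(lp))),
            "beta": _field(sample, "ES") or "",
            "sebeta": _field(sample, "SE") or "",
            "af": _field(sample, "AF") or "",
        }


def write_pheweb_tsv(rows: Iterable[Dict[str, str]], out: TextIO) -> None:
    """Hypothetical helper: write rows as a tab-separated file with a header line."""
    writer = csv.DictWriter(out, fieldnames=PHEWEB_COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python gwas_vcf_to_pheweb.py input.vcf.gz > phenotype.tsv
    with gzip.open(sys.argv[1], "rt") as vcf:
        write_pheweb_tsv(gwas_vcf_to_rows(vcf), sys.stdout)
```

Records with several ALT alleles or a missing `LP` are simply skipped here. Very large `LP` values underflow to a p-value of 0, which a real converter would need to handle explicitly.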

Dear Peter:

I mean the latter: pheweb using GWAS-VCF as input. As you know, GWAS files with millions of rows are huge, and it is confusing and a real headache that each tool expects different columns and column names. I think we should take advantage of VCF's support for fast queries, which comes from its accompanying .tbi (tabix) index file.

I hope you have a few minutes to read this paper, https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02248-0, and that you will agree that supporting the VCF format is a good idea.

Best regards, Jie

jielab • Sep 09 '21 03:09

Just wanted to chime in that MungeSumstats might be helpful here:

  • It can read in VCF or tabular format and standardise column names from a wide variety of inputs.
  • After munging is complete, you can export the files as either (tabix-indexed) tabular or VCF formats.
  • It also provides API access to the MRC IEU OpenGWAS database.

@Al-Murphy @NathanSkene

bschilder • Nov 06 '21 04:11

Thank you very much!

Best regards, Jie

jielab • Nov 08 '21 23:11