Query cohort imputed with TOPMed reference panel (Michigan server hg38 build)

Open samreenzafer opened this issue 2 years ago • 0 comments

Hi, we have a genotyping cohort (N>5000) with samples of multiple races, which has been imputed using the TOPMed Imputation Server (https://topmedimpute.readthedocs.io/en/latest/getting-started/), because this is the largest multi-ethnic reference panel to date. The imputed output from the server is not phased, and it is in the hg38 build.

Can you briefly describe how I should proceed if my intent is to run tractorGWAS with all 5 major ancestries using the logistic model?

  1. Do I need to lift over the imputed VCF files from hg38 to hg19, so that I can use the 1000G_Phase3 reference panel (which is hg19) for phasing (shapeIT), LAI (rfmix), and so on? Or should I use the 1000G_Phase3 (hg38) phased VCF files and convert the other supporting files, like the genetic map, to the hg38 build?
  2. Have you tested extract_tracts.py --num-ancs 5? If so, do you foresee any problems with the results I may see?
  3. I'm assuming we do not need to provide any population-based covariates (for example, PCs derived from eigenstrat) to tractor. This wasn't mentioned anywhere in the tutorial, but I'm assuming so because this is a local-ancestry-aware GWAS.

I'm not sure if I'm asking all the right questions to plan out my work. Thank you for your time.

samreenzafer, Nov 28 '23 20:11