Run Tractor for 3 ancestry groups but the final output only has one ancestry p-value

Open ryshi06 opened this issue 6 months ago • 1 comments

Hello,

I am trying to run an ancestry-specific GWAS for EUR, AFR, and EAS using the 1000 Genomes WGS reference. I ran local ancestry inference (rfmix) and tract extraction separately for each ancestry group, then renamed the three sets of output files to anc0, anc1, and anc2. However, the output from run_tractor contains beta and p-value estimates for anc0 only. I have pasted my code below and would appreciate any advice. Thank you for the help.

My output is like this:

head -n3 common_ADSPR5_AmyloidPET_Tractor_1.txt
CHR POS ID REF ALT N AF_anc0 LAprop_anc0 beta_anc0 se_anc0 pval_anc0 tval_anc0 AF_anc1 LAprop_anc1 beta_anc1 se_anc1 pval_anc1 tval_anc1 AF_anc2 LAprop_anc2 beta_anc2 se_anc2 pval_anc2 tval_anc2 LApval_anc0 LAeff_anc0 LApval_anc1 LAeff_anc1
1 17519 1:17519:G:T G T 6393 0.033474 0.333333 0.239298 1.8603 0.897651597635062 0.1286 0.033474 0.333333 0.033474 0.333333
1 108826 1:108826:G:C G C 6393 0.032301 0.333333 0.181305 1.875 0.922970806545642 0.0967 0.032301 0.333333 0.032301 0.333333

rfmix \
-f common_ADSPR5_AmyloidPET_${chr}_phased.vcf.gz \
-r ${ref_url}/1000G_Phased/1kGP_high_coverage_Illumina.chr${chr}.filtered.SNV_INDEL_SV_phased_panel_nochr.vcf.gz \
-m ${ref_url}/1000G_Phased/afr_igsr_samples.tsv \
-g ${ref_url}/genetic_map_b38/chr${chr}.b38.gmap \
-o common_ADSPR5_AmyloidPET_${chr}_AFR_deconvoluted \
--chromosome=${chr}

rfmix \
-f common_ADSPR5_AmyloidPET_${chr}_phased.vcf.gz \
-r ${ref_url}/1000G_Phased/1kGP_high_coverage_Illumina.chr${chr}.filtered.SNV_INDEL_SV_phased_panel_nochr.vcf.gz \
-m ${ref_url}/1000G_Phased/eur_igsr_samples.tsv \
-g ${ref_url}/genetic_map_b38/chr${chr}.b38.gmap \
-o common_ADSPR5_AmyloidPET_${chr}_EUR_deconvoluted \
--chromosome=${chr}

rfmix \
-f common_ADSPR5_AmyloidPET_${chr}_phased.vcf.gz \
-r ${ref_url}/1000G_Phased/1kGP_high_coverage_Illumina.chr${chr}.filtered.SNV_INDEL_SV_phased_panel_nochr.vcf.gz \
-m ${ref_url}/1000G_Phased/eas_igsr_samples.tsv \
-g ${ref_url}/genetic_map_b38/chr${chr}.b38.gmap \
-o common_ADSPR5_AmyloidPET_${chr}_EAS_deconvoluted \
--chromosome=${chr}

### EUR

python ${app_url}/Tractor/Tractor/scripts/extract_tracts.py \
--vcf common_ADSPR5_AmyloidPET_${chr}_phased.vcf.gz \
--msp common_ADSPR5_AmyloidPET_${chr}_EUR_deconvoluted.msp.tsv \
--num-ancs 1

# Update file name, rename anc0 to EUR
for file in /ix/kfan/Ruyu/Public_Data/NIAGADS_ADSP/Amyloid_PET_Project/Processed_Data/Genotype/Tractor/*anc0*; do
  newname="${file//anc0/EUR}"
  mv "$file" "$newname"
done

### AFR

python ${app_url}/Tractor/Tractor/scripts/extract_tracts.py \
--vcf common_ADSPR5_AmyloidPET_${chr}_phased.vcf.gz \
--msp common_ADSPR5_AmyloidPET_${chr}_AFR_deconvoluted.msp.tsv \
--num-ancs 1

# Rename anc0 to AFR
for file in /ix/kfan/Ruyu/Public_Data/NIAGADS_ADSP/Amyloid_PET_Project/Processed_Data/Genotype/Tractor/*anc0*; do
  newname="${file//anc0/AFR}"
  mv "$file" "$newname"
done

### EAS

python ${app_url}/Tractor/Tractor/scripts/extract_tracts.py \
--vcf common_ADSPR5_AmyloidPET_${chr}_phased.vcf.gz \
--msp common_ADSPR5_AmyloidPET_${chr}_EAS_deconvoluted.msp.tsv \
--num-ancs 1

# Rename anc0 to EAS
for file in /ix/kfan/Ruyu/Public_Data/NIAGADS_ADSP/Amyloid_PET_Project/Processed_Data/Genotype/Tractor/*anc0*; do
  newname="${file//anc0/EAS}"
  mv "$file" "$newname"
done

### Final rename

# Rename EUR to anc0
for file in ./Processed_Data/Genotype/Tractor/*EUR*; do
  newname="${file//EUR/anc0}"
  mv "$file" "$newname"
done

# Rename AFR to anc1
for file in ./Processed_Data/Genotype/Tractor/*AFR*; do
  newname="${file//AFR/anc1}"
  mv "$file" "$newname"
done

# Rename EAS to anc2
for file in ./Processed_Data/Genotype/Tractor/*EAS*; do
  newname="${file//EAS/anc2}"
  mv "$file" "$newname"
done

Jul 15 '25 02:07 ryshi06

Hi @ryshi06

One possible reason is how the delimiter in your sumstats file is being parsed. In some cases, Tractor writes empty values (NAs) for certain columns. For example, if a variant is absent in an ancestry, there is no beta or p-value for it, so those fields are left empty. Some file viewers and whitespace-based parsers then collapse consecutive delimiters into one, which shifts the remaining values into the wrong columns.
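To illustrate the column shift, here is a minimal sketch using only the Python standard library. The header and the non-empty values are copied from the first data row posted above; where the empty fields sit in that row is my assumption for illustration, not something I can confirm from the paste.

```python
# Header copied from the posted output.
header = (
    "CHR POS ID REF ALT N "
    "AF_anc0 LAprop_anc0 beta_anc0 se_anc0 pval_anc0 tval_anc0 "
    "AF_anc1 LAprop_anc1 beta_anc1 se_anc1 pval_anc1 tval_anc1 "
    "AF_anc2 LAprop_anc2 beta_anc2 se_anc2 pval_anc2 tval_anc2 "
    "LApval_anc0 LAeff_anc0 LApval_anc1 LAeff_anc1"
).split()

# ASSUMPTION: the empty strings mark where Tractor might have left fields blank.
row_fields = [
    "1", "17519", "1:17519:G:T", "G", "T", "6393",
    "0.033474", "0.333333", "0.239298", "1.8603", "0.897651597635062", "0.1286",
    "0.033474", "0.333333", "", "", "", "",
    "0.033474", "0.333333", "", "", "", "",
    "", "", "", "",
]
line = "\t".join(row_fields)

# Splitting on any whitespace collapses runs of tabs and drops empty fields.
by_whitespace = line.split()
# Splitting on a single tab keeps exactly one field per column.
by_tab = line.split("\t")

print(len(header), len(by_whitespace), len(by_tab))

# With collapsed delimiters, a later value lands under the wrong column name.
shifted = dict(zip(header, by_whitespace))
aligned = dict(zip(header, by_tab))
print("beta_anc1 (whitespace split):", repr(shifted["beta_anc1"]))
print("beta_anc1 (tab split):      ", repr(aligned["beta_anc1"]))
```

If the tab-split field count matches the header on your real file, the columns are intact and the missing anc1/anc2 estimates are genuinely empty rather than a display artifact.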

Could you try reading the file in R or in Python with pandas, explicitly specifying tab (\t) as the delimiter, and let me know if the issue persists?
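Something along these lines should work in pandas. This is a rough sketch, not a Tractor utility: `load_sumstats` and `count_estimates` are hypothetical helper names, and the two-row sample is made up from the variant IDs and p-values in your paste so the snippet runs on its own. For your data, pass the path to your sumstats file instead of the in-memory sample.

```python
import io

import pandas as pd


def load_sumstats(path_or_buffer):
    """Read a summary-statistics file using tab as the only delimiter."""
    return pd.read_csv(path_or_buffer, sep="\t")


def count_estimates(df, prefix="pval_anc"):
    """Count non-missing values in every column whose name starts with prefix."""
    cols = [c for c in df.columns if c.startswith(prefix)]
    return df[cols].notna().sum()


# Made-up sample with empty anc1/anc2 fields, for illustration only.
sample = io.StringIO(
    "ID\tpval_anc0\tpval_anc1\tpval_anc2\n"
    "1:17519:G:T\t0.897651597635062\t\t\n"
    "1:108826:G:C\t0.922970806545642\t\t\n"
)

df = load_sumstats(sample)
counts = count_estimates(df)
print(counts)
```

Empty fields are read as NaN, so the per-column counts show directly whether anc1 and anc2 have any estimates at all or whether the values were only shifted when the file was viewed.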

Sep 03 '25 19:09 nirav572