How to filter the results of roh
Dear @pd3
I am using bcftools roh (version 1.12; bcftools roh -G30 -e - -O r vcf.gz) to detect regions of autozygosity in whole genome sequencing data. The vcf.gz contains all my samples, which cluster into five populations, and I want to characterize the autozygosity of each population. After running the above command, I get the candidate RoHs for each sample. I understand most of the output; however, I cannot find a detailed explanation of the Quality column (average fwd-bwd phred score). If I want to keep only high-quality regions of autozygosity, can Quality be used as an indicator? That is, the larger the Quality, the higher the quality of the RoH. Is this right?
Best regards, Zheng zhuqing
Yes, that's correct. The quality is the phred score of the average fwd-bwd probability in the called region. The length and fragmentation of the calls matter too. There is a plan to provide tools for visualizing the calls to help with this, but it is at an early stage and is not actively developed at the moment.
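A minimal sketch of combining those criteria (quality, length, number of markers) when post-filtering the calls. It assumes the -O r output has the column layout announced in its "# RG" header line: RG, sample, chromosome, start, end, length (bp), number of markers, quality. The function names filter_roh and sum_roh_per_sample and the default cut-offs (quality 30, 1 Mb, 50 markers) are hypothetical and chosen only for illustration; they are not thresholds recommended by the bcftools authors.

```shell
#!/usr/bin/env bash
# Sketch only. Assumed RG line layout (from the "# RG" header of `bcftools roh -O r`):
#   $1=RG $2=sample $3=chrom $4=start $5=end $6=length_bp $7=n_markers $8=quality
# The default thresholds are illustrative, not recommendations.

# Read roh output on stdin and print only RG lines that pass all three cut-offs.
filter_roh() {
    local min_qual="${1:-30}" min_len="${2:-1000000}" min_markers="${3:-50}"
    awk -v q="$min_qual" -v l="$min_len" -v m="$min_markers" '
        # Header lines start with "#", so $1 != "RG" and they are skipped.
        $1 == "RG" && $8 >= q && $6 >= l && $7 >= m { print }
    '
}

# Sum the called RoH length (column 6) per sample; output is "sample<TAB>total_bp".
sum_roh_per_sample() {
    awk 'BEGIN { OFS = "\t" }
         $1 == "RG" { total[$2] += $6 }
         END { for (s in total) print s, total[s] }' | sort -k1,1
}
```

For example, cat *.roh | filter_roh 30 1000000 50 | sum_roh_per_sample gives the total retained RoH length per sample. Dividing that total by the length of the genome analysed gives the commonly used F_ROH, which could then be summarized within each of the five populations.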
Dear @pd3
I am running bcftools roh -G30 -e - -O r -o ${chr}.roh ${dir}/${chr}.vcf.gz to detect regions of autozygosity for each sample, processing each chromosome separately. However, for the longest chromosome, which includes ~16.3 million SNPs and 577 samples, it has been running for about four days without producing any output. I wonder whether I can filter out sites with low minor allele frequency (e.g., MAF < 0.01) to speed it up.
Best regards, Zheng zhuqing
It should be possible to stream the sites through bcftools view first and drop the rare ones there, something like:
bcftools view -e 'MAF<0.01' -Ou file.vcf.gz | bcftools roh ...
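A sketch of the per-chromosome run with the MAF pre-filter, assuming the ${dir}/${chr}.vcf.gz naming from the question and a bcftools 1.12 install on PATH. The filter uses -e (exclude), so sites with MAF below the cut-off are dropped; -i 'MAF<0.01' would instead keep only those rare sites. The function name run_roh_chr and the 0.01 default are hypothetical, for illustration only.

```shell
#!/usr/bin/env bash
# Sketch only: assumes bcftools is on PATH and inputs are named ${dir}/${chr}.vcf.gz.
# run_roh_chr is a hypothetical helper; the 0.01 MAF default is illustrative.
set -o pipefail   # make the function fail if either side of the pipe fails

run_roh_chr() {
    local dir="$1" chr="$2" maf="${3:-0.01}"
    # -e "MAF<x" EXCLUDES sites below the cut-off (-i would keep only them).
    # -Ou streams uncompressed BCF, avoiding compression work between the two steps.
    # In roh, "-e -" estimates allele frequencies from the samples themselves;
    # the final "-" makes roh read the filtered stream from stdin.
    bcftools view -e "MAF<${maf}" -Ou "${dir}/${chr}.vcf.gz" \
        | bcftools roh -G30 -e - -O r -o "${chr}.roh" -
}
```

Several chromosomes can then be run in parallel, e.g. for chr in chr1 chr2; do run_roh_chr /path/to/vcfs "$chr" 0.01 & done; wait (chromosome names and path are placeholders).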
Dear @pd3
Thank you. I would like to use the command you suggested above to speed up the program. However, I do not know how filtering out low-frequency variants affects the results. Would you kindly give us some suggestions for detecting RoH, for example whether to filter out low-frequency variants and whether to prune sites by LD?
Best regards, Zheng zhuqing