methylseq

Add SNP calling

Open ewels opened this issue 7 years ago • 6 comments

It would be nice to be able to have the option of calling variants from bisulfite data.

It shouldn't be too tricky to add Bis-SNP or something similar as a new opt-in process. There may be other / better tools also?

ewels avatar Jan 30 '19 18:01 ewels

Felix Krueger mentioned four different packages for that purpose: Bis-SNP, MethylExtract, BS-SNPer and CGmapTools. BScall can also do this.

On a slightly different note, from the Wreczycka et al. 2017 paper:

"the majority of CpGs with high inter-population differences contain common genomic SNPs (minor allele frequency > 0.01) (Daca-Roszak et al., 2015). To ensure more reliable interpretation of the data we advise removing known C/T SNPs which can interfere with methylation calls."

It would also be nice to have a dictionary of these sites for human, with the option of removing them if desired (--remove.common_snps).

Variant calls could also be derived from matched genome sequencing data or public databases such as dbSNP (https://www.ncbi.nlm.nih.gov/projects/SNP/dbSNP.cgi?list=sslist).

bazyliszek avatar Jan 31 '19 08:01 bazyliszek

Ooh, @FelixKrueger? I wouldn't trust that guy.. 😆 Yes all sounds good - does anyone have a favourite tool?

The common SNPs feature would be nice, but I guess that's a separate issue as it doesn't require SNP calling, it's just a filtering step right? Do such lists already exist somewhere? Perhaps we can generate such a list from a VCF file in the pipeline. Then we could use the files available for multiple species already in iGenomes.
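For illustration, a rough Python sketch of what such a filtering step might look like. It is only a sketch under assumptions: the function names `ct_snp_sites` and `drop_snp_sites` are made up here, the VCF is read as plain tab-separated text, and the methylation calls are assumed to be tab-separated with the chromosome and a 1-based position in the first two columns. It treats both C>T and G>A as bisulfite-like, since a C>T SNP on the reverse strand appears as G>A in the VCF.

```python
from typing import Iterable, Iterator, Set, Tuple

# Substitutions that look like bisulfite conversion:
# C>T on the forward strand, G>A for a C>T change on the reverse strand.
BISULFITE_LIKE = {("C", "T"), ("G", "A")}


def ct_snp_sites(vcf_lines: Iterable[str]) -> Set[Tuple[str, int]]:
    """Collect (chrom, pos) of single-base C>T / G>A variants from VCF text."""
    sites: Set[Tuple[str, int]] = set()
    for line in vcf_lines:
        # Skip blank lines and VCF header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3].upper()
        # ALT may hold several comma-separated alleles.
        for alt in fields[4].upper().split(","):
            if (ref, alt) in BISULFITE_LIKE:
                sites.add((chrom, pos))
    return sites


def drop_snp_sites(
    call_lines: Iterable[str], sites: Set[Tuple[str, int]]
) -> Iterator[str]:
    """Yield methylation-call lines whose (chrom, pos) is not a listed SNP.

    Assumes column 1 is the chromosome and column 2 a 1-based position,
    matching the 1-based POS column of the VCF.
    """
    for line in call_lines:
        fields = line.rstrip("\n").split("\t")
        if (fields[0], int(fields[1])) not in sites:
            yield line
```

In a pipeline this would presumably sit behind an opt-in flag and read the VCF once per reference genome, reusing the resulting site set for every sample.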

I think that matching to WGS and external databases is perhaps beyond the scope of this pipeline for now. If the pipeline produces a VCF it shouldn't be too difficult for people to play with this anyway. We could perhaps even make a separate nf-core pipeline for doing pairwise comparison / QC of VCF files...

ewels avatar Jan 31 '19 15:01 ewels

I agree, it might be a nice pipeline to have. The tools mentioned above were - of course (in good old bioinformatics manner) - shown to be much superior to previously published tools. We don't personally use SNP exclusion on a regular basis, so I am not sure which one is best/easiest to implement.

On a slightly different note, would anyone object if we dropped Bowtie (1) from Bismark, and added HISAT2 instead?

FelixKrueger avatar Jan 31 '19 16:01 FelixKrueger

Sure - go for it! Alignment speed can be one of the main annoyances with Bismark so a faster tool with comparable output would be great 👍 (though does this mean that I have to update the --relaxMismatches code? 😱 )

ewels avatar Jan 31 '19 16:01 ewels

Hi, was this ever implemented or is there a fork that some work was done on?

brucemoran avatar Oct 27 '21 11:10 brucemoran