deepTools Help!!! Different results by using extendReads or not

Dear deeptools developer, Thanks for developing such a great tool! I recently ran a Cut & Tag experiment(2 X 150bp) and want to use the plots from deeptools to show the signal distribution of different histone markers on TSS regions. However, I got different results when I used "extendReads" or not in bamCoverage function with BPM normalization. Please see the plot shown below:

CTK27me3_uscs bpm autosomes_chrX ext_no_ext_comp

The plot showed that the male sample (the "m" in the sample name) has a higher reads intensity than female sample (the "f" in the sample name) when I used extendReads, but the direction is opposite when I deactivated extendReads. I thought it was possibly caused by the quantification issue that the mate reads in properly paired reads were quantified twice. So I included "--samFlagInclude 66" in bamCoverage to quantify the mate reads only once with extendReads activated. However, the result showed similar as the one activating extendReads without samFlagInclude parameter (see below):

CTK27me3_uscs bpm autosomes_chrX samflag66

I am wondering why "extendReads" caused so much different results? This is important since it would results in different interpretation for biology.

deeptools version: 3.5.1 python version: 3.9.13

Here are the codes: ###with extendReads activated bamCoverage --bam .bam -o .extent.bw --ignoreDuplicates --normalizeUsing BPM --blackListFileName ../blacklist.merge.bed --binSize 10 --ignoreForNormalization chrX chrY chrM --extendReads

###with extendReads deactivated bamCoverage --bam .bam -o .extent.bw --ignoreDuplicates --normalizeUsing BPM --blackListFileName ../blacklist.merge.bed --binSize 10 --ignoreForNormalization chrX chrY chrM

###with samFlagInclude 66 bamCoverage --bam .bam -o .extent_samflag66.bw --ignoreDuplicates --normalizeUsing BPM --blackListFileName ../blacklist.merge.bed --binSize 10 --ignoreForNormalization chrX chrY chrM

###computeMatrix codes computeMatrix reference-point --referencePoint TSS -p 20 -b 2500 -a 2500 -R .bed -S .bw --missingDataAsZero -o .gz

I look forward to your reply! Many thanks in advance!!

Joe Wang

Oct 18 '22 06:10 icanwinwyz

Is it necessary to use extendReads parameter for CUT& Tag paired-end sequencing data?

Nov 12 '24 06:11 tzhu-bio

Hi @icanwinwyz I am also curious about this because when using --normalizeUsing RPKM we also observe a huge difference in the coverage plots:

Green is where --extendReads is switched on. Red is where --extendReads is switched off. Blue is ChipHub's published data for the same plant sample

And in both cases (red/green) the same --binSize 10 and all other parameters are the same, and --normalizeUsing RPKM is specified other settings are set to default and the input reads are Paired End.

So far I have traced how --extendReads impacts deeptools bamcoverage as follows:

bamCoverage: the main() block from L143 https://github.com/deeptools/deepTools/blob/master/deeptools/bamCoverage.py#L143

Has this:

        wr = writeBedGraph.WriteBedGraph([args.bam],
                                         binLength=args.binSize,
                                         stepSize=args.binSize,
                                         region=args.region,
                                         blackListFileName=args.blackListFileName,
                                         numberOfProcessors=args.numberOfProcessors,
                                         extendReads=args.extendReads,
                                         minMappingQuality=args.minMappingQuality,
                                         ignoreDuplicates=args.ignoreDuplicates,
                                         center_read=args.centerReads,
                                         zerosToNans=args.skipNonCoveredRegions,
                                         samFlag_include=args.samFlagInclude,
                                         samFlag_exclude=args.samFlagExclude,
                                         minFragmentLength=args.minFragmentLength,
                                         maxFragmentLength=args.maxFragmentLength,
                                         chrsToSkip=args.ignoreForNormalization,
                                         verbose=args.verbose,
                                         )

Which is always triggered in all IF/ELSE conditions (i.e. no matter which normalization you use).

writeBedGraph.py L30: https://github.com/deeptools/deepTools/blob/83f192947606c62028717f0e0824dc23a07536dc/deeptools/writeBedGraph.py#L30

has a class: class WriteBedGraph(cr.CountReadsPerBin)

which uses: import deeptools.countReadsPerBin as cr

Looking at: countReadsPerBin.py: L30 https://github.com/deeptools/deepTools/blob/83f192947606c62028717f0e0824dc23a07536dc/deeptools/countReadsPerBin.py#L31C7-L31C23

We can see how args.extendReads is coming into play:

And we have the documentation of it here: https://deeptools.readthedocs.io/en/latest/_modules/deeptools/writeBedGraph.html

So what's happening is that by using --extendReads the number of reads that are counted per bin will be higher, since now R1 and R2 will be "filled" along the full length of their component fragment.

Quoting documentation: https://deeptools.readthedocs.io/en/latest/_modules/deeptools/writeBedGraph.html

"extendReads : bool, int

Whether coverage should be computed for the extended read length (i.e. the region covered by the two mates or the regions expected to be covered by single-reads). If the value is ‘int’, then then this is interpreted as the fragment length to extend reads that are not paired. For Illumina reads, usual values are around 300. This value can be determined using the peak caller MACS2 or can be approximated by the fragment lengths computed when preparing the library for sequencing. If the value is of the variable is true and not value is given, the fragment size is sampled from the library but only if the library is paired-end. Default: False "

Apr 11 '25 13:04 a1ultima