The default bin size of batch --method wgs
Dear sir, I tried to run "cnvkit.py batch Sample1.bam -n Control1.bam -m wgs -f hg38.fasta --annotate refFlat.txt" on my WGS data. When using gencode.v37.annotation as the target BED, we got very large segments. However, we need smaller segments. We would like to know what the default bin size is.
Hi @deb0612 ,
I'm not an author of CNVkit, but I suggest you look carefully at CNVkit's output for your command. You should see a line like: WGS average depth <FLOAT> --> using bin size <THE_NUMBER_YOU_WANT>
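If you saved the batch output to a file, a small script can pull that number out for you. This is a minimal sketch, assuming the log line has exactly the shape quoted above ("WGS average depth <FLOAT> --> using bin size <INT>"); the function name parse_wgs_bin_size and the example numbers are hypothetical, not part of CNVkit.

```python
import re
from typing import Optional, Tuple

# Pattern for the log line quoted above. The exact wording is an assumption
# based on that example; adjust it if your CNVkit version prints it differently.
BIN_SIZE_RE = re.compile(
    r"WGS average depth\s+([0-9]*\.?[0-9]+)\s*-->\s*using bin size\s*(\d+)"
)


def parse_wgs_bin_size(log_text: str) -> Optional[Tuple[float, int]]:
    """Return (average_depth, bin_size) found in captured batch output, or None."""
    match = BIN_SIZE_RE.search(log_text)
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    # Hypothetical captured line; real values depend on your BAM files.
    example = "WGS average depth 30.12 --> using bin size 1660"
    print(parse_wgs_bin_size(example))  # (30.12, 1660)
```

CNVkit reports progress through Python's logging, which normally writes to stderr, so something like "cnvkit.py batch ... 2> batch.log" should capture the line; read that file and pass its contents to the function.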
HOWEVER, tweaking the default bin size may not be the best way to make your segments smaller
=> You might rather try another segmentation method (default = "CBS")
=> If you are using CNVkit >= v0.9.7, there is a --segment-method parameter that lets you switch easily
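For example, the batch command from the question would only need one extra option. The sketch below just assembles that command line in Python; the helper name wgs_batch_command is hypothetical, and "hmm" is used only as an illustrative method name, so check "cnvkit.py segment --help" for the choices your version actually accepts.

```python
import shlex
from typing import List


def wgs_batch_command(
    tumor_bam: str,
    normal_bam: str,
    fasta: str,
    annotation: str,
    segment_method: str = "cbs",
) -> List[str]:
    """Build the argument list for the WGS batch run from the question,
    with --segment-method added (available in CNVkit >= v0.9.7 per the reply)."""
    return [
        "cnvkit.py", "batch", tumor_bam,
        "-n", normal_bam,
        "-m", "wgs",
        "-f", fasta,
        "--annotate", annotation,
        "--segment-method", segment_method,
    ]


if __name__ == "__main__":
    cmd = wgs_batch_command(
        "Sample1.bam", "Control1.bam", "hg38.fasta", "refFlat.txt",
        segment_method="hmm",  # illustrative choice, not a recommendation
    )
    # Prints the shell-ready command; pass cmd to subprocess.run(cmd, check=True)
    # to actually execute it.
    print(shlex.join(cmd))
    # cnvkit.py batch Sample1.bam -n Control1.bam -m wgs -f hg38.fasta --annotate refFlat.txt --segment-method hmm
```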
Hope this helps. Have a nice day! Felix.
As @tetedange13 said (by the way, thank you, those are very helpful tips!), the automatically determined bin size should be fine in almost all situations and is unlikely to significantly influence the resulting segments.
And @deb0612, just to confirm, what do you mean by a very large segment? It's worth noting that, since most of the genome is not affected by CNVs, cnvkit.py segment will usually output lots and lots of very large segments (tens of millions of bases long) with a neutral log2 ratio (≈0), and those are expected.
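To check whether your "very large" segments are just these expected neutral ones, you could split the CNS rows by length and log2. This is a rough sketch, assuming a tab-separated CNS file with a header containing chromosome, start, end and log2 columns; the 0.2 cutoff for "≈0" and the 10 Mb minimum length are hypothetical thresholds, not CNVkit defaults.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, List, TextIO, Tuple


@dataclass
class Segment:
    """One CNS row; only the columns used here are kept."""
    chromosome: str
    start: int
    end: int
    log2: float

    @property
    def length(self) -> int:
        return self.end - self.start


def read_cns(handle: TextIO) -> List[Segment]:
    """Parse a tab-separated CNS table with a header row (extra columns are ignored)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        Segment(row["chromosome"], int(row["start"]), int(row["end"]), float(row["log2"]))
        for row in reader
    ]


def split_large_neutral(
    segments: Iterable[Segment],
    neutral_cutoff: float = 0.2,   # hypothetical |log2| threshold for "≈0"
    min_length: int = 10_000_000,  # hypothetical: "tens of millions of bases"
) -> Tuple[List[Segment], List[Segment]]:
    """Separate large near-neutral segments (expected) from everything else."""
    large_neutral: List[Segment] = []
    rest: List[Segment] = []
    for seg in segments:
        if abs(seg.log2) < neutral_cutoff and seg.length >= min_length:
            large_neutral.append(seg)
        else:
            rest.append(seg)
    return large_neutral, rest


if __name__ == "__main__":
    # Toy rows for illustration only, not real CNVkit output.
    toy = (
        "chromosome\tstart\tend\tlog2\n"
        "chr1\t10000\t120000000\t0.01\n"
        "chr1\t120000000\t120500000\t0.85\n"
    )
    neutral, rest = split_large_neutral(read_cns(io.StringIO(toy)))
    print(len(neutral), len(rest))  # 1 1
```

The segments left in "rest" are the ones worth looking at when judging whether the segmentation is too coarse.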
To investigate further, could you share the CNS file please (or its relevant portion)?
For WGS, definitely use segmetrics and call to filter segments by confidence interval (CI). Then it may be more practical to use bintest or genemetrics to extract only focal CNVs or those that affect genes of interest.
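To get a feel for what filtering by CI means, here is one simple interpretation: a segment is kept only if its confidence interval for log2 excludes 0. This is not CNVkit's own implementation (use segmetrics and call for real analysis); it assumes the CI is stored in columns named ci_lo and ci_hi, which you should check against the header of your own segmetrics output.

```python
import csv
import io
from typing import Dict, List, TextIO


def ci_excludes_zero(row: Dict[str, str]) -> bool:
    """True if the segment's log2 confidence interval lies entirely above or below 0."""
    lo, hi = float(row["ci_lo"]), float(row["ci_hi"])
    return lo > 0 or hi < 0


def confident_segments(handle: TextIO) -> List[Dict[str, str]]:
    """Read a tab-separated table with assumed ci_lo/ci_hi columns and
    return only the rows whose CI does not overlap 0."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if ci_excludes_zero(row)]


if __name__ == "__main__":
    # Toy rows for illustration only, not real CNVkit output.
    toy = (
        "chromosome\tstart\tend\tlog2\tci_lo\tci_hi\n"
        "chr1\t0\t90000000\t-0.01\t-0.05\t0.04\n"
        "chr1\t90000000\t90400000\t0.75\t0.6\t0.9\n"
        "chr2\t0\t300000\t-1.0\t-1.2\t-0.8\n"
    )
    kept = confident_segments(io.StringIO(toy))
    print([row["log2"] for row in kept])  # ['0.75', '-1.0']
```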