optimal resolution for compartment call
Hello,
I noticed that the compartment results can change when a different resolution is used. Is there any standard that helps decide on an optimal value for binsize?
I attached a screenshot of some tests using different binsizes. The reference track is compartment data downloaded from the 4DN database (informative, but not a perfect ground truth).
Thank you!
Best,
Hi,
This is a good question but a difficult one to answer. In our dcHiC paper we have a section called "Robustness of dcHiC differential compartment calls", where we compared the results from different resolutions. In summary, this is what we found: except for the 10 Kb resolution, all other cases were similar to 100 Kb, where 40% down-sampling still led to a high recall (>75%). For the 10 Kb resolution, the results at 60% down-sampling had a recall of 80%, which dropped to 61% at 40% down-sampling. So even if you perform separate 100 Kb and 10 Kb analyses, you would see a very high concordance as long as you keep your Hi-C coverage the same.

If you look at Figure 2E and Supplementary Figure 6 of the Harris et al. paper, you will see that at higher resolution new, smaller compartments start to appear at different places, and there is also a biological explanation for this. We show the same effect in Supplementary Figure S6 of our dcHiC paper.

In the end, the resolution you want to use depends on the depth of your Hi-C data. I like to follow a rule of thumb from an older paper, Lajoie et al., which suggested: "In our experience, an adequately complex Hi-C dataset for the human genome with roughly 100 million mapped/valid junction reads, is sufficient to support a 40 kb data resolution. Data below 40 kb may be usable, though it will suffer from a higher level of noise."

However, to make the results more robust to the resolution effect, you can try a consensus approach. Run dcHiC at both 100 Kb (the lowest resolution) and the highest resolution supported by your Hi-C coverage, then take the consensus regions. This should give you the best result for your coverage.
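As a rough illustration of the consensus step, here is a minimal Python sketch. It assumes that "consensus" means the genomic intervals called at both resolutions (a plain intersection), which is one possible reading and not necessarily the exact procedure used in the paper. It also assumes the calls from each run have already been loaded as (chrom, start, end) tuples; the function names and the input layout are hypothetical and not part of dcHiC.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Sort intervals and merge any that overlap or touch."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def consensus_regions(low_res_calls, high_res_calls):
    """Return the intervals present in both call sets.

    Each input is an iterable of (chrom, start, end) tuples with
    half-open coordinates. This layout is an assumption for the
    example, not the dcHiC output format.
    """
    by_chrom = defaultdict(lambda: ([], []))
    for chrom, start, end in low_res_calls:
        by_chrom[chrom][0].append((start, end))
    for chrom, start, end in high_res_calls:
        by_chrom[chrom][1].append((start, end))

    consensus = []
    for chrom in sorted(by_chrom):
        low = merge_intervals(by_chrom[chrom][0])
        high = merge_intervals(by_chrom[chrom][1])
        i = j = 0
        # Two-pointer sweep over the two sorted, merged interval lists.
        while i < len(low) and j < len(high):
            start = max(low[i][0], high[j][0])
            end = min(low[i][1], high[j][1])
            if start < end:
                consensus.append((chrom, start, end))
            # Advance whichever interval finishes first.
            if low[i][1] < high[j][1]:
                i += 1
            else:
                j += 1
    return consensus


# Made-up example calls at 100 Kb and 10 Kb.
calls_100kb = [("chr1", 0, 100000), ("chr1", 300000, 400000)]
calls_10kb = [
    ("chr1", 40000, 60000),
    ("chr1", 90000, 110000),
    ("chr1", 350000, 360000),
    ("chr2", 0, 10000),
]
shared = consensus_regions(calls_100kb, calls_10kb)
```

With a plain intersection, a region survives only if it is called at both resolutions, so small compartments that appear only at high resolution are dropped. Whether that is desirable depends on how conservative you want the final set to be.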
Thanks for your suggestions.
I will run my samples with different resolutions and generate a consensus compartment.
Best,