
3.7.2 hicPCA produces extreme eigenvalues

Open xscapintime opened this issue 3 years ago • 5 comments

Hi,

I'm using the latest version of HiCExplorer (3.7.2); last year I used version 3.6.*. The eigenvalues produced by the two versions are completely different. From reading the code and the commit messages, 3.7.2 follows the Lieberman-Aiden approach more closely and runs PCA on an obs/exp matrix, whereas 3.6 runs PCA on a Pearson correlation matrix.
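To make the distinction concrete, here is a minimal sketch of the two approaches on a synthetic obs/exp matrix. It is not HiCExplorer's code: the function names, the checkerboard test matrix, and the reading of "PCA on an obs/exp matrix" as an eigendecomposition of the row covariance are all assumptions made for illustration.

```python
import numpy as np

def leading_eigenvector(sym_matrix):
    """Unit eigenvector belonging to the largest eigenvalue of a symmetric matrix."""
    eigenvalues, eigenvectors = np.linalg.eigh(sym_matrix)  # eigenvalues ascending
    return eigenvectors[:, -1]

def pc1_from_obs_exp(obs_exp):
    # Assumption: decompose the covariance between rows of the obs/exp matrix.
    return leading_eigenvector(np.cov(obs_exp))

def pc1_from_pearson(obs_exp):
    # Decompose the Pearson correlation between rows of the obs/exp matrix.
    return leading_eigenvector(np.corrcoef(obs_exp))

# Hypothetical data: 40 bins in alternating blocks of 5 (two "compartments").
rng = np.random.default_rng(0)
n_bins = 40
labels = np.where((np.arange(n_bins) // 5) % 2 == 0, 1.0, -1.0)
noise = rng.normal(0.0, 0.05, (n_bins, n_bins))
obs_exp = 1.0 + 0.5 * np.outer(labels, labels) + (noise + noise.T) / 2

pc1_a = pc1_from_obs_exp(obs_exp)
pc1_b = pc1_from_pearson(obs_exp)

# Eigenvector sign is arbitrary, so compare up to sign.
agreement = abs(np.dot(pc1_a, pc1_b))
print(f"|cosine| between the two PC1 vectors: {agreement:.3f}")
```

On a clean matrix like this one the two variants recover the same block pattern, so a large disagreement on real data would point at how each variant handles noise and outliers rather than at the compartment signal itself.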

Python version: 3.8.13

I used the same commands with both versions, shown below: one writes the Pearson matrix and the eigenvalue bedgraph, and the other writes the eigenvalue bigwig.

hicPCA -m ${inp} --outputFileName ${out}.pca1.bedgraph -we 1 --format bedgraph \
--pearsonMatrix ${out}_pearson_all.h5 \
--extraTrack ../histonemark/ENCODE_H1_H3K27ac.bigwig

hicPCA -m ${inp} --outputFileName ${out}.pca1.bw -we 1 --format bigwig \
--extraTrack ../histonemark/ENCODE_H1_H3K27ac.bigwig

Here is the result produced by 3.6. I used np.histogram for a quick look:

np.histogram(bed[3])
(array([  43,  486, 3351, 3882, 4228, 5106, 2034,  413,  109,   31]), array([-0.10961274, -0.08537601, -0.06113927, -0.03690253, -0.0126658 , 0.01157094,  0.03580768,  0.06004441,  0.08428115,  0.10851789, 0.13275462]))

So the range of PC1 is about -0.1 to 0.13.

[Image: pearson 3.6]

And this is the result from 3.7.2:

np.histogram(bed2[3])
(array([    1,     2,     5,   882, 20226,   117,    21,     7,     2, 431]), array([-0.73764337, -0.56387903, -0.3901147 , -0.21635036, -0.04258602, 0.13117831,  0.30494265,  0.47870699,  0.65247133,  0.82623566, 1.        ]))

And now the range of PC1 is about -0.7 to 1, and most of the values are very close to 0.
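As a quick way to quantify "most of the values are very close to 0", the sketch below takes the two histogram count arrays exactly as printed above and computes the share of bins that fall into the single most populated histogram bin. The helper name `peak_fraction` is made up for this example.

```python
import numpy as np

# Counts copied from the np.histogram outputs above.
counts_36 = np.array([43, 486, 3351, 3882, 4228, 5106, 2034, 413, 109, 31])
counts_372 = np.array([1, 2, 5, 882, 20226, 117, 21, 7, 2, 431])

def peak_fraction(counts):
    """Share of all values that land in the most populated histogram bin."""
    return counts.max() / counts.sum()

frac_36 = peak_fraction(counts_36)
frac_372 = peak_fraction(counts_372)

print(f"3.6:   {frac_36:.3f}")   # 0.259
print(f"3.7.2: {frac_372:.3f}")  # 0.932
```

So in 3.7.2 roughly 93% of the values sit in one histogram bin around 0 (about -0.04 to 0.13), compared with about 26% in the most populated bin for 3.6. Another 431 values fall in the top bin (about 0.83 to 1.0).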

[Image: pearson 3.7.2]

Personally, I don't think the results from 3.7.2 look right.

In this paper they said PCA was done on the contact matrix, and the distribution of PC1 there is similar to the results from hicPCA 3.6.

image

Thank you.

xscapintime, Nov 17 '22 06:11

Hi, same issue here. The Pearson correlation matrix looks fine, but the bigwig/bedgraph values are extreme and don't match it. I checked -we 1 and 2. Thanks.

ralfgilsbach, Dec 07 '22 18:12

I also ran into the same issue and wonder if anyone has a better explanation. My guess is that as the bin length gets shorter, abnormally high observations become more likely, and these in turn produce extreme eigenvalues.
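A small synthetic experiment can show how a single abnormally high entry could have this effect. It is only a sketch under stated assumptions: the matrix, the outlier value of 500, and the choice to decompose the row covariance of the obs/exp matrix are hypothetical and do not reproduce what hicPCA does internally.

```python
import numpy as np

def pc1(sym_matrix):
    """Unit eigenvector for the largest eigenvalue of a symmetric matrix."""
    return np.linalg.eigh(sym_matrix)[1][:, -1]

# Hypothetical obs/exp matrix: 40 bins, two compartments in blocks of 5.
rng = np.random.default_rng(1)
n_bins = 40
labels = np.where((np.arange(n_bins) // 5) % 2 == 0, 1.0, -1.0)
noise = rng.normal(0.0, 0.05, (n_bins, n_bins))
obs_exp = 1.0 + 0.5 * np.outer(labels, labels) + (noise + noise.T) / 2

# Inject one abnormally high observation on the diagonal (hypothetical value).
outlier_bin = 3
obs_exp[outlier_bin, outlier_bin] = 500.0

# Squared PC1 entry = share of the unit eigenvector carried by that bin.
weight_cov = pc1(np.cov(obs_exp))[outlier_bin] ** 2
weight_pearson = pc1(np.corrcoef(obs_exp))[outlier_bin] ** 2

print(f"outlier share of PC1, covariance of obs/exp: {weight_cov:.4f}")
print(f"outlier share of PC1, Pearson matrix:        {weight_pearson:.4f}")
```

In this toy case the covariance-based PC1 puts almost all of its weight on the outlier bin, leaving every other bin close to 0, while the Pearson-based PC1 stays spread across the compartments because correlations are bounded between -1 and 1. That pattern resembles the 3.7.2 histogram above, but it does not prove this is the cause in hicPCA.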

@xscapintime @ralfgilsbach I wonder how you finally dealt with this problem?

zhongzheng1999, Apr 02 '24 13:04

We moved back to homertools for eigenvector calculations. It should be fixed in hicexplorer to work in a comparable manner.

ralfgilsbach, Apr 08 '24 06:04

@zhongzheng1999 Hi, I switched to cooltools for all the analysis.

xscapintime, Apr 08 '24 15:04

@xscapintime @ralfgilsbach Thank you for your reply! I think it's more reliable to use some good old tools to do the work.

zhongzheng1999, Apr 08 '24 16:04