Loss of *.cool 'weight' column when working with hicAdjustMatrix; possible to retain 'weight' column?
Welcome to the HiCExplorer GitHub repository! Before opening the issue, please check that the following requirements are met:
- [x] Search whether this issue (or a similar issue) has been solved before using the search tab above. Link the previous issue if appropriate below.
Maybe on TODO already? The TODO bullet references this issue.
- [x] Paste your HiCExplorer version (`hicInfo --version`) and your python version (`python --version`) below.
DEBUG:h5py._conv:Creating converter from 7 to 5
DEBUG:h5py._conv:Creating converter from 5 to 7
DEBUG:h5py._conv:Creating converter from 7 to 5
DEBUG:h5py._conv:Creating converter from 5 to 7
hicInfo 3.7-dev
Python 3.8.3
cooler, version 0.8.11
- [x] Have you checked our documentation on hicexplorer.readthedocs.io?
- [x] Do you use conda to install HiCExplorer?
- [x] Do you use the latest HiCExplorer release? If not, please install it via a conda environment:
`conda create --name hicexplorer hicexplorer=3.6 python=3.8 -c bioconda -c conda-forge` and activate the environment: `conda activate hicexplorer`. Retry your command. You can exit a conda environment via `conda deactivate`. To learn more about conda and environments, please consider the following documentation.
Is it solved now? If not, please continue with the following:
- [x] Paste the full HiCExplorer command that produces the issue below (ignore if you simply spotted the issue in the code/documentation).
parallel --header : --colsep " " -k -j "${parallel}" \
"hicAdjustMatrix \
--matrix {infile} \
--outFileName {outfile} \
--chromosomes chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrX \
--interIntraHandling {intra_inter}" \
:::: "${list}"
# have tested with {intra_inter} as either "inter" or "intra"
#+ {infile} is, for example, cool2cool_balance_cardioPooled_D0_500000_downsampled_pooled_cis.0.78.cool
#+ {outfile} is, for example, cool2cool_balance_cardioPooled_D0_500000_downsampled_pooled_cis.0.78.sans-inter.cool
- [x] Paste the output printed on screen from the command that produces the issue below (ignore if you simply spotted the issue in the code/documentation).
Example hicInfo readout for infile, cool2cool_balance_cardioPooled_D0_500000_downsampled_pooled_cis.0.78.cool
# Matrix information file. Created with HiCExplorer's hicInfo version 3.6
File: /net/noble/vol7/kga0/2020_endothelial-diff/data/HiC-Pro_hic2cool_cool2cool_downsampled_balance/500000/pooled_cis/cool2cool_balance_cardioPooled_D0_500000_downsampled_pooled_cis.0.78.cool
Date: 2021-07-20T17:28:02.057103
Genome assembly: unknown
Size: 6,189
Bin_length: 500000
Chromosomes:length: chrM: 16569 bp; chr1: 248956422 bp; chr2: 242193529 bp; chr3: 198295559 bp; chr4: 190214555 bp; chr5: 181538259 bp; chr6: 170805979 bp; chr7: 159345973 bp; chr8: 145138636 bp; chr9: 138394717 bp; chr10: 133797422 bp; chr11: 135086622 bp; chr12: 133275309 bp; chr13: 114364328 bp; chr14: 107043718 bp; chr15: 101991189 bp; chr16: 90338345 bp; chr17: 83257441 bp; chr18: 80373285 bp; chr19: 58617616 bp; chr20: 64444167 bp; chr21: 46709983 bp; chr22: 50818468 bp; chrX: 156040895 bp; chrY: 57227415 bp;
Number of chromosomes: 25
Non-zero elements: 13,690,671
The following columns are available: ['chrom' 'start' 'end' 'weight']
Generated by: cooler-0.8.6.post0
Example hicInfo readout for outfile, cool2cool_balance_cardioPooled_D0_500000_downsampled_pooled_cis.0.78.sans-inter.cool (i.e., retain cis contacts only)
# Matrix information file. Created with HiCExplorer's hicInfo version 3.6
File: /net/noble/vol7/kga0/2020_endothelial-diff/data/HiC-Pro_hic2cool_cool2cool_downsampled_balance/500000/pooled_cis/cool2cool_balance_cardioPooled_D0_500000_downsampled_pooled_cis.0.78.sans-inter.cool
Date: 2021-07-25T14:31:02.147295
Genome assembly: unknown
Size: 6,073
Bin_length: 500000
Chromosomes:length: chr1: 248956422 bp; chr2: 242193529 bp; chr3: 198295559 bp; chr4: 190214555 bp; chr5: 181538259 bp; chr6: 170805979 bp; chr7: 159345973 bp; chr8: 145138636 bp; chr9: 138394717 bp; chr10: 133797422 bp; chr11: 135086622 bp; chr12: 133275309 bp; chr13: 114364328 bp; chr14: 107043718 bp; chr15: 101991189 bp; chr16: 90338345 bp; chr17: 83257441 bp; chr18: 80373285 bp; chr19: 58617616 bp; chr20: 64444167 bp; chr21: 46709983 bp; chr22: 50818468 bp; chrX: 156040895 bp;
Number of chromosomes: 23
Non-zero elements: 776,949
The following columns are available: ['chrom' 'start' 'end']
Generated by: HiCMatrix-15
Cooler library version: cooler-0.8.11
HiCMatrix url: https://github.com/deeptools/HiCMatrix
# fwiw, I have excluded chrM and chrY in the conversion from infile to outfile...
Wanted behavior: Retention of column 'weight' in outfile (and/or an option to retain or discard the column)
Actual behavior: Loss of column 'weight' in outfile
Note: Thank you for implementing an option to exclude cis or trans contacts via hicAdjustMatrix. This makes different kinds of Hi-C analyses easier to do. If possible, I think it would be useful to be able to retain the weights from balanced matrices (i.e., matrices that were balanced while they still contained both cis and trans contacts). For now, I would like to avoid re-balancing with cis or trans contacts only. I can imagine that this option would also be useful for adjusted matrices comprising subsets of chromosomes, including both cis and trans contacts.
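To make the request concrete, here is a minimal sketch in plain Python of what retaining the weights could look like when a matrix is restricted to a subset of chromosomes and to cis contacts. This is not HiCExplorer or cooler code: the toy bins and pixels and the `subset_cis_keep_weights` helper are hypothetical, and the only thing assumed from the cool format is that each bin carries one 'weight' value.

```python
# Hypothetical toy data; this is not the HiCExplorer or cooler API.
# Each bin is (chrom, start, end, weight); each pixel is (bin1, bin2, raw_count).
bins = [
    ("chr1", 0, 500000, 0.5),
    ("chr1", 500000, 1000000, 2.0),
    ("chr2", 0, 500000, 1.0),
    ("chrM", 0, 16569, 4.0),
]
pixels = [(0, 0, 10), (0, 1, 4), (0, 2, 6), (1, 3, 2), (2, 2, 8)]


def subset_cis_keep_weights(bins, pixels, keep_chroms):
    """Keep bins on keep_chroms and cis pixels only; weights stay with their bins."""
    new_index = {}  # old bin index -> new bin index for the retained bins
    kept_bins = []
    for old, b in enumerate(bins):
        if b[0] in keep_chroms:
            new_index[old] = len(kept_bins)
            kept_bins.append(b)  # the weight (b[3]) is carried over unchanged
    kept_pixels = []
    for i, j, count in pixels:
        same_chrom = bins[i][0] == bins[j][0]
        if i in new_index and j in new_index and same_chrom:
            # The raw count is kept as-is; no weight is applied here.
            kept_pixels.append((new_index[i], new_index[j], count))
    return kept_bins, kept_pixels


kept_bins, kept_pixels = subset_cis_keep_weights(bins, pixels, {"chr1", "chr2"})
print(kept_bins)
print(kept_pixels)
```

In this sketch the chrM bin and the chr1-chr2 pixel are dropped, while the raw counts and per-bin weights of everything that remains are untouched, so no re-balancing is needed.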
Hi, I've looked into this a bit more and it appears that, in the process of using hicAdjustMatrix, unbalanced counts are replaced by the balanced counts, hence the loss of the column 'weight'. This is clear in the attached images.
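As a minimal sketch of what I think is happening (toy numbers, plain Python, not the actual hicAdjustMatrix code): in the cooler convention, a balanced value is the raw count multiplied by the weights of its two bins. If the output file stores those products as its counts, there is no separate 'weight' column left to keep.

```python
# Toy numbers for illustration only; not taken from the files above.
weights = [0.5, 2.0, 1.0]                        # one balancing weight per bin
raw_pixels = [(0, 0, 10), (0, 1, 4), (2, 2, 8)]  # (bin1, bin2, raw count)


def apply_weights(pixels, weights):
    """Return pixels whose value is raw_count * weight[bin1] * weight[bin2]."""
    return [(i, j, count * weights[i] * weights[j]) for i, j, count in pixels]


# These products are what would end up stored as "counts" in the output.
balanced_pixels = apply_weights(raw_pixels, weights)
print(balanced_pixels)
```

Comparing a few pixel values between infile and outfile in this way (raw count times the two bin weights versus the stored output value) is one way to check the behavior without relying on images.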
I think this behavior is fine for my use cases (although I'm not sure about other researchers).
I tried to find this in the code, but didn't really see it after a quick check of hicAdjustMatrix.py in the develop branch.
Anyways, thanks!

Hi,
this is more a feature than a bug :)
The issue is a bit of a historical one now: the h5 files never stored the correction factors and raw values separately, so in many parts of the source code, including hicAdjustMatrix, we have not changed this behavior for the cool files. As long as users stay within HiCExplorer, it should also not matter. However, I think it is not too difficult to change the behavior; we just need to implement it. Maybe in the next 3.8 release, but given the limited time I have to implement new features, I cannot promise that a 3.8 release will be published this year.
Thanks for the report and all the best,
Joachim