
Convert numpy to Cooler

Open awezmm opened this issue 5 years ago • 2 comments

How can I convert a numpy array to cooler format? I was looking at https://github.com/mirnylab/cooler/issues/33, but it seems that cooler.io.DenseLoader() is gone. How else can I convert my heatmap numpy array into an iterator for the cooler create function?

awezmm, Mar 27 '20 07:03

Sorry for the late reply. You should be able to follow the same approach using:

from cooler.create import ArrayLoader

nvictus, Apr 20 '20 10:04

Just posting a complete working example here:

import cooler
import h5py
import pandas as pd
from cooler.create import ArrayLoader

# open the HDF5 file that holds the dense heatmap (2D matrix)
h = h5py.File("cworld-test_hg19_C-40000-raw.hdf5", 'r')
heatmap = h['interactions']

# create some bins, using cooler-binnify or some other way
binsize = 40000
chromsizes = pd.read_csv(
    'hg19.reduced.chromsizes',
    sep='\t',
    names=['name', 'length']).set_index('name')['length']
bins = cooler.binnify(chromsizes, binsize)

# turn the h5py dataset (2D matrix) into a stream of sparse matrix chunks:
iterator = ArrayLoader(bins, heatmap, chunksize=int(1e6))

# load that into cooler:
cooler.create_cooler('output.40kb.cool', bins, iterator, dtypes={"count":"int"}, assembly="hg19")
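For anyone wondering what "a stream of sparse matrix chunks" looks like, here is a rough numpy-only sketch. The helper dense_to_pixel_chunks and its rows_per_chunk parameter are hypothetical names made up for illustration; this is not how ArrayLoader is implemented, just the general idea of turning blocks of rows into bin1_id / bin2_id / count records for the upper triangle.

```python
import numpy as np


def dense_to_pixel_chunks(matrix, rows_per_chunk):
    """Yield upper-triangle nonzero entries, one block of rows at a time.

    Hypothetical helper for illustration only; not cooler's ArrayLoader.
    """
    n = matrix.shape[0]
    for start in range(0, n, rows_per_chunk):
        stop = min(start + rows_per_chunk, n)
        # Row r of the block is global row r + start, so keeping j - r >= start
        # keeps exactly the entries on or above the main diagonal.
        block = np.triu(np.asarray(matrix[start:stop]), k=start)
        i, j = np.nonzero(block)
        yield {
            "bin1_id": i + start,  # global row index
            "bin2_id": j,          # global column index
            "count": block[i, j],
        }


# Small made-up symmetric matrix, processed two rows at a time.
toy = np.array([[4, 1, 0],
                [1, 3, 2],
                [0, 2, 5]])
chunks = list(dense_to_pixel_chunks(toy, rows_per_chunk=2))
for chunk in chunks:
    print(chunk["bin1_id"], chunk["bin2_id"], chunk["count"])
```

Slicing a block of rows at a time is also why an h5py dataset can be streamed without reading the whole matrix into memory first.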

A couple of notes:

  • Make sure chunksize is passed as an int, e.g. chunksize=int(1e6); otherwise cooler.utils.partition (called by ArrayLoader) fails with a hard-to-read message.
  • Make sure to cast your input array to int or float if its dtype isn't already what you want. dtypes={"count":"int"} worked great for a float heatmap.
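Both notes can be checked up front with plain Python and numpy, before anything is handed to cooler. This is a minimal sketch: the small heatmap is made-up stand-in data, and the range() call only illustrates why a float step is a problem in general. That the chunking relies on range-style integer stepping is my assumption, not something confirmed in this thread.

```python
import numpy as np

# Made-up float-valued heatmap standing in for real data.
heatmap = np.array([[4.0, 1.0, 0.0],
                    [1.0, 3.0, 2.0],
                    [0.0, 2.0, 5.0]])

# Note 1: 1e6 is a float literal, so convert it explicitly.
chunksize = int(1e6)

# range() rejects a float step, which is the kind of failure an
# integer-stepping partition helper would run into.
try:
    list(range(0, heatmap.shape[0], 1e6))
    float_step_ok = True
except TypeError:
    float_step_ok = False

# Note 2: cast to int only if every value is a whole number,
# so that nothing gets truncated silently.
if np.array_equal(heatmap, np.rint(heatmap)):
    counts = heatmap.astype(np.int64)
else:
    counts = heatmap  # keep floats, e.g. for a normalized matrix

print(type(chunksize).__name__, float_step_ok, counts.dtype)
```

Casting the array yourself makes the conversion explicit instead of leaving it to the dtypes argument.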

sergpolly, Sep 09 '20 18:09