
to_csr

Open qinqian opened this issue 5 years ago • 1 comments

  • datatable currently supports .to_numpy and .to_pandas for converting a Frame into a dense representation, but it does not seem to support conversion into a sparse array, e.g., scipy.sparse.csr_matrix.

  • We can convert to a sparse array in two steps, but this is slow for a very large Frame:

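import datatable as dt
import scipy.sparse
data = dt.fread(rna_file)  # rna_file is defined elsewhere
subset = data[:, 1:].to_numpy()  # all columns except the first, as a dense array
output = scipy.sparse.csr_matrix(subset)  # dense array -> sparse CSR matrix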
  • Could Frame provide a method that converts to a sparse array directly? Thanks in advance!
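For illustration only, here is a minimal sketch of a user-side workaround that converts the Frame block by block instead of densifying it all at once. The helper name frame_to_csr and its parameters chunk_rows and first_col are hypothetical and not part of datatable; the sketch assumes the selected columns are numeric with no missing values. It mainly bounds the size of the dense intermediate; whether it is any faster has not been measured here.

```python
import scipy.sparse as sp


def frame_to_csr(frame, chunk_rows=100_000, first_col=1):
    """Hypothetical helper: convert a datatable Frame to CSR in row blocks.

    Only `frame.nrows`, `frame.ncols`, row/column slicing and `.to_numpy()`
    are used, so at most `chunk_rows` rows are dense at any one time.
    """
    if chunk_rows < 1:
        raise ValueError("chunk_rows must be positive")

    blocks = []
    for start in range(0, frame.nrows, chunk_rows):
        stop = min(start + chunk_rows, frame.nrows)
        # Dense copy of one block of rows, skipping the leading columns.
        dense = frame[start:stop, first_col:].to_numpy()
        blocks.append(sp.csr_matrix(dense))

    if not blocks:
        # Empty Frame: return an empty matrix with the expected column count.
        return sp.csr_matrix((0, max(frame.ncols - first_col, 0)))

    # Stack the per-block sparse matrices into a single CSR matrix.
    return sp.vstack(blocks, format="csr")
```

With the Frame from the snippet above, this would be called as output = frame_to_csr(data). A smaller chunk_rows lowers peak memory at the cost of more Python-level iterations.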

qinqian avatar Jan 02 '21 15:01 qinqian

Since datatable does not support sparse representation, I'm not sure this can be sped up much further. Have you tried timing the individual steps? I.e. what proportion of time is spent reading the file, vs. converting to numpy (dense), vs. converting into sparse representation in the end?

st-pasha avatar Jan 06 '21 23:01 st-pasha
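One way to get the per-step breakdown asked about above is sketched here. The helpers time_steps and profile_conversion are hypothetical names introduced for this example; only dt.fread, Frame.to_numpy and scipy.sparse.csr_matrix come from the original snippet, and rna_file is assumed to be a path that dt.fread can read.

```python
import time


def time_steps(steps):
    """Run (name, func) pairs in order, passing each result to the next func.

    Returns the final result, the elapsed seconds per step, and each step's
    share of the total elapsed time.
    """
    timings = {}
    value = None
    for i, (name, func) in enumerate(steps):
        start = time.perf_counter()
        # The first step takes no input; later steps consume the previous result.
        value = func() if i == 0 else func(value)
        timings[name] = time.perf_counter() - start
    total = sum(timings.values()) or 1.0  # avoid division by zero
    shares = {name: t / total for name, t in timings.items()}
    return value, timings, shares


def profile_conversion(rna_file):
    """Time the three steps from the issue: read, densify, sparsify."""
    # Imported lazily so time_steps stays usable without these packages.
    import datatable as dt
    import scipy.sparse

    return time_steps([
        ("fread", lambda: dt.fread(rna_file)),
        ("to_numpy", lambda data: data[:, 1:].to_numpy()),
        ("csr_matrix", lambda dense: scipy.sparse.csr_matrix(dense)),
    ])
```

Calling output, timings, shares = profile_conversion(rna_file) and printing shares shows which of the three steps dominates, which indicates where any speed-up would have to come from.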