datatable
to_csr
The current datatable supports `.to_numpy()` and `.to_pandas()` to convert a Frame into a dense array, but it does not seem to support conversion to a sparse array, e.g. `scipy.sparse.csr_matrix`.
We can convert to a sparse array in two steps, but this is slow for a very large Frame:
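import datatable as dt
import scipy.sparse
data = dt.fread(rna_file)                 # read the file into a datatable Frame
subset = data[:, 1:].to_numpy()           # drop the first column, materialize as a dense numpy array
output = scipy.sparse.csr_matrix(subset)  # compress the dense array into CSR format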
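If the main cost is materializing the whole Frame as one dense array, one alternative is to convert column by column and collect only the nonzeros. The sketch below is untested against datatable and is not a datatable feature; `columns_to_csr` is a hypothetical helper. It takes any iterable of 1-D columns and returns the raw CSR arrays `(data, indices, indptr, shape)`, using only numpy, so just one dense column is held in memory at a time.

```python
import numpy as np


def columns_to_csr(columns, nrows):
    """Hypothetical helper: build CSR arrays from an iterable of 1-D columns.

    Nonzeros are gathered column by column (CSC order), then reordered
    into CSR order. Returns (data, indices, indptr, shape).
    """
    values, rows, cols = [], [], []
    ncols = 0
    for j, col in enumerate(columns):
        col = np.asarray(col).ravel()
        if col.shape[0] != nrows:
            raise ValueError(f"column {j} has {col.shape[0]} rows, expected {nrows}")
        nz = np.flatnonzero(col)              # row positions of nonzero entries
        values.append(col[nz])
        rows.append(nz)
        cols.append(np.full(nz.shape[0], j, dtype=np.int64))
        ncols = j + 1                         # count all columns, even all-zero ones

    if not values:                            # no columns at all
        return (np.empty(0), np.empty(0, dtype=np.int64),
                np.zeros(nrows + 1, dtype=np.int64), (nrows, 0))

    data = np.concatenate(values)
    row_idx = np.concatenate(rows)
    col_idx = np.concatenate(cols)

    # A stable sort by row keeps column indices ascending within each row,
    # because entries were appended in increasing column order.
    order = np.argsort(row_idx, kind="stable")
    indices = col_idx[order]
    data = data[order]

    # indptr[r] .. indptr[r + 1] spans the nonzeros of row r.
    counts = np.bincount(row_idx, minlength=nrows)
    indptr = np.concatenate(([0], np.cumsum(counts))).astype(np.int64)
    return data, indices, indptr, (nrows, ncols)
```

With a Frame, the columns could be fed as `(data[:, j].to_numpy().ravel() for j in range(1, data.ncols))` with `nrows=data.nrows`, and the result wrapped via `scipy.sparse.csr_matrix((vals, indices, indptr), shape=shape)`. Whether this beats the two-step conversion in wall-clock time depends on the data; its clearer benefit is lower peak memory.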
Could Frame provide a method that converts directly to a sparse array? Thanks in advance!
Since datatable does not support a sparse representation, I'm not sure this can be sped up much further. Have you tried timing the individual steps? That is, what proportion of the time is spent reading the file, vs. converting to numpy (dense), vs. converting to the sparse representation at the end?
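One way to get that breakdown is a small timing helper built on the standard library's `time.perf_counter`. This is a minimal sketch; `time_stages` and `report` are hypothetical helper names, and each stage is a callable that receives the previous stage's result.

```python
import time


def time_stages(stages, value=None):
    """Run (name, fn) stages in order, feeding each fn the previous result.

    Returns the final result and a dict mapping stage name -> elapsed seconds.
    """
    timings = {}
    for name, fn in stages:
        start = time.perf_counter()
        value = fn(value)
        timings[name] = time.perf_counter() - start
    return value, timings


def report(timings):
    """Format each stage's elapsed time and its share of the total."""
    total = sum(timings.values()) or 1.0   # avoid dividing by zero
    return [
        f"{name:<10} {secs:8.3f}s  {100 * secs / total:5.1f}%"
        for name, secs in timings.items()
    ]


# Hypothetical usage for the pipeline in this issue (rna_file as above):
#   import datatable as dt, scipy.sparse
#   _, t = time_stages([
#       ("read", lambda _: dt.fread(rna_file)),
#       ("to_numpy", lambda frame: frame[:, 1:].to_numpy()),
#       ("to_csr", scipy.sparse.csr_matrix),
#   ])
#   print("\n".join(report(t)))
```

If most of the time turns out to be in `fread`, no conversion shortcut will help much; if it is in the dense-to-sparse step, a column-wise conversion may be worth trying.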