Add support for sparse data

Open aburkov opened this issue 5 years ago • 2 comments

Hey. Do you have plans to support sparse matrices as input? The requirement for dense input rules out a lot of real-world scenarios like text classification.

aburkov avatar Feb 14 '21 21:02 aburkov

I would love to add support for sparse data!

How big is your dataset? Are you able to proceed with the current package version?

pplonski avatar Feb 14 '21 21:02 pplonski

> I would love to add support for sparse data!
>
> How big is your dataset? Are you able to proceed with the current package version?

Hi Piotr. No, I cannot proceed. My dataset has 800k documents, converted with a bag-of-words model into 200k-dimensional vectors of TF-IDF scores. On a machine with 120 GB of RAM, it doesn't fit in memory as a dense array. Many sklearn algorithms support sparse input, as do xgboost and LightGBM, so it would be nice if your tool allowed sparse data at least for those algorithms.

aburkov avatar Feb 14 '21 22:02 aburkov
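
For a sense of scale, the memory arithmetic behind the comment above can be sketched roughly as follows. The 800k x 200k shape comes from the thread; the helper names, the float64 dtype, the 32-bit index size, and the average of 200 non-zero terms per document are all assumptions made for illustration, not figures reported by the poster.

```python
# Rough memory estimate: dense array vs. CSR sparse matrix.
# Helper names and the density figure are hypothetical.

def dense_bytes(n_rows, n_cols, itemsize=8):
    """Bytes needed to store every cell (float64 by default)."""
    return n_rows * n_cols * itemsize

def csr_bytes(n_rows, nnz, itemsize=8, index_itemsize=4):
    """Bytes for a CSR layout: one value and one column index per
    non-zero entry, plus a row-pointer array of length n_rows + 1."""
    return nnz * (itemsize + index_itemsize) + (n_rows + 1) * index_itemsize

n_docs, n_features = 800_000, 200_000   # shape from the thread
avg_nnz_per_doc = 200                   # ASSUMPTION: not stated in the thread

dense = dense_bytes(n_docs, n_features)
sparse = csr_bytes(n_docs, n_docs * avg_nnz_per_doc)

print(f"dense float64: {dense / 1e9:.0f} GB")
print(f"CSR (assumed density): {sparse / 1e9:.2f} GB")
```

Under these assumptions the dense array needs on the order of a terabyte, well beyond 120 GB, while the CSR form is a couple of gigabytes. The actual sparse footprint depends on the real number of non-zero entries in the corpus.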