dask-ml
Inclusion of databases?
Howdy Folks,
I see that parallel processing and linear regression are supported in a distributed fashion when the input is a CSV file. I want to run a clustering algorithm on data that is stored in a database. I know this has a significant downside compared with simply loading a file: the queries take time (even though they are simple), and transferring the data from server to client takes even longer. Still, is it possible to use database rows as input while caching the data (vectors, for example) in working memory, and/or to store the algorithm's important metrics in a database table?
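
To make the question concrete, here is a minimal sketch of the data flow I have in mind. It is not dask-ml code: it uses only Python's standard-library `sqlite3` and a plain, single-machine k-means loop, and the `points` and `metrics` tables, their columns, and the helper names are all made up for illustration.

```python
import sqlite3


def load_vectors(conn, batch_size=2):
    """Read rows from the hypothetical `points` table in batches and cache them in memory."""
    cur = conn.execute("SELECT x, y FROM points ORDER BY id")
    cache = []
    while True:
        rows = cur.fetchmany(batch_size)  # one batch per round trip
        if not rows:
            break
        cache.extend(rows)
    return cache


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(vectors, centroids, n_iter=10):
    """Plain Lloyd's k-means on the cached vectors; returns (centroids, inertia)."""
    for _ in range(n_iter):
        groups = [[] for _ in centroids]
        for v in vectors:
            nearest = min(range(len(centroids)), key=lambda i: sq_dist(v, centroids[i]))
            groups[nearest].append(v)
        # Recompute each centroid as the mean of its group; keep the old one if the group is empty.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else old
            for g, old in zip(groups, centroids)
        ]
    inertia = sum(min(sq_dist(v, c) for c in centroids) for v in vectors)
    return centroids, inertia


# Toy in-memory database standing in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (id INTEGER PRIMARY KEY, x REAL, y REAL)")
conn.executemany(
    "INSERT INTO points (x, y) VALUES (?, ?)",
    [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)],
)
conn.execute("CREATE TABLE metrics (name TEXT PRIMARY KEY, value REAL)")

vectors = load_vectors(conn)
centroids, inertia = kmeans(vectors, centroids=[vectors[0], vectors[-1]])

# Store the algorithm's metric back in a database table.
conn.execute("INSERT INTO metrics (name, value) VALUES (?, ?)", ("inertia", inertia))
conn.commit()
```

What I am asking is whether the same pattern (rows in, vectors cached in memory, metrics written back) can be done with dask-ml's distributed estimators instead of this single-machine loop.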