Can feature engine work with modin.pandas?
Is it possible to make Feature-engine compatible with pandas DataFrames created by Modin? https://docs.ray.io/en/latest/data/modin/index.html
Modin is a very effective way to parallelize DataFrame computations. The type of a Modin DataFrame is <class 'modin.pandas.dataframe.DataFrame'>,
but the checks in https://github.com/feature-engine/feature_engine/blob/main/feature_engine/dataframe_checks.py do not accept Modin DataFrames.
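To illustrate the kind of change that might be needed, here is a minimal sketch of a relaxed type check. It assumes the current check is a strict type test against pandas.DataFrame, which I have not confirmed line by line. The names `is_dataframe_like` and `FakeModinDataFrame` are hypothetical and are not part of Feature-engine or Modin; the stand-in class only mimics the module path reported above so the sketch runs without Modin installed.

```python
def is_dataframe_like(obj, allowed_packages=("pandas", "modin")):
    """Return True if obj's class is named DataFrame and comes from an allowed package.

    Hypothetical helper: it inspects the class name and top-level package
    instead of calling isinstance against pandas.DataFrame.
    """
    cls = type(obj)
    top_level_package = cls.__module__.split(".")[0]
    return cls.__name__ == "DataFrame" and top_level_package in allowed_packages


# Stand-in for <class 'modin.pandas.dataframe.DataFrame'> (assumption for the demo).
FakeModinDataFrame = type(
    "DataFrame", (), {"__module__": "modin.pandas.dataframe"}
)

print(is_dataframe_like(FakeModinDataFrame()))  # True
print(is_dataframe_like([1, 2, 3]))             # False
```

A check like this only gets the object past the type gate; whether the transformers then behave correctly on a Modin DataFrame would still have to be tested.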
Hi @PeterPirog
Thanks a lot for the suggestion!
Long term, I would say yes. I thought of dask. I wasn't aware of modin but we can explore this option as well.
Short term, we have no capacity to embark on adding and then supporting this functionality.
We would need at least one person dedicated to introducing this and then helping to support the code for the foreseeable future.
@solegalli Thank you for the answer. I asked about Modin with Ray parallel computation because Ray is very useful for distributed PyTorch and TensorFlow computation. I use Ray for TensorFlow hyperparameter optimization; here is an example: https://docs.ray.io/en/latest/tune/examples/tune_mnist_keras.html. I will try to modify the Feature-engine source and check how it works. Modin has a high level of compatibility with typical pandas, so I hope they can work together.
For me, the ideal situation is when sklearn, Modin, Ray, TensorFlow and Feature-engine can all work together :)
@solegalli I opened an issue on the Modin GitHub about Modin equivalents for some pandas classes: https://github.com/modin-project/modin/issues/4236
You mentioned Dask; Modin supports both Ray and Dask: https://modin.readthedocs.io/en/stable/
pip install "modin[ray]" # Install Modin dependencies and Ray to run on Ray
pip install "modin[dask]" # Install Modin dependencies and Dask to run on Dask
pip install "modin[all]" # Install all of the above
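Once one of these is installed, the backend can be selected through the MODIN_ENGINE environment variable. The following is only a sketch: `choose_modin_engine` is a hypothetical helper, not part of Modin, and it only checks whether the `ray` or `dask` package can be found, not whether it is correctly configured.

```python
import importlib.util
import os


def choose_modin_engine(preferred=("ray", "dask")):
    """Return the first package in `preferred` that is installed, or None."""
    for name in preferred:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


engine = choose_modin_engine()
if engine is not None:
    # Set the variable before importing modin.pandas so Modin uses this backend.
    os.environ["MODIN_ENGINE"] = engine
```

After this, `import modin.pandas as pd` should run on the selected engine, assuming Modin itself is installed.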
@PeterPirog great! Thank you.