Can feature engine work with modin.pandas?

Open PeterPirog opened this issue 3 years ago • 4 comments

Is it possible to make feature engine compatible with pandas dataframes created by Modin? https://docs.ray.io/en/latest/data/modin/index.html

Modin is a very effective way to parallelize computations on dataframes. The type of a Modin dataframe is <class 'modin.pandas.dataframe.DataFrame'>,

but the checks in https://github.com/feature-engine/feature_engine/blob/main/feature_engine/dataframe_checks.py do not accept Modin dataframes.
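To illustrate the kind of check that would have to change, here is a minimal sketch. It assumes the rejection comes from a strict type test against pandas.DataFrame; the helper name `is_dataframe_like` is hypothetical and is not part of feature_engine. It inspects the class's module path rather than importing modin, so it runs whether or not modin is installed.

```python
def is_dataframe_like(obj):
    """Return True for pandas DataFrames and for Modin DataFrames.

    Hypothetical helper, not part of feature_engine. It looks at the
    class name and top-level module of every class in the object's MRO.
    """
    for cls in type(obj).__mro__:
        root = cls.__module__.split(".")[0]
        if cls.__name__ == "DataFrame" and root in ("pandas", "modin"):
            return True
    return False


# Stand-in class that mimics the reported type name, used only so the
# sketch runs without modin installed.
class DataFrame:
    pass


DataFrame.__module__ = "modin.pandas.dataframe"

print(is_dataframe_like(DataFrame()))  # True
print(is_dataframe_like([1, 2, 3]))    # False
```

Accepting the type is only the first step: every transformer would still need to be tested against Modin dataframes, since passing the check does not guarantee that the operations used downstream behave identically.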

PeterPirog • Feb 19 '22 19:02

Hi @PeterPirog

Thanks a lot for the suggestion!

Long term, I would say yes. I thought of dask. I wasn't aware of modin but we can explore this option as well.

Short term, we have no capacity to embark on adding and then supporting this functionality.

We would need at least one person dedicated to introducing this and then helping to support the code for the foreseeable future.

solegalli • Feb 20 '22 15:02

@solegalli Thank you for the answer. I asked about Modin with Ray for parallel computation because Ray is very useful for distributed PyTorch and TensorFlow workloads. I use Ray for TensorFlow hyperparameter optimization; here is an example: https://docs.ray.io/en/latest/tune/examples/tune_mnist_keras.html. I will try to modify the feature engine source and check how it works. Modin has a high level of compatibility with typical pandas, so I hope they can work together.

For me, the ideal situation is when sklearn, Modin, Ray, TensorFlow and feature engine can all work together :)

PeterPirog • Feb 20 '22 16:02

@solegalli I opened an issue on the Modin GitHub about Modin equivalents for some pandas classes: https://github.com/modin-project/modin/issues/4236

You wrote about Dask; Modin supports both Ray and Dask as execution engines: https://modin.readthedocs.io/en/stable/

pip install "modin[ray]" # Install Modin dependencies and Ray to run on Ray
pip install "modin[dask]" # Install Modin dependencies and Dask to run on Dask
pip install "modin[all]" # Install all of the above
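
Once one of these is installed, a script can prefer Modin and fall back to plain pandas. The sketch below is only an illustration: the function name `pick_dataframe_backend` is hypothetical, and it relies on Modin reading the `MODIN_ENGINE` environment variable, which has to be set before Modin is imported.

```python
import importlib.util
import os


def pick_dataframe_backend(preferred_engine="ray"):
    """Return the name of the dataframe module to import, or None.

    Hypothetical helper: prefers modin.pandas when modin is installed,
    otherwise falls back to pandas.
    """
    if importlib.util.find_spec("modin") is not None:
        # Only sets the engine if the user has not already chosen one.
        os.environ.setdefault("MODIN_ENGINE", preferred_engine)
        return "modin.pandas"
    if importlib.util.find_spec("pandas") is not None:
        return "pandas"
    return None


backend = pick_dataframe_backend()
print("dataframe backend:", backend)
```

The returned name can then be passed to `importlib.import_module` to obtain the module that plays the role of `pd` in the rest of the script.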

PeterPirog • Feb 21 '22 07:02

@PeterPirog great! Thank you.

solegalli • Feb 21 '22 12:02