silence warnings completely

Open cristianmtr opened this issue 3 years ago • 7 comments

When running Modin I get a lot of warnings, which makes it hard to read my own output.

It's warnings along the lines of

(pid=22798) UserWarning: Distributing <class 'numpy.ndarray'> object. This may take some time.

I have tried

import warnings
warnings.filterwarnings("ignore")

but that didn't work.

Is there any parameter or env var which could silence all these?

Thanks

cristianmtr avatar Jun 23 '22 10:06 cristianmtr

@cristianmtr warnings.filterwarnings("ignore") should silence all Modin warnings in the main thread. It looks like the warning you showed is coming from a Ray task, not from the main thread. If I filter warnings in the main thread and create a Modin dataframe in a Ray task, I get a warning like yours: (create_dataframe pid=58509) UserWarning: Distributing <class 'numpy.ndarray'> object. This may take some time.

import modin.pandas as pd
import numpy as np
import warnings
import ray

@ray.remote
def create_dataframe():
    return pd.DataFrame(np.array([1]))

warnings.filterwarnings('ignore')
print(ray.get(create_dataframe.remote()))

If I instead filter warnings within the ray worker, I no longer get the warning:

import modin.pandas as pd
import numpy as np
import warnings
import ray

@ray.remote
def create_dataframe():
    warnings.filterwarnings('ignore')
    return pd.DataFrame(np.array([1]))

print(ray.get(create_dataframe.remote()))
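The same idea can be packaged as a decorator, so the filter is installed wherever the function actually runs. This is only a sketch: the name `quiet` is hypothetical (it is not part of Modin or Ray), and it assumes the decorated function is what gets shipped to the worker.

```python
import functools
import warnings


def quiet(func):
    """Run func with all warnings ignored in the calling process (hypothetical helper)."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # catch_warnings restores the previous filters on exit, so the
        # suppression only lasts for the duration of this call.
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            return func(*args, **kwargs)

    return wrapper
```

Stacking it under the Ray decorator (`@ray.remote` on top, `@quiet` directly above the `def`) should behave like the explicit `filterwarnings` call above, since the filter is set inside the worker rather than in the main process.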

I can also avoid the warning if I create the dataframe in the main process:

import modin.pandas as pd
import numpy as np
import warnings

warnings.filterwarnings('ignore')
print(pd.DataFrame(np.array([1])))

Generally, you should let Modin run Ray tasks on its own and not construct Modin objects within ray tasks. Are you doing the latter? Or is something in Modin creating Modin objects within ray tasks on its own? Do you have a complete script that reproduces the bug?

mvashishtha avatar Jun 23 '22 14:06 mvashishtha

I am not starting it like you mentioned

@ray.remote
def create_dataframe():
    warnings.filterwarnings('ignore')
    return pd.DataFrame(np.array([1]))

I am just creating the df and doing apply on it.

I cannot share the full code, as it is closed source.

but the basic gist is

out_df = df.swifter.apply(some_func, axis=1, result_type="expand")

cristianmtr avatar Jul 12 '22 11:07 cristianmtr

@cristianmtr thank you! I suspect that the function you are applying (some_func above) is creating a Modin dataframe. Then Modin would create a Modin dataframe in a remote task, which wouldn't get the filterwarnings. We'd then get the original warning like (pid=22798) UserWarning: Distributing <class 'numpy.ndarray'> object. This may take some time., which is from constructing a Modin dataframe. Is it possible that your apply func is creating a modin dataframe?

Generally, the function you pass to DataFrame.apply should deal with pandas dataframes and series only. You shouldn't suffer any performance losses if you use pandas objects within the function, because Modin will apply the function in parallel on partitions of the whole frame.

Admittedly, having to pass pandas functions into Modin is unintuitive. I pointed this problem out here.

mvashishtha avatar Jul 13 '22 00:07 mvashishtha

@mvashishtha Thanks for the clarification. Yes, from digging in the code, I can see that we are creating another df inside the apply. It's quite a complex bit of code, so we can't really refactor it easily.

There's no way to silence these warnings across all modin processes?

cristianmtr avatar Jul 13 '22 07:07 cristianmtr

@vnlitvinov I know that you replaced a run_function_on_all_workers call in #4603. Is there some way we could run a filterwarnings line on all ray workers?

mvashishtha avatar Jul 14 '22 14:07 mvashishtha

I don't think there's a reliable way for us to do this. However, one can try initializing Ray themselves and using Ray options to (hopefully) silence any workers' output: https://github.com/ray-project/ray/issues/5048

vnlitvinov avatar Jul 14 '22 20:07 vnlitvinov

@cristianmtr the solution from the thread linked by @vnlitvinov is to silence all ray logging by passing log_to_driver=False to ray.init. Note that you will have to initialize the ray cluster yourself instead of letting Modin do that for you. Will silencing all log output in that way work for you?

By the way, note that if you are initializing ray yourself, it's highly recommended to give the ray object store something like 60% of available system memory, as Modin does. The default ray object store size is typically not enough.
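A minimal sketch of that setup, assuming Ray is installed and that Modin picks up a Ray cluster that is already initialized. The helper names `object_store_bytes` and `init_quiet_ray` are hypothetical, and the 60% default is just the rule of thumb from above. The caller supplies the total memory, for example from `psutil.virtual_memory().total`.

```python
def object_store_bytes(total_memory_bytes, percent=60):
    """Return `percent` of the given memory size, in whole bytes."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    # Integer arithmetic avoids floating-point rounding on large sizes.
    return total_memory_bytes * percent // 100


def init_quiet_ray(total_memory_bytes):
    """Start Ray with worker output hidden from the driver (hypothetical helper)."""
    import ray  # imported lazily so the sizing helper works without Ray

    if not ray.is_initialized():
        ray.init(
            log_to_driver=False,  # do not forward worker stdout/stderr
            object_store_memory=object_store_bytes(total_memory_bytes),
        )
```

To be safe, call `init_quiet_ray(...)` before importing `modin.pandas`. Keep in mind that `log_to_driver=False` hides all worker output, not only warnings.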

mvashishtha avatar Jul 15 '22 15:07 mvashishtha

@mvashishtha is there anything more we can do from our end? I can think of trying to add an easily configurable option with modin.config that will suppress warnings at the Ray level, but I'm not sure if that's something we want to go with.

pyrito avatar Aug 31 '22 20:08 pyrito

@pyrito I think we shouldn't have an optional setting that's just a wrapper around a ray init option (log_to_driver=False). I think we should close this issue for now. If we hear that more users want to suppress warnings, we can consider either always setting log_to_driver=False or adding a section about that option in the documentation.

P.S. We may end up setting log_to_driver=False anyway, depending on what happens in https://github.com/ray-project/ray/issues/28216. It might be the way to get rid of ray error spam that I find really annoying when using Modin.

mvashishtha avatar Aug 31 '22 21:08 mvashishtha