Can't improve speed as described in my case

Open WoNiuHu opened this issue 3 years ago • 2 comments

```python
import pandas as pd
import numpy as np
import time
import swifter

cat_colu = ['A', 'B', 'C', 'D', 'E']
df = pd.DataFrame(np.random.rand(1000000, 5), columns=['A', 'B', 'C', 'D', 'E'])

t1 = time.time()
for col in cat_colu:
    res = df[col].apply(lambda x: 1 if float(x) >= 0.5 else 0)
t2 = time.time()
print("apply cost:", t2 - t1)

t3 = time.time()
for col in cat_colu:
    res_parallel2 = df[col].swifter.progress_bar(False).apply(lambda x: 1 if float(x) >= 0.5 else 0)
t4 = time.time()
print("swifter cost is:", t4 - t3)
print("aaaa: ", res.equals(res_parallel2))
```

When I use this demo to measure the speed, I find that with fewer than 500,000 rows swifter is slower than plain apply, and with more rows than that an error is raised, which is very strange. I hope the author can help explain this.

The error looks like this:

```
    tmp_df = func(sample, *args, **kwds)
  File "/workspace/personal/test_pandas_v2.py", line 30, in
    tmp_df = func(sample, *args, **kwds)
  File "/workspace/personal/test_pandas_v2.py", line 30, in
    tmp_df = func(sample, *args, **kwds)
  File "/workspace/personal/test_pandas_v2.py", line 30, in
    raise TypeError(f"cannot convert the series to {converter}")
TypeError: cannot convert the series to <class 'float'>

During handling of the above exception, another exception occurred:
```

WoNiuHu, Jul 20 '22 03:07

Hi @WoNiuHu ,

Thanks for your code example. What version of swifter are you using?

I ran your code example and didn't encounter the TypeError that you did. That TypeError should be handled, but it appears you got another exception. Can you also provide the full stack trace for the error you encountered?
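For context, the TypeError in the traceback is what pandas raises when `float()` is called on a whole multi-element Series rather than on a single value. The sketch below reproduces that with plain pandas and shows a generic "try the whole Series first, fall back to element-wise" pattern. The helper name `apply_with_fallback` is hypothetical and is only an illustration of the idea, not swifter's actual implementation.

```python
import pandas as pd


def to_flag(x):
    # Fine for a scalar; float() fails when x is a multi-element Series.
    return 1 if float(x) >= 0.5 else 0


def apply_with_fallback(series, func):
    """Hypothetical helper: try func on the whole Series, else go element-wise."""
    try:
        return func(series), "vectorized"
    except (TypeError, ValueError):
        # The function is not vectorizable, so apply it value by value.
        return series.apply(func), "elementwise"


s = pd.Series([0.1, 0.5, 0.9])

# Calling the scalar function directly on the Series raises the TypeError.
try:
    to_flag(s)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

result, path = apply_with_fallback(s, to_flag)
print(error_message)
print(path, result.tolist())
```

In this pattern the TypeError is expected and swallowed, so an uncaught failure would have to come from a later exception, which is why the full stack trace matters.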

Thanks!

jmcarpenter2, Jul 20 '22 15:07

As a side note, your function will run instantaneously if you refactor it to use np.where, e.g.:

`df[col].swifter.apply(lambda x: np.where(x >= 0.5, 1, 0))`

This is because swifter will automatically vectorize the function when it is written that way.
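To illustrate the side note, here is a minimal sketch using only pandas and NumPy (swifter is deliberately not imported, so it runs without it). It checks that the `np.where` form, called once on a whole column, yields the same 0/1 values as the element-wise lambda from the original report. The row count and random seed are arbitrary choices for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10_000, 5)), columns=["A", "B", "C", "D", "E"])


def elementwise(x):
    # Original form: only works on one value at a time.
    return 1 if float(x) >= 0.5 else 0


def vectorized(x):
    # np.where form: works on a whole Series (or array) in one call.
    return np.where(x >= 0.5, 1, 0)


all_match = True
for col in df.columns:
    slow = df[col].apply(elementwise).to_numpy()
    fast = vectorized(df[col])
    all_match = all_match and np.array_equal(slow, fast)

print("results identical:", all_match)
```

Because `vectorized` accepts the whole column, a single call replaces one Python-level call per row, which is where the speedup comes from.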

jmcarpenter2, Jul 20 '22 15:07

Closing due to inactivity.

jmcarpenter2, Aug 28 '22 01:08