Can't improve speed as described in my case
```python
import pandas as pd
import numpy as np
import time
import swifter

cat_colu = ['A', 'B', 'C', 'D', 'E']
df = pd.DataFrame(np.random.rand(1000000, 5), columns=['A', 'B', 'C', 'D', 'E'])

# Baseline: plain pandas apply on each column
t1 = time.time()
for col in cat_colu:
    res = df[col].apply(lambda x: 1 if float(x) >= 0.5 else 0)
t2 = time.time()
print("apply cost:", t2 - t1)

# Same function through swifter, with the progress bar disabled
t3 = time.time()
for col in cat_colu:
    res_parallel2 = df[col].swifter.progress_bar(False).apply(lambda x: 1 if float(x) >= 0.5 else 0)
t4 = time.time()
print("swifter cost is:", t4 - t3)
print("aaaa: ", res.equals(res_parallel2))
```
When I use this demo to measure the speed, I find that with fewer than 500,000 rows swifter is slower than plain apply, and with more rows than that an error is raised, which is very strange. I hope the author can help explain this.
The error looks like this:
` tmp_df = func(sample, *args, **kwds)
File "/workspace/personal/test_pandas_v2.py", line 30, in
During handling of the above exception, another exception occurred: `
Hi @WoNiuHu ,
Thanks for your code example. What version of swifter are you using?
I ran your code example and didn't encounter the TypeError that you did. That TypeError should be handled, but it appears you got another exception. Can you also provide the full stack trace for the error you encountered?
Thanks!
As a side note, your function will run instantaneously if you refactor it to use np.where, e.g.:
`df[col].swifter.apply(lambda x: np.where(x >= 0.5, 1, 0))`
This is because swifter will automatically vectorize the function if you do that.
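To illustrate the difference between the two forms of the function, here is a minimal sketch that uses only pandas and NumPy (swifter itself is not involved). The names `scalar_func` and `vectorized_func` are made up for this example. It shows that the `np.where` version accepts a whole column at once and gives the same values as the element-wise version, while the `float(x)` version raises when handed a whole column, which is presumably the TypeError mentioned above.

```python
import numpy as np
import pandas as pd

# Small random frame with the same column names as the example above
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((1000, 5)), columns=list("ABCDE"))
col = df["A"]


def scalar_func(x):
    # Original form: only works on one value at a time
    return 1 if float(x) >= 0.5 else 0


def vectorized_func(x):
    # np.where form: works on a scalar or on a whole column
    return np.where(x >= 0.5, 1, 0)


# Element-wise: Python calls scalar_func once per row
elementwise = col.apply(scalar_func)

# Whole-column: a single call on the Series, returns a NumPy array
vectorized = vectorized_func(col)

# Both forms produce the same 0/1 values
results_match = np.array_equal(elementwise.to_numpy(), vectorized)

# The scalar form cannot be called on the whole column
try:
    scalar_func(col)
    scalar_accepts_series = True
except (TypeError, ValueError):
    scalar_accepts_series = False

print("same values:", results_match)
print("scalar form accepts a whole column:", scalar_accepts_series)
```

Because the `np.where` form can be applied to the column in one call, no per-row Python function call is needed, which is where the speedup would come from.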
Closing due to inactivity