Cannot use `cachier` with `tqdm.contrib.concurrent.process_map` or `thread_map`
Hi,
When using `@cachier` together with tqdm's multiprocessing or multithreading utilities, cachier always gets stuck in what seems to be an infinite loop.
I tried playing with the decorator arguments, but without success.
Could you please share a simplified code example? :)
Hi, I'm back with some code examples:
With pandarallel (pandas in parallel):
"""Test pandarallel with cachier."""
from cachier import cachier
import pandas as pd
from pandarallel import pandarallel
@cachier(stale_after=86400)
def _worker(x):
return x + 1
def worker(x):
return _worker(x)
def main():
"""Main function."""
df = pd.DataFrame({"x": range(100)})
pandarallel.initialize(progress_bar=True)
df["y"] = df["x"].parallel_apply(worker)
print(df)
if __name__ == "__main__":
# _worker.clear_cache()
main()
The first time, it runs fine. But if I relaunch the script right afterwards (hopefully hitting the cache), I get an error. **EDIT:** this doesn't seem to be an error caused by cachier.
If I replace `df["x"].parallel_apply(worker)` with `df["x"].parallel_apply(_worker)`, it does not progress at all (the progress bar stays blocked).
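That is, the run blocks with this one-line change, everything else in the script above left unchanged:

```python
# Passing the cachier-decorated function directly to pandarallel;
# with this variant the progress bar never advances past 0%.
df["y"] = df["x"].parallel_apply(_worker)
```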
With `tqdm.contrib.concurrent.process_map` now:
"""Test tqdm.process_map with cachier."""
from cachier import cachier
from tqdm.contrib.concurrent import process_map
@cachier(stale_after=86400)
def _worker(x):
return x + 1
def worker(x):
return _worker(x)
def main():
"""Main function."""
data = list(range(100))
result = process_map(worker, data, max_workers=4)
print(result)
if __name__ == "__main__":
# _worker.clear_cache()
main()
The first time, it runs fine.
The second time (which should hit the cachier cache), the progress bar is blocked: `0%| | 0/100 [00:00<?, ?it/s]`
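For reference, here is a minimal sketch that should isolate cachier from the progress-bar utilities: it maps the same cached worker with a plain `multiprocessing.Pool` (no tqdm, no pandarallel). If a second run of this also hangs, the blocking is likely happening in cachier's cache lookup inside the worker processes rather than in tqdm or pandarallel:

```python
"""Map the cachier-decorated worker with a plain multiprocessing.Pool
to check whether the hang depends on tqdm/pandarallel at all."""
import datetime
from multiprocessing import Pool

from cachier import cachier

@cachier(stale_after=datetime.timedelta(seconds=86400))
def _worker(x):
    return x + 1

def worker(x):
    return _worker(x)

if __name__ == "__main__":
    with Pool(4) as pool:
        # On a second run these results should come from the cache;
        # if the map blocks here too, tqdm/pandarallel are not involved.
        print(pool.map(worker, range(100)))
```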