pandarallel

Limit number of progress bars on larger multi-core systems

Open applio opened this issue 2 years ago • 3 comments

Currently, pandarallel dutifully creates one progress bar for each and every worker but on multi-core systems with a large-ish number of cores (say 128 or more) seeing so many progress bars can be overwhelming. In these situations, it may prove more valuable to display a smaller number of progress bars (not necessarily one overall) with each worker mapped to one of the displayed progress bars.

What is proposed:

  1. For N workers, offer the option to display M progress bars where N >= M and each worker contributes to one progress bar (i.e. worker n contributes to progress bar m such that m = (n % M)).
  2. If an error occurs during execution, that worker's progress bar (which may represent progress from multiple workers) will indicate an error occurred, matching current functionality.
  3. Keep the existing default behavior unchanged so that not specifying a maximum number of progress bars to display results in as many progress bars as workers.
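
The mapping in item 1 and the error propagation in item 2 can be sketched as follows. This is a minimal illustration, not the code from the pull request: the names `bar_index_for_worker` and `aggregate_progress` are hypothetical and not part of pandarallel's API, and worker indices are assumed to be zero-based.

```python
from typing import List, Tuple


def bar_index_for_worker(worker_index: int, max_bars: int) -> int:
    """Map zero-based worker n to displayed progress bar m = n % M."""
    if max_bars < 1:
        raise ValueError("max_bars must be at least 1")
    return worker_index % max_bars


def aggregate_progress(
    worker_done: List[int],
    worker_total: List[int],
    worker_failed: List[bool],
    max_bars: int,
) -> List[Tuple[int, int, bool]]:
    """Fold per-worker state into one (done, total, failed) tuple per bar.

    Hypothetical helper: a bar is flagged as failed if any worker mapped
    to it reported an error.
    """
    if max_bars < 1:
        raise ValueError("max_bars must be at least 1")
    # Never display more bars than there are workers (N >= M).
    n_bars = min(max_bars, len(worker_done))
    bars = [[0, 0, False] for _ in range(n_bars)]
    for n, (done, total, failed) in enumerate(
        zip(worker_done, worker_total, worker_failed)
    ):
        m = bar_index_for_worker(n, n_bars)
        bars[m][0] += done
        bars[m][1] += total
        bars[m][2] = bars[m][2] or failed
    return [tuple(bar) for bar in bars]
```

With four workers and two bars, workers 0 and 2 feed bar 0 while workers 1 and 3 feed bar 1, so an error in worker 3 marks only bar 1.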

Additional motivation: We have successfully used pandarallel on systems with a much larger number of cores than 128 where seeing as many progress bars as workers is genuinely problematic. We very much benefit from and do not want to simply disable the progress bars -- we want to monitor the progress of our parallel_apply() and parallel_map() operations in a digestible way and without flooding the screen / notebook with too much information.

Proposed implementation: A working implementation has been prepared along with unit tests -- a pull request will be added to this issue.

applio, Jun 08 '23 19:06

Example of the code from PR #243 in use in a Jupyter notebook: [screenshot: pandarallel_jupyter_notebook_run_finished]

applio, Jun 08 '23 19:06

Example of the code from PR #243 in use in the IPython console: [screenshot: pandarallel_ipython_console_run_underway]

Each of the 20 workers gets 5M rows from a 100M-row DataFrame. Because 20 is not divisible by 3, the first 2 progress bars each represent 7 workers and the last progress bar represents 6 workers.
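
The 7/7/6 split follows from the modulo assignment proposed in the issue (worker n feeds bar n % M). A quick check, assuming zero-based worker indices and the even 5M-row share per worker stated above:

```python
from collections import Counter

n_workers, n_bars = 20, 3
rows_per_worker = 100_000_000 // n_workers  # 5M rows each

# Count how many workers land on each displayed bar under m = n % M.
counts = Counter(n % n_bars for n in range(n_workers))
workers_per_bar = [counts[m] for m in range(n_bars)]
rows_per_bar = [c * rows_per_worker for c in workers_per_bar]

print(workers_per_bar)  # [7, 7, 6]
print(rows_per_bar)     # [35000000, 35000000, 30000000]
```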

applio, Jun 08 '23 19:06

Pandaral·lel is looking for a maintainer! If you are interested, please open a GitHub issue.

nalepae, Jan 23 '24 09:01