[Feature]: File conversion progress bars
What would you like to see added to the NWB GUIDE?
Next step after https://github.com/NeurodataWithoutBorders/nwb-guide/pull/676 is to do parallel file write and sub bars for the buffers of each file
Here is some basic code that should accomplish this (I'll have to open a small PR to HDMF and HDMF-Zarr to facilitate proper progress bar class specification)
futures = []
with ProcessPoolExecutor(max_workers=max_workers) as executor:
for session_to_nwb_kwargs in session_to_nwb_kwargs_per_session:
# Errors in subprocesses will not propagate to top level stderr
# So dump them to a file instead
# Might want dedicated subfolder for this?
exception_file_path = data_dir_path / f"ERROR_<nwbfile_name>.txt"
# This is where we inject sub par progress info
# This is passed via the iterator options of relevant interfaces through conversion options of the converter
# For example
# conversion_options = dict()
# for interface_key, interface in data_formats_page.items():
# if isinstance(interface, BaseRecordingInterface):
# conversion_options.update({interface_key: dict(iterator_opts=dict(display_progress=True, progress_bar_class=TQDMPublisher, progress_bar_options=dict(...)))})
futures.append(
executor.submit(
safe_session_to_nwb,
session_to_nwb_kwargs=session_to_nwb_kwargs,
exception_file_path=exception_file_path,
)
)
for _ in TQDMPublisher(as_completed(futures), total=len(futures), ...):
pass
where safe_session_to_nwb looks like this
def safe_session_to_nwb(*, session_to_nwb_kwargs: dict, exception_file_path: Union[Path, str]):
"""Convert a session to NWB while handling any errors by recording error messages to the exception_file_path.
Parameters
----------
session_to_nwb_kwargs : dict
The arguments for session_to_nwb.
exception_file_path : Path
The path to the file where the exception messages will be saved.
"""
exception_file_path = Path(exception_file_path)
try:
session_to_nwb(**session_to_nwb_kwargs)
except Exception as e:
with open(exception_file_path, mode="w") as f:
f.write(f"session_to_nwb_kwargs: \n {pformat(session_to_nwb_kwargs)}\n\n")
f.write(traceback.format_exc())
where session_to_nwb is whatever current function we currently use in the GUIDE to convert a single session to create a single file
Inspired by @pauladkisson's contribution on https://github.com/catalystneuro/cookiecutter-my-lab-to-nwb-template/pull/23, which is based on strategies we've used many times for past conversions
Do you have any interest in helping implement the feature?
Yes.
Code of Conduct
Yes
Did you confirm this feature was not already reported?
Yes
This branch should work for HDF5 at least: https://github.com/hdmf-dev/hdmf/pull/1110
I don't think NeuroConv changes should be needed since they just dynamically pass everything down the chain
Just flagging that adding Paul's safe conversion strategy to https://github.com/NeurodataWithoutBorders/nwb-guide/pull/778 would change a good amount of our error handling for the GUIDE in general, and might be better in a separate Issue / PR
We can probably just use our own existing endpoint for per-file conversions; the main goal with any kind of 'safe' way of doing that is just that the traceback error stack gets dumped to a persistent file on disk since stdout/stderr pipes are not easily accessible during multiprocessing
Just implemented a general log system in https://github.com/NeurodataWithoutBorders/nwb-guide/pull/778 with a specific endpoint for registering errors from parallel processes! Should give us something useful to work with generally for remote debugging :)
Here's an example file: 2024-05-15_16-03-22.log
As shown by the Exception on /neuroconv/convert line, this will automatically log errors from any failing endpoint in addition to the process-specific errors
Just implemented a general log system in https://github.com/NeurodataWithoutBorders/nwb-guide/pull/778 with a specific endpoint for registering errors from parallel processes! Should give us something useful to work with generally for remote debugging :)
Very cool!
Added in #778