fsdp_qlora
Fix multiprocess issue (RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase)
Fix the RuntimeError reported in https://github.com/AnswerDotAI/fsdp_qlora/issues/28 once and for all by protecting the main code with `if __name__ != '__main__': return`.
Background:
- The original issue occurred when the `awq` package was imported in `train.py` outside of a function. This happened because `peft` imports `awq` if it is installed, which caused problems for users who had autoawq installed (me included). The previous PR #30 solved this by moving the `peft` imports inside the functions where the modules are needed.
- Yesterday @iseesaw reported that he ran into this error too: https://github.com/AnswerDotAI/fsdp_qlora/pull/30#issuecomment-2074177115. I did some digging and found that the `multiprocess` package is causing the problem. It appears that importing `multiprocess` changes the way child processes are created, similar to the behavior on Windows (see PyTorch documentation).
- Although the current `train.py` does not import `multiprocess` directly, the HuggingFace `datasets` package does. If a user modifies the code to import `datasets`, for example with `from datasets import load_dataset`, the error occurs.
- To fix this, this PR protects the main code in `train.py` with `if __name__ != '__main__': return`, similar to what the error message recommends.
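For reference, the deferred-import workaround from PR #30 can be sketched roughly as follows. This is only an illustration, not the code from that PR: `summarize` is a hypothetical function, and the standard-library `statistics` module stands in for a dependency such as `peft` whose import has side effects.

```python
def summarize(values):
    # Deferred import: the module is loaded only when this function is called,
    # not when the enclosing script is imported. `statistics` is a stand-in
    # for a heavier dependency with import-time side effects.
    import statistics

    return statistics.mean(values)


print(summarize([1, 2, 3]))
```

This keeps import-time side effects out of the module top level, but it only helps for imports the script controls, which is why a guard on the entry point is the more general fix.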
Error:
```
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
```
This change should prevent the RuntimeError from occurring, regardless of whether the user imports packages that affect process creation.
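A minimal sketch of why such a guard helps, under a few assumptions: with the standard library's `spawn` start method, a child process re-runs the main script under the module name `__mp_main__` rather than `__main__`, and the `multiprocess` package is assumed to behave the same way. The snippet does not start real processes. It simulates both cases with `runpy.run_path`, and `GUARDED_SCRIPT` and `run_as` are hypothetical names used only for illustration, not code from `train.py`.

```python
import os
import runpy
import tempfile

# Hypothetical stand-in for a training script whose entry point is guarded.
GUARDED_SCRIPT = '''
started = []  # records whether the entry point ran

def main():
    # In a real script, child processes would be launched here.
    started.append("launched")

if __name__ == "__main__":
    main()
'''


def run_as(module_name):
    """Execute the script under the given __name__ and return what it recorded."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "guarded_script.py")
        with open(path, "w") as f:
            f.write(GUARDED_SCRIPT)
        namespace = runpy.run_path(path, run_name=module_name)
    return namespace["started"]


# Launched directly: the guard passes and the entry point runs.
as_parent = run_as("__main__")
# Re-run by a spawned child: the guard fails, so nothing is launched again.
as_child = run_as("__mp_main__")
print(as_parent, as_child)  # ['launched'] []
```

Without the guard, the simulated child would call `main()` again while it is still bootstrapping, which is the situation the RuntimeError above describes.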