Finetuning bug
Hi, as mentioned in the README, the fine-tuning code is still broken. In #19 you mentioned that you wanted to fix it soon. Is there any progress yet? When can we expect a fix?
Also interested in a fix for the fine-tuning!
Hi @vogelbam. Sorry for the delay in releasing the fine-tuning code. I am in the last stages of testing the new version so I expect to publish a beta version in a couple of weeks. Would be helpful to get some feedback on its use. Will post an update here as soon as I upload the beta version. Thanks for the patience!
Hi, as a workaround you could use my fork with the branch oldfashioned_training: https://github.com/chrmue44/batdetect2/tree/oldfashioned_training
This is branched from an older version when training was still working. You can train your model with the old version and then use the resulting weights file with the newest version. It is still compatible.
Cheers Christian
Hi, I'm trying to use @chrmue44's forked repo to fine-tune the model, and I'm encountering an error after some epochs, never at the same one. The last part of the error log is:
......................
RuntimeError: DataLoader worker (pid 93789) is killed by signal: Aborted.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/user/Documenti/batdetect2-oldfashioned_training/bat_detect/finetune/finetune_model.py", line 161, in <module>
train_loss = tm.train(model, epoch, train_loader, det_criterion, optimizer, scheduler, params)
File "/home/user/Documenti/batdetect2-oldfashioned_training/bat_detect/finetune/../../bat_detect/train/train_model.py", line 89, in train
for batch_idx, inputs in enumerate(data_loader):
File "/home/user/miniconda3/envs/batdetect2/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 628, in __next__
data = self._next_data()
File "/home/user/miniconda3/envs/batdetect2/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1316, in _next_data
idx, data = self._get_data()
File "/home/user/miniconda3/envs/batdetect2/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1272, in _get_data
success, data = self._try_get_data()
File "/home/user/miniconda3/envs/batdetect2/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1133, in _try_get_data
raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
RuntimeError: DataLoader worker (pid(s) 93789) exited unexpectedly
Has anyone encountered this type of error when fine-tuning or training from scratch? It happens in both cases. I've also tried on another PC and it gives the same error.
Thanks in advance
EDIT:
With params['num_workers'] = 0 it runs without errors.
Hi lollogiro, I remember having similar problems. Most likely something is wrong with your annotation data. As far as I remember, annotations whose frequencies (high_freq, low_freq) fall outside the frequency range lead to a similar error.
Cheers Christian
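A minimal sketch of the kind of check described above, in case it helps to scan the annotations before training. It assumes each annotation is a dict with `low_freq` and `high_freq` in Hz. The function name `find_bad_annotations`, the bounds `MIN_FREQ` and `MAX_FREQ`, and the extra `low_freq < high_freq` check are illustrative assumptions, not part of the repo, so replace the bounds with the frequency range from your own training params.

```python
# Hypothetical bounds in Hz; use the frequency range from your training params.
MIN_FREQ = 10_000
MAX_FREQ = 120_000


def find_bad_annotations(annotations, min_freq=MIN_FREQ, max_freq=MAX_FREQ):
    """Return (index, reason) pairs for annotations with suspicious frequencies."""
    problems = []
    for i, ann in enumerate(annotations):
        low = ann.get("low_freq")
        high = ann.get("high_freq")
        if low is None or high is None:
            problems.append((i, "missing low_freq or high_freq"))
        elif low >= high:
            # Extra sanity check (an assumption, not mentioned in the thread).
            problems.append((i, "low_freq is not below high_freq"))
        elif low < min_freq or high > max_freq:
            problems.append((i, "outside the frequency range"))
    return problems


# Made-up example data, only to show the call.
example = [
    {"low_freq": 20_000, "high_freq": 60_000},
    {"low_freq": 5_000, "high_freq": 60_000},
    {"low_freq": 40_000, "high_freq": 150_000},
]
for index, reason in find_bad_annotations(example):
    print(index, reason)
```

Running this over the annotation list of each file should point to the entries worth inspecting by hand. Whether they are the actual cause of the worker crash still needs to be confirmed on your data.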