
Finetuning bug

Open vogelbam opened this issue 1 year ago • 5 comments

Hi, as mentioned in the readme, the finetuning code is (still) broken. In #19 you mentioned that you wanted to fix it soon. Is there any progress yet? When can we expect a fix?

vogelbam avatar Feb 10 '25 17:02 vogelbam

Also interested in a fix for the fine-tuning!

namitha-sh avatar Feb 11 '25 09:02 namitha-sh

Hi @vogelbam. Sorry for the delay in releasing the fine-tuning code. I am in the last stages of testing the new version, so I expect to publish a beta version in a couple of weeks. It would be helpful to get some feedback on its use. I will post an update here as soon as I upload the beta version. Thanks for your patience!

mbsantiago avatar Feb 11 '25 17:02 mbsantiago

Hi, as a workaround you could use my fork with the branch oldfashioned_training: https://github.com/chrmue44/batdetect2/tree/oldfashioned_training

This branch was cut from an older version in which training still worked. You can train your model with the old code and then use the resulting weights file with the newest version; it is still compatible.
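
For anyone taking this route, here is a minimal sketch of how one might sanity-check the weights file produced by the old branch before pointing the newer release at it. The checkpoint path and the expectation that the file is a plain PyTorch dictionary are my assumptions, not documented behaviour:

```python
import torch

# Hypothetical path to the checkpoint written by the oldfashioned_training branch.
CHECKPOINT = "experiments/finetuned_model.pth.tar"

# Load on the CPU so the check also works on machines without a GPU.
ckpt = torch.load(CHECKPOINT, map_location="cpu")

# List the top-level entries (weights, training params, ...) to confirm the file
# looks like a regular checkpoint before using it for inference with the newest
# version. The exact keys may differ between versions.
if isinstance(ckpt, dict):
    for key, value in ckpt.items():
        print(key, type(value))
else:
    print("Unexpected checkpoint layout:", type(ckpt))
```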

Cheers Christian

chrmue44 avatar Feb 12 '25 20:02 chrmue44

Hi, I'm trying to use @chrmue44's forked repo to fine-tune the model and I'm encountering an error after some epochs, never at the same one. The last part of the error log is:

......................
RuntimeError: DataLoader worker (pid 93789) is killed by signal: Aborted. 

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/user/Documenti/batdetect2-oldfashioned_training/bat_detect/finetune/finetune_model.py", line 161, in <module>
    train_loss = tm.train(model, epoch, train_loader, det_criterion, optimizer, scheduler, params)
  File "/home/user/Documenti/batdetect2-oldfashioned_training/bat_detect/finetune/../../bat_detect/train/train_model.py", line 89, in train
    for batch_idx, inputs in enumerate(data_loader):
  File "/home/user/miniconda3/envs/batdetect2/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 628, in __next__
    data = self._next_data()
  File "/home/user/miniconda3/envs/batdetect2/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1316, in _next_data
    idx, data = self._get_data()
  File "/home/user/miniconda3/envs/batdetect2/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1272, in _get_data
    success, data = self._try_get_data()
  File "/home/user/miniconda3/envs/batdetect2/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1133, in _try_get_data
    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
RuntimeError: DataLoader worker (pid(s) 93789) exited unexpectedly

Has anyone ever encountered this type of error when fine-tuning or training from scratch? It happens in both cases. I've also tried on another PC and it gives the same error.

Thanks in advance

EDIT: with params['num_workers'] = 0 it works without errors
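
For reference, a minimal, self-contained sketch of what that setting changes: with num_workers=0 the DataLoader builds batches in the main process, so any exception raised while loading a sample is reported directly instead of as a dead worker. The dummy tensors below stand in for the real spectrogram dataset:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for the real spectrogram dataset built from the annotation files.
dummy_dataset = TensorDataset(torch.randn(16, 1, 128, 64), torch.zeros(16))

# num_workers=0 disables the multiprocessing workers, so a crash while building
# a batch surfaces as a normal Python traceback instead of
# "DataLoader worker (pid ...) exited unexpectedly".
loader = DataLoader(dummy_dataset, batch_size=4, shuffle=True, num_workers=0)

for batch_idx, (spec, label) in enumerate(loader):
    print(batch_idx, spec.shape)
```

Training is slower without workers, but it is a useful way to expose the underlying error (often a bad sample or annotation) before switching num_workers back on.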

lollogiro avatar Feb 13 '25 16:02 lollogiro

Hi lollogiro, I remember that I had similar problems. Most likely something is wrong with your annotation data. As far as I remember, frequencies (high_freq, low_freq) that lie outside the valid frequency range lead to a similar error.
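
A rough sketch of how one might scan the annotation files for such entries before training. The "annotation" list and the low_freq/high_freq field names reflect the batdetect2 annotation format as I understand it, and the frequency bounds and directory path are placeholders, so adjust them to your setup:

```python
import json
from pathlib import Path

# Placeholder bounds: use the frequency range your training configuration expects.
MIN_FREQ_HZ = 0
MAX_FREQ_HZ = 192_000

ANN_DIR = Path("path/to/annotations")  # hypothetical directory of annotation JSON files

for ann_file in sorted(ANN_DIR.glob("*.json")):
    data = json.loads(ann_file.read_text())
    for i, ann in enumerate(data.get("annotation", [])):
        low, high = ann.get("low_freq"), ann.get("high_freq")
        if low is None or high is None:
            print(f"{ann_file.name}[{i}]: missing low_freq/high_freq")
        elif not (MIN_FREQ_HZ <= low < high <= MAX_FREQ_HZ):
            print(f"{ann_file.name}[{i}]: suspicious range {low}-{high} Hz")
```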

Cheers Christian

chrmue44 avatar Feb 13 '25 20:02 chrmue44