Running train.py fails because species is not part of demo dataset
Hi, first of all, congrats on the two papers "Will artificial intelligence revolutionize aerial surveys? A first large-scale semi-automated survey of African wildlife using oblique imagery and deep learning" and "From crowd to herd counting: How to precisely detect and count African mammals using aerial imagery and deep learning?". I am currently involved in a project to count marine iguanas on the Galapagos Islands and am working through the current research. I wanted to get your code to work, ran into a couple of issues, and fixed some of them. First, the URL of the pretrained DLA model is not accessible anymore. I found a replacement which works fine: https://github.com/cwinkelmann/HerdNet/blob/initial_run/animaloc/models/dla.py#L35
Out of optimism I wanted to start with Python 3.10 and had to change the dependencies. The ones I used are listed here: https://github.com/cwinkelmann/HerdNet/blob/initial_run/requirements_310_linux.txt
With that I ran the demo notebook, which finished training fine, but inference didn't succeed because of the changes in the config, which refer to this: https://github.com/cwinkelmann/HerdNet/blame/296d2e14dece507431edbf9ed5c64166bbc9311d/configs/train/herdnet.yaml#L45 . The infer script wants to read the classes from the checkpoint, but these are not persisted by the demo notebook, only by the train script ( https://github.com/cwinkelmann/HerdNet/blob/296d2e14dece507431edbf9ed5c64166bbc9311d/tools/infer.py#L80 ). However, I can't run the train script because of issues with the "labels"/"species" columns. It seems the demo dataset doesn't match the code: the species column isn't part of the dataset, while the labels are already in there.
I have some quick-fix ideas in my head (changing the column to species obviously doesn't work, it fails here: https://github.com/cwinkelmann/HerdNet/blob/296d2e14dece507431edbf9ed5c64166bbc9311d/animaloc/data/annotations.py#L268 ). Do you have a good solution in mind? It seems there are 6 classes, which include at least elephants for class id 6, but I am not sure about the others.
Best Christian
Hi @cwinkelmann,
Interesting project, hope HerdNet can help! Thanks for reporting this issue. I should definitely update the demo notebook, I have just put it on my to-do list!
If you want to run the train.py tool, your CSV files need both the 'species' (name of the species, str) and 'labels' (corresponding id, int) columns. Once your model has completed training, the mapping between the two (see example below) will be automatically stored in your PTH file.
Here is the matching dict for the demo dataset:
class_dict = {
1: "topi",
2: "buffalo",
3: "kob",
4: "warthog",
5: "waterbuck",
6: "elephant"
}
If you add the species column to the demo's CSV files accordingly, it should work!
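As a rough sketch of that step, the 'species' column can be derived from the existing 'labels' ids with the mapping above. This is not part of HerdNet: the helper names `add_species` and `add_species_to_csv` are made up for illustration, and the only column names taken from this thread are 'labels' and 'species'.

```python
import csv

# Mapping from this thread (label id -> species name).
CLASS_DICT = {
    1: "topi",
    2: "buffalo",
    3: "kob",
    4: "warthog",
    5: "waterbuck",
    6: "elephant",
}


def add_species(rows, class_dict=CLASS_DICT):
    """Return copies of the rows with a 'species' value derived from 'labels'."""
    out = []
    for row in rows:
        new_row = dict(row)
        # CSV values are read as strings, so convert the id before the lookup.
        new_row["species"] = class_dict[int(row["labels"])]
        out.append(new_row)
    return out


def add_species_to_csv(src_path, dst_path, class_dict=CLASS_DICT):
    """Read src_path, add the 'species' column, and write the result to dst_path."""
    with open(src_path, newline="") as src:
        reader = csv.DictReader(src)
        fieldnames = list(reader.fieldnames or [])
        rows = add_species(reader, class_dict)
    if "species" not in fieldnames:
        fieldnames.append("species")
    with open(dst_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Writing to a separate `dst_path` keeps the original demo CSV untouched, which makes it easy to compare both files before pointing the config at the new one.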
Let me know!
Best,
Alexandre
Hi @Alexandre-Delplanque, thank you for the answer. I managed to run the tools/train.py script in the end, the way you described it. I added the column "species" by mapping the already existing label ids to the dictionary ( https://github.com/cwinkelmann/HerdNet/blob/2db91c304cd34e1430f2ef9b6c52959925b00aa6/notebooks/run.py#L15 ). Some more changes were necessary, like setting the wandb entity to "null".
Unfortunately the persisting of "classes", "mean" and "std" as implemented here (https://github.com/cwinkelmann/HerdNet/blob/2db91c304cd34e1430f2ef9b6c52959925b00aa6/tools/train.py#L399) was not successful, so I quickfixed it again by hardcoding it:
https://github.com/cwinkelmann/HerdNet/blob/2db91c304cd34e1430f2ef9b6c52959925b00aa6/tools/infer.py#L84
I will double check if I did something wrong there. I will first try running the training with my iguanas.
When I am done with that I can have a look at updating the demo notebook and give you a pull request in the near future.
Regards Christian
Hi @cwinkelmann,
Good news! Note that the wandb entity should be set to your Weights & Biases username.
That's odd, have you completed the training session launched using the train.py tool? At the end of training, 'mean', 'std' and 'classes' should have been updated in the .pth file (L.394-400).
If you want to update it manually, you might use the following code snippet, which already appears in the README (see here):
import torch

# Load the checkpoint, add the missing metadata, and save it back in place.
pth_file = torch.load('path/to/the/file.pth')
pth_file['classes'] = {1: 'species_1', 2: 'species_2'}  # one entry per class id
pth_file['mean'] = [0.485, 0.456, 0.406]
pth_file['std'] = [0.229, 0.224, 0.225]
torch.save(pth_file, 'path/to/the/file.pth')
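To check whether a checkpoint actually carries these three keys, before or after patching it, something like the following could be used. It is only a sketch: `missing_metadata` and `check_pth` are hypothetical helper names, not part of HerdNet, and it assumes the .pth file holds a plain dict, as the snippet above does.

```python
REQUIRED_KEYS = ("classes", "mean", "std")


def missing_metadata(checkpoint):
    """Return the metadata keys that are absent (or None) in a checkpoint dict."""
    return [key for key in REQUIRED_KEYS if checkpoint.get(key) is None]


def check_pth(path):
    """Load a .pth file on the CPU and report which metadata keys are missing."""
    import torch  # imported here so missing_metadata works without PyTorch

    checkpoint = torch.load(path, map_location="cpu")
    return missing_metadata(checkpoint)
```

An empty list from `check_pth('path/to/the/file.pth')` would mean all three keys are present.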
Have a look at the short doc for writing / updating the config files!
Hope this helps!
All the best,
Alexandre
Hi @cwinkelmann,
Did you familiarise yourself with the code?
Is it still an issue?
Best,
Alexandre
Hi @Alexandre-Delplanque, first of all, congrats on defending your PhD thesis. It is a joy to read, and your work is a big pillar of mine. I spent a lot of time with deepforest, which struggles with false positives.
Yes, I am good for now, thank you. I haven't managed to start a little refactoring yet. The aforementioned persisting of std and mean still failed because those values are supposed to be fetched from the Normalize transformation, but I had added some transformations after Normalize, which made retrieving the values impossible.
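To illustrate the failure mode, here is a small sketch with stand-in classes. It is not the actual HerdNet code: `Normalize` and `HorizontalFlip` below are placeholders for real transforms, and the position-based lookup is only an assumption about how such a retrieval can break.

```python
from dataclasses import dataclass


@dataclass
class Normalize:  # stand-in for a real Normalize transform
    mean: tuple
    std: tuple


@dataclass
class HorizontalFlip:  # stand-in for any other transform
    p: float = 0.5


def stats_from_last(transforms):
    """Position-based lookup: assumes Normalize is the last transform."""
    last = transforms[-1]
    return getattr(last, "mean", None), getattr(last, "std", None)


def stats_by_type(transforms):
    """Type-based lookup: finds Normalize wherever it sits in the list."""
    for transform in transforms:
        if isinstance(transform, Normalize):
            return transform.mean, transform.std
    return None, None


pipeline = [
    Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
    HorizontalFlip(),  # a transform added after Normalize
]
```

With this pipeline, `stats_from_last(pipeline)` finds no values because the last entry is the flip, while `stats_by_type(pipeline)` still returns the mean and std. Keeping Normalize as the final transform avoids the problem without touching any lookup code.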
Best Christian