ImageNet-C loader can only return 5k images
When loading ImageNet-C and requesting more than 5k images, you always get exactly 5k images back. No error is raised, which can be confusing when you expect to receive more images than you actually get.
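The following snippet reproduces this behavior:
```python
from robustbench.data import load_imagenetc

path = './data'  # placeholder: directory containing the ImageNet-C data
# Request 50k severity-5 'brightness' images, without shuffling
x_test, y_test = load_imagenetc(50000, 5, path, False, ['brightness'])
print(x_test.size())  # first dimension is 5000, not the requested 50000
```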
Hi,
Sorry for the late reply. At the moment it is indeed only possible to load the 5k images for which the results are reported. In https://github.com/RobustBench/robustbench/pull/96 I've added an error that is raised if more images are requested.
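As an illustration only, a guard of this kind could look like the following. The function name, the use of `ValueError`, and the `N_IMAGENETC_EXAMPLES` constant are assumptions; the actual check added in PR #96 may differ.

```python
# Hypothetical guard; the actual check added in PR #96 may differ.
N_IMAGENETC_EXAMPLES = 5000  # number of ImageNet-C images the loader can return (per this thread)


def check_n_examples(n_examples: int, max_examples: int = N_IMAGENETC_EXAMPLES) -> None:
    """Raise an error instead of silently truncating when too many examples are requested."""
    if n_examples > max_examples:
        raise ValueError(
            f"Requested {n_examples} examples, but at most {max_examples} are available."
        )
```

Calling `check_n_examples(50000)` raises, while `check_n_examples(5000)` passes silently.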
For common corruptions, it might make sense to add the option of evaluating on the whole validation set. I think this would require some adjustments to the code, since at the moment all examples are loaded into memory at once in https://github.com/RobustBench/robustbench/blob/1b632c5fa3d86ab807297b3ae1063dad949e6c0d/robustbench/data.py#L207. @max-andr @dedeswim, thoughts?
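A minimal sketch of the alternative to loading everything at once, assuming a map-style PyTorch `Dataset` over (file path, label) pairs. `LazyImageDataset` and `load_fn` are hypothetical names, not part of RobustBench: only the paths are kept in memory, and each image is read when its index is requested.

```python
from typing import Callable, List, Tuple

import torch
from torch.utils.data import DataLoader, Dataset


class LazyImageDataset(Dataset):
    """Loads one example per __getitem__ call instead of the whole split up front."""

    def __init__(self, items: List[Tuple[str, int]], load_fn: Callable[[str], torch.Tensor]):
        self.items = items      # (file path, label) pairs; only paths are kept in memory
        self.load_fn = load_fn  # hypothetical: reads and preprocesses a single image

    def __len__(self) -> int:
        return len(self.items)

    def __getitem__(self, idx: int) -> Tuple[torch.Tensor, int]:
        path, label = self.items[idx]
        return self.load_fn(path), label
```

Wrapped in `DataLoader(LazyImageDataset(items, load_fn), batch_size=...)`, only one batch of images needs to be in memory at a time, so the size of the validation set is no longer bounded by what fits in a single tensor.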
@fra31: agreed, throwing an error sounds good as a temporary solution. I also think it's important to preserve backward compatibility: returning tensors directly has simplified the code built around RobustBench, and this behavior is now relied upon in many scripts.
As for a better solution that preserves backward compatibility, we could add an optional parameter to each load_* function in data.py that makes it return either tensors (i.e., x_test, y_test, as it does now) or a loader. We should then probably also make functions like clean_accuracy() compatible with data supplied via a loader.
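A rough sketch of this proposal, under stated assumptions: `finalize_output`, `return_loader`, and `accuracy` are hypothetical names chosen for illustration, not RobustBench's actual API, and `accuracy` only mimics the role of `clean_accuracy()`. The default keeps the current tensor-returning behavior, and the accuracy function accepts either form.

```python
from typing import Tuple, Union

import torch
from torch import nn
from torch.utils.data import DataLoader, Dataset, TensorDataset


def finalize_output(
    dataset: Dataset, return_loader: bool = False, batch_size: int = 100
) -> Union[Tuple[torch.Tensor, torch.Tensor], DataLoader]:
    """Hypothetical last step of a load_* function.

    return_loader=False (the default) keeps the current behavior: every example is
    materialized into (x_test, y_test) tensors. return_loader=True returns a
    DataLoader that yields batches on demand.
    """
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=False)
    if return_loader:
        return loader
    xs, ys = zip(*loader)  # iterate once and collect all batches (assumes a non-empty dataset)
    return torch.cat(xs), torch.cat(ys)


@torch.no_grad()
def accuracy(model: nn.Module, data, batch_size: int = 100, device: str = "cpu") -> float:
    """Hypothetical accuracy function accepting (x, y) tensors or a DataLoader."""
    if isinstance(data, DataLoader):
        batches = data
    else:
        x, y = data
        batches = DataLoader(TensorDataset(x, y), batch_size=batch_size)
    model.eval()
    correct, total = 0, 0
    for xb, yb in batches:
        preds = model(xb.to(device)).argmax(dim=1)
        correct += (preds == yb.to(device)).sum().item()
        total += yb.numel()
    return correct / total
```

Keeping `return_loader=False` as the default means existing scripts that unpack `x_test, y_test` keep working unchanged.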
What do you think?
I agree, we should keep the current loading as the default and add the option of using the full validation set for the common corruptions evaluation.
Hi! I agree with adding optional support for DataLoaders. I can open a PR and work on it.
@dedeswim that would be fantastic!