voc_format: don't create XML files with no objects on export
### Summary
The original VOC datasets (at least the 2007 and 2012 versions that I have checked) do not contain XML files with no defined objects. Where an image is not annotated with objects, the corresponding XML file is simply omitted.
This change makes the Datumaro VOC converter consistent with that convention by avoiding dumping the XML files when there are no objects associated with a dataset item.
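The guard this change adds can be sketched roughly as follows (names are illustrative, not Datumaro's actual internals): the converter checks the item's annotations before writing its per-image XML file.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for a Datumaro dataset item; the real class
# lives in datumaro and carries many more fields.
@dataclass
class DatasetItem:
    id: str
    annotations: list = field(default_factory=list)

def should_write_voc_xml(item: DatasetItem) -> bool:
    """Mirror the original VOC convention: a per-image XML file is
    written only when the item actually has annotated objects."""
    return bool(item.annotations)
```

A converter following this convention would call such a check before dumping the file and skip the write entirely for object-less items.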
Besides consistency, the other benefit is that written datasets can be disambiguated from the LabelMe format by the format detection machinery. The VOC and LabelMe formats seem to differ only in how annotations are represented inside the `object` element, so if there are no objects, the resulting XML file is ambiguous. By not writing such XML files, we avoid the ambiguity.
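To illustrate the ambiguity (a hand-written example, not taken from an actual dataset): with no `object` elements, an annotation file reduces to fields that both formats share, e.g.:

```xml
<annotation>
  <folder>images</folder>
  <filename>000001.jpg</filename>
</annotation>
```

Nothing here identifies the file as VOC rather than LabelMe; only the contents of an `object` element would.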
Fixes #658.
### How to test
### Checklist
- [x] I submit my changes into the `develop` branch
- [ ] I have added a description of my changes into CHANGELOG
- [ ] I have updated the documentation accordingly
- [ ] I have added tests to cover my changes
- [x] I have linked related issues
### License
- [x] I submit my code changes under the same MIT License that covers the project. Feel free to contact the maintainers if that's a concern.
- [ ] I have updated the license header for each file (see an example below)
```python
# Copyright (C) 2021 Intel Corporation
#
# SPDX-License-Identifier: MIT
```
@zhiltsov-max I haven't finished this yet (I still need to write tests), but could you let me know if the solution seems reasonable to you? Maybe I missed some reason why empty XML files are required.
Note that the actual solution is in the last commit; the rest are just refactoring.
As far as I remember, the idea was to support the test subset and to export empty image metainfo from CVAT. There should even be an option or a check in the converter.
> As far as I remember, the idea was to support the test subset
I don't think it's necessary to write object-less XML files to support the test subset - after all, the test subset in the original dataset doesn't include them.
> and to export empty image metainfo from CVAT.
What does this mean?
> There should even be an option or a check in the converter.
I see an option to write empty subset lists, but that seems unrelated.
> What does this mean?
It means that users often asked to export unannotated images from CVAT along with annotated ones. For VOC, in the case where we don't export the images, this allows us to keep the image size info.
> For VOC, in the case where we don't export the images, this allows us to keep the image size info.
I suppose so, but what use is the image size without the image itself (especially if there are no annotations either)?
> I suppose so, but what use is the image size without the image itself (especially if there are no annotations either)?
The reasons can be different, but I don't think it's relevant to the format detection question. Speaking about the solution, maybe we could check another file or return a lower confidence value if we see there are VOC directories in the dataset? Or maybe we could return a higher confidence for VOC.
> The reasons can be different, but I don't think it's relevant to the format detection question.
It seems pretty relevant. If there is no actual use case for empty XML files, we could stop writing them, making the question moot.
> Speaking about the solution, maybe we could check another file
I could change the LabelMe detector to probe every XML file until it found one that contained at least one object, and that would solve #658, but it feels unsatisfactory as a general solution. If the user exports a dataset with no annotations in the VOC format, that would still be ambiguous even with this tweak.
I think I will give this solution a go, because it seems useful in its own right; however, I don't think it makes the change in this PR redundant.
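That detector tweak could look roughly like this (a hypothetical sketch, not the actual Datumaro detector code): probe the annotation files and only treat the directory as LabelMe evidence once a file with at least one `object` element turns up.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

def has_labelme_evidence(ann_dir: str) -> bool:
    """Scan XML files until one defines an object; object-less files
    are ambiguous with VOC, so they don't count as evidence."""
    for xml_path in Path(ann_dir).glob("*.xml"):
        try:
            root = ET.parse(xml_path).getroot()
        except ET.ParseError:
            continue  # unparsable files are not evidence either
        if root.find("object") is not None:
            return True
    return False
```

As noted above, a dataset exported with no annotations at all would still defeat this probe, which is why it complements rather than replaces the converter-side change.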
> or return a lower confidence value if we see there are VOC directories in the dataset?
I don't think that would be right. A dataset could be genuinely ambiguous (containing files from both formats), in which case we should report that.
> Or maybe we could return a higher confidence for VOC.
I wish, but I don't think the format warrants higher confidence. All we can rely on to detect it are `.txt` files in fairly generically named directories, and I could see another format having the same structure.
I suggest looking at the confidence levels and defining them the following way:
- High (exactly the dataset):
  - there are only the expected files (secondary files can be missing; labelmap and meta should be considered normal)
  - they have the right names and placement
  - their formats match the expected format (mandatory fields present, no extra fields; this is negotiable, and only for upper-level fields)
- Medium (looks like a custom dataset with extra info):
  - there are extra files in the directory (subformats must use the detection of the whole dataset)
  - the required file formats look as expected (mandatory fields present)
- Low (can be parsed):
  - there are extra files or directories (non-system ones)
  - the required files are present
  - their formats look as expected (mandatory fields present)
This way we're going to have High for VOC and Medium (because there are extra files) or even Low (because of extra directories) for LabelMe.
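The proposed tiers could be sketched roughly like this (illustrative names and rules only; Datumaro's real detection API is different):

```python
from enum import IntEnum

class Confidence(IntEnum):
    LOW = 1     # can be parsed: extra files/dirs, required files look right
    MEDIUM = 2  # custom dataset with extra info: extra files, formats match
    HIGH = 3    # exactly the dataset: only expected files, names/formats match

def classify(extra_dirs: bool, extra_files: bool, exact_formats: bool) -> Confidence:
    """Map the observations above onto a confidence tier."""
    if not extra_dirs and not extra_files and exact_formats:
        return Confidence.HIGH
    if not extra_dirs:
        return Confidence.MEDIUM
    return Confidence.LOW
```

Under this scheme a clean VOC export lands at HIGH, while a LabelMe interpretation of the same tree (which sees extra files or directories) lands at MEDIUM or LOW, matching the outcome described above.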