fix: raise UploadErrors for images and annotations
Description
Fixes issue #235 by raising UploadError exceptions that happens inside single_upload
Type of change
Please delete options that are not relevant.
- [X] Bug fix (non-breaking change which fixes an issue)
How has this change been tested, please provide a testcase or example of how you tested the change?
Same test as the issue
import json
import roboflow
rf = roboflow.Roboflow(api_key="API_KEY")
img_path = "<path_to_img>"
workspace_id = "<ws ID>"
project_id = "<proj ID>"
workspace = rf.workspace(workspace_id)
project = workspace.project(project_id)
annotation = {
"it is": [
{
"very frustrating": 0,
"when you": "send",
"an annotation": {
"file and": 0.5,
"function": 0.5,
"call": 0.5,
"completes": 0.5,
"successfully,": 0.5,
},
"but": "there's",
"no": "annotations",
"on": "the website",
"nor": "any",
"errors": "raised",
}
]
}
with open("annotation.json", "w") as f:
json.dump(annotation, f)
image = project.upload(img_path, annotation_path="annotation.json")
Any specific deployment considerations
For example, documentation changes, usability, usage/costs, secrets, etc.
Docs
- [ ] Docs updated? What were the changes:
Hey @caiquejjx 👋
Typically, we ask for a Colab to showcase the changes, which significantly speeds up the review process.
I made one for us this time: https://colab.research.google.com/drive/1xNixTC2xljopKhSpatDhYr2pCY0b5HvG#scrollTo=xI1cabXdatjr
I like the change, but wonder what the broader impact would be. Whether there's anyone relying on us failing silently.
Thanks @LinasKo! yeah it makes sense
Hi @caiquejjx, I'll be looking at this, and if you're around - let's get it merged!
I've tested the solution and it works well, but I'd like a slightly safer approach to parsing the error.
Here's what I have in mind:
def _parse_upload_error(self, error: rfapi.UploadError) -> str:
dict_part = str(error).split(": ", 2)[2]
# Could be done with ast.literal_eval, but this is safer.
if re.search(r"'\w+':", dict_part):
temp_str = dict_part.replace(r"\'", "<PLACEHOLDER>")
temp_str = temp_str.replace("'", '"')
dict_part = temp_str.replace("<PLACEHOLDER>", "'")
parsed_dict: dict = json.loads(dict_part)
message = parsed_dict.get("message")
return message or str(parsed_dict)
Could you replace the method (and import re instead of ast), and then test it?
Let's also add a correct annotation test, with annotation format such as this:
Annotations Dict
annotation = {
"info": {
"year": "2020",
"version": "1",
"description": "None",
"contributor": "Linas",
"url": "https://app.roboflow.ai/datasets/hard-hat-sample/1",
"date_created": "2000-01-01T00:00:00+00:00",
},
"licenses": [{"id": 1, "url": "https://creativecommons.org/publicdomain/zero/1.0/", "name": "Public Domain"}],
"categories": [
{"id": 0, "name": "cat", "supercategory": "animals"},
],
"images": [
{
"id": 0,
"license": 1,
"file_name": img_path,
"height": 1024,
"width": 1792,
"date_captured": "2020-07-20T19:39:26+00:00",
}
],
"annotations": [
{
"id": 0,
"image_id": 0,
"category_id": 0,
"bbox": [45, 2, 85, 85],
"area": 7225,
"segmentation": [],
"iscrowd": 0,
},
{
"id": 1,
"image_id": 0,
"category_id": 0,
"bbox": [324, 29, 72, 81],
"area": 5832,
"segmentation": [],
"iscrowd": 0,
},
],
}
Hey @LinasKo sure thing, I'll work on these soon and I guess we should also have automated tests right? (not sure if it was what you meant)
Aye, I wasn't too clear. A Colab will be fine this time - no need for unit tests.
@LinasKo done, I had to recreate the colab in order to edit
@tonylampada, could you please have a look at this? I believe it is ready to merge.
@LinasKo I think uploading with the CLI relies on the previous more lenient handling of annotation errors. Can you test that and see if it still has the same behaviour when importing datasets with the CLI?
Also, I think the backend already provides valid json in case of error, it just seems weird parsing it using regex like that.
@tonylampada, here's the repr of the error. It's not valid json - the strings are declared with '!.
UploadError("save annotation for MzqOjxfl22A5rxmFhrUI / bad response: 400: {'message': 'Image was already annotated.', 'type': 'InvalidImageException', 'hint': 'This image was already annotated; to overwrite the annotation, pass overwrite=true as a query parameter.'}")
I very much agree it's weird! The correct solution would be to change wherever the json is produced. Alternatively, let's use the whole string. Also, we can avoid all of this with ast.literal_eval, as per original suggestion by @caiquejjx, but that's a slight risk if the API error ever contains user-defined data.
CLI test results coming soon.
@tonylampada, I've tested:
- [x] Uploading image + annotation
- [x] Downloading object detection dataset (hard hats sample)
- [x] Importing hard hats into another obj detection project No issues during upload, no issues on the site.
However, I also labelled a keypoints dataset, created a version and attempted to download it, which failed. I think it's related to json, but unrelated to my changes.
Could you test if the following works?
roboflow download https://app.roboflow.com/linas-ws/keypooint-test/1 -l dataset-kp
Error response
❯ roboflow download https://app.roboflow.com/linas-ws/keypooint-test/1 -l dataset-kp
loading Roboflow workspace...
loading Roboflow project...
Traceback (most recent call last):
File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/requests/models.py", line 974, in json
return complexjson.loads(self.text, **kwargs)
File "/Users/linasko/.pyenv/versions/3.8.19/lib/python3.8/json/__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "/Users/linasko/.pyenv/versions/3.8.19/lib/python3.8/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/Users/linasko/.pyenv/versions/3.8.19/lib/python3.8/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/roboflow/core/version.py", line 258, in export
raise RuntimeError(response.json())
File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/requests/models.py", line 978, in json
raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/linasko/rf/pr/roboflow-python/.venv/bin/roboflow", line 8, in <module>
sys.exit(main())
File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/roboflow/roboflowpy.py", line 423, in main
args.func(args)
File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/roboflow/roboflowpy.py", line 45, in download
version.download(args.format, location=args.location, overwrite=True)
File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/roboflow/core/version.py", line 205, in download
self.export(model_format)
File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/roboflow/core/version.py", line 260, in export
response.raise_for_status()
File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/requests/models.py", line 1024, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://api.roboflow.com/linas-ws/keypooint-test/1/voc?api_key=<MY_API_KEY>
@tonylampada, responding to your Loom, I see the bigger picture now. We're both working towards the same goal of not hiding useful error messages.
The problem is that dataset import captures the result of project.single_upload and logs it with _log_img_upload, whereas uploading a single image/annotation completely ignores the result.
The correct solution would keep one of these approaches. Should we implement an equivalent of _log_img_upload in roboflow/roboflowpy.py::upload_image(), and then remove the raise introduced by this PR?
Aforementioned loom here just for reference (so we don't exclude @caiquejjx from the conversation :-)) https://www.loom.com/share/5f7a6520e0364b7588fd6130c682e0ef
@LinasKo here's what I'm thinking.
project.single_upload has an "identity crisis" because it's not exactly "single" upload, is it? :-)
It can upload:
- one image
- one annotation (I doubt this is much used, but it's there)
- one image, and then one annotation
Then there's the question of what do we do when something goes wrong with one of those operations. Ideally we should decouple those in different methods, but that would be a breaking change, probably not worth it. So, scratch that.
Then, when should single_upload throw an error? One possible answer is:
- one image = RAISE if fails
- one annotation = RAISE if fails
- one image, and then one annotation
- RAISE if image fails
- RETURN image id + annotation error if annotation fails
I think this kinda makes sense from the perspective of what error means. But I don't like it. Because it's a little more complex to implement inside (single_upload) and that complexity also leaks by forcing the caller to know when to handle and/or look inside the results.
So I'm tending to think that a better answer is:
- Always return, never raise. The return value is a dictionary that contains the result for each operation (like: "image upload=ok, here's the id, annotation upload failed, here's the error")
Now, duplicating _log_img_upload is something I don't like. DRY, right?
Maybe we should move that responsibility into single_upload as well, before returning the result dic. wdyt?
@tonylampada thanks for the loom! one note is that single_upload kinda did that before already but the problem was that the upload method didn't return anything and one of the reasons I went for raising is because people probably don't check the return of the upload since it doesn't returns anything, so besides the change on single_upload we should also start returning the result in upload, right? or maybe logging
we should also start returning the result in upload, right?
Yup. That makes sense
or maybe logging
I think it might be better to move the logging responsibility into single_upload.
I'm writing some tests for my PR (that maintains the compatibility with CLI), and this PR breaks the upload parse in some cases, I'll describe it better below.
In the line where the split is done by a colon and you access position 2, not all returns from the rfapi.upload_image have the status code, and so they don't have this second colon.
@caiquejjx @LinasKo @tonylampada
This PR should fix that, and another things:
#312