roboflow-python icon indicating copy to clipboard operation
roboflow-python copied to clipboard

fix: raise UploadErrors for images and annotations

Open caiquejjx opened this issue 1 year ago • 3 comments

Description

Fixes issue #235 by raising UploadError exceptions that happens inside single_upload

Type of change

Please delete options that are not relevant.

  • [X] Bug fix (non-breaking change which fixes an issue)

How has this change been tested, please provide a testcase or example of how you tested the change?

Same test as the issue

import json
import roboflow

rf = roboflow.Roboflow(api_key="API_KEY")
img_path = "<path_to_img>"

workspace_id = "<ws ID>"
project_id = "<proj ID>"
workspace = rf.workspace(workspace_id)
project = workspace.project(project_id)

annotation = {
    "it is": [
        {
            "very frustrating": 0,
            "when you": "send",
            "an annotation": {
                "file and": 0.5,
                
                "function": 0.5,
                "call": 0.5,
                "completes": 0.5,
                "successfully,": 0.5,

            },
            "but": "there's",
            "no": "annotations",
            "on": "the website",
            "nor": "any",
            "errors": "raised",
        }
    ]
}
with open("annotation.json", "w") as f:
    json.dump(annotation, f)

image = project.upload(img_path, annotation_path="annotation.json")

Any specific deployment considerations

For example, documentation changes, usability, usage/costs, secrets, etc.

Docs

  • [ ] Docs updated? What were the changes:

caiquejjx avatar Jun 23 '24 20:06 caiquejjx

Hey @caiquejjx 👋

Typically, we ask for a Colab to showcase the changes, which significantly speeds up the review process.

I made one for us this time: https://colab.research.google.com/drive/1xNixTC2xljopKhSpatDhYr2pCY0b5HvG#scrollTo=xI1cabXdatjr

I like the change, but wonder what the broader impact would be. Whether there's anyone relying on us failing silently.

LinasKo avatar Jun 26 '24 07:06 LinasKo

Thanks @LinasKo! yeah it makes sense

caiquejjx avatar Jun 26 '24 13:06 caiquejjx

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Jul 12 '24 14:07 CLAassistant

Hi @caiquejjx, I'll be looking at this, and if you're around - let's get it merged!

I've tested the solution and it works well, but I'd like a slightly safer approach to parsing the error.

Here's what I have in mind:

    def _parse_upload_error(self, error: rfapi.UploadError) -> str:
        dict_part = str(error).split(": ", 2)[2]
        # Could be done with ast.literal_eval, but this is safer.
        if re.search(r"'\w+':", dict_part):
            temp_str = dict_part.replace(r"\'", "<PLACEHOLDER>")
            temp_str = temp_str.replace("'", '"')
            dict_part = temp_str.replace("<PLACEHOLDER>", "'")
        parsed_dict: dict = json.loads(dict_part)
        message = parsed_dict.get("message")
        return message or str(parsed_dict)

Could you replace the method (and import re instead of ast), and then test it?

Let's also add a correct annotation test, with annotation format such as this:

Annotations Dict
annotation = {
    "info": {
        "year": "2020",
        "version": "1",
        "description": "None",
        "contributor": "Linas",
        "url": "https://app.roboflow.ai/datasets/hard-hat-sample/1",
        "date_created": "2000-01-01T00:00:00+00:00",
    },
    "licenses": [{"id": 1, "url": "https://creativecommons.org/publicdomain/zero/1.0/", "name": "Public Domain"}],
    "categories": [
        {"id": 0, "name": "cat", "supercategory": "animals"},
    ],
    "images": [
        {
            "id": 0,
            "license": 1,
            "file_name": img_path,
            "height": 1024,
            "width": 1792,
            "date_captured": "2020-07-20T19:39:26+00:00",
        }
    ],
    "annotations": [
        {
            "id": 0,
            "image_id": 0,
            "category_id": 0,
            "bbox": [45, 2, 85, 85],
            "area": 7225,
            "segmentation": [],
            "iscrowd": 0,
        },
        {
            "id": 1,
            "image_id": 0,
            "category_id": 0,
            "bbox": [324, 29, 72, 81],
            "area": 5832,
            "segmentation": [],
            "iscrowd": 0,
        },
    ],
}

LinasKo avatar Aug 13 '24 09:08 LinasKo

Hey @LinasKo sure thing, I'll work on these soon and I guess we should also have automated tests right? (not sure if it was what you meant)

caiquejjx avatar Aug 13 '24 13:08 caiquejjx

Aye, I wasn't too clear. A Colab will be fine this time - no need for unit tests.

LinasKo avatar Aug 13 '24 13:08 LinasKo

@LinasKo done, I had to recreate the colab in order to edit

caiquejjx avatar Aug 14 '24 00:08 caiquejjx

@tonylampada, could you please have a look at this? I believe it is ready to merge.

LinasKo avatar Aug 15 '24 08:08 LinasKo

@LinasKo I think uploading with the CLI relies on the previous more lenient handling of annotation errors. Can you test that and see if it still has the same behaviour when importing datasets with the CLI?

Also, I think the backend already provides valid json in case of error, it just seems weird parsing it using regex like that.

tonylampada avatar Aug 15 '24 15:08 tonylampada

@tonylampada, here's the repr of the error. It's not valid json - the strings are declared with '!.

UploadError("save annotation for MzqOjxfl22A5rxmFhrUI / bad response: 400: {'message': 'Image was already annotated.', 'type': 'InvalidImageException', 'hint': 'This image was already annotated; to overwrite the annotation, pass overwrite=true as a query parameter.'}")

I very much agree it's weird! The correct solution would be to change wherever the json is produced. Alternatively, let's use the whole string. Also, we can avoid all of this with ast.literal_eval, as per original suggestion by @caiquejjx, but that's a slight risk if the API error ever contains user-defined data.

CLI test results coming soon.

LinasKo avatar Aug 15 '24 16:08 LinasKo

@tonylampada, I've tested:

  • [x] Uploading image + annotation
  • [x] Downloading object detection dataset (hard hats sample)
  • [x] Importing hard hats into another obj detection project No issues during upload, no issues on the site.

However, I also labelled a keypoints dataset, created a version and attempted to download it, which failed. I think it's related to json, but unrelated to my changes.

Could you test if the following works?

roboflow download https://app.roboflow.com/linas-ws/keypooint-test/1 -l dataset-kp
Error response
  ❯ roboflow download https://app.roboflow.com/linas-ws/keypooint-test/1 -l dataset-kp
loading Roboflow workspace...
loading Roboflow project...
Traceback (most recent call last):
  File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/requests/models.py", line 974, in json
    return complexjson.loads(self.text, **kwargs)
  File "/Users/linasko/.pyenv/versions/3.8.19/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/Users/linasko/.pyenv/versions/3.8.19/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Users/linasko/.pyenv/versions/3.8.19/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/roboflow/core/version.py", line 258, in export
    raise RuntimeError(response.json())
  File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/requests/models.py", line 978, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/linasko/rf/pr/roboflow-python/.venv/bin/roboflow", line 8, in <module>
    sys.exit(main())
  File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/roboflow/roboflowpy.py", line 423, in main
    args.func(args)
  File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/roboflow/roboflowpy.py", line 45, in download
    version.download(args.format, location=args.location, overwrite=True)
  File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/roboflow/core/version.py", line 205, in download
    self.export(model_format)
  File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/roboflow/core/version.py", line 260, in export
    response.raise_for_status()
  File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://api.roboflow.com/linas-ws/keypooint-test/1/voc?api_key=<MY_API_KEY>

LinasKo avatar Aug 15 '24 16:08 LinasKo

@tonylampada, responding to your Loom, I see the bigger picture now. We're both working towards the same goal of not hiding useful error messages.

The problem is that dataset import captures the result of project.single_upload and logs it with _log_img_upload, whereas uploading a single image/annotation completely ignores the result.

The correct solution would keep one of these approaches. Should we implement an equivalent of _log_img_upload in roboflow/roboflowpy.py::upload_image(), and then remove the raise introduced by this PR?

LinasKo avatar Aug 16 '24 07:08 LinasKo

Aforementioned loom here just for reference (so we don't exclude @caiquejjx from the conversation :-)) https://www.loom.com/share/5f7a6520e0364b7588fd6130c682e0ef

tonylampada avatar Aug 16 '24 10:08 tonylampada

@LinasKo here's what I'm thinking.

project.single_upload has an "identity crisis" because it's not exactly "single" upload, is it? :-) It can upload:

  • one image
  • one annotation (I doubt this is much used, but it's there)
  • one image, and then one annotation

Then there's the question of what do we do when something goes wrong with one of those operations. Ideally we should decouple those in different methods, but that would be a breaking change, probably not worth it. So, scratch that.

Then, when should single_upload throw an error? One possible answer is:

  • one image = RAISE if fails
  • one annotation = RAISE if fails
  • one image, and then one annotation
    • RAISE if image fails
    • RETURN image id + annotation error if annotation fails

I think this kinda makes sense from the perspective of what error means. But I don't like it. Because it's a little more complex to implement inside (single_upload) and that complexity also leaks by forcing the caller to know when to handle and/or look inside the results.

So I'm tending to think that a better answer is:

  • Always return, never raise. The return value is a dictionary that contains the result for each operation (like: "image upload=ok, here's the id, annotation upload failed, here's the error")

Now, duplicating _log_img_upload is something I don't like. DRY, right? Maybe we should move that responsibility into single_upload as well, before returning the result dic. wdyt?

tonylampada avatar Aug 16 '24 11:08 tonylampada

@tonylampada thanks for the loom! one note is that single_upload kinda did that before already but the problem was that the upload method didn't return anything and one of the reasons I went for raising is because people probably don't check the return of the upload since it doesn't returns anything, so besides the change on single_upload we should also start returning the result in upload, right? or maybe logging

caiquejjx avatar Aug 16 '24 13:08 caiquejjx

we should also start returning the result in upload, right?

Yup. That makes sense

or maybe logging

I think it might be better to move the logging responsibility into single_upload.

tonylampada avatar Aug 16 '24 13:08 tonylampada

I'm writing some tests for my PR (that maintains the compatibility with CLI), and this PR breaks the upload parse in some cases, I'll describe it better below.

In the line where the split is done by a colon and you access position 2, not all returns from the rfapi.upload_image have the status code, and so they don't have this second colon.

image

CleanShot 2024-08-18 at 17 57 44@2x

@caiquejjx @LinasKo @tonylampada

This PR should fix that, and another things:

#312

lrosemberg avatar Aug 18 '24 21:08 lrosemberg