-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: raise UploadErrors for images and annotations #266
Conversation
Hey @caiquejjx 👋 Typically, we ask for a Colab to showcase the changes, which significantly speeds up the review process. I made one for us this time: I like the change, but wonder what the broader impact would be. Whether there's anyone relying on us failing silently. |
Thanks @LinasKo! yeah it makes sense |
Hi @caiquejjx, I'll be looking at this, and if you're around - let's get it merged! I've tested the solution and it works well, but I'd like a slightly safer approach to parsing the error. Here's what I have in mind: def _parse_upload_error(self, error: rfapi.UploadError) -> str:
dict_part = str(error).split(": ", 2)[2]
# Could be done with ast.literal_eval, but this is safer.
if re.search(r"'\w+':", dict_part):
temp_str = dict_part.replace(r"\'", "<PLACEHOLDER>")
temp_str = temp_str.replace("'", '"')
dict_part = temp_str.replace("<PLACEHOLDER>", "'")
parsed_dict: dict = json.loads(dict_part)
message = parsed_dict.get("message")
return message or str(parsed_dict) Could you replace the method (and import Let's also add a correct annotation test, with annotation format such as this: Annotations Dictannotation = {
"info": {
"year": "2020",
"version": "1",
"description": "None",
"contributor": "Linas",
"url": "https://app.roboflow.ai/datasets/hard-hat-sample/1",
"date_created": "2000-01-01T00:00:00+00:00",
},
"licenses": [{"id": 1, "url": "https://creativecommons.org/publicdomain/zero/1.0/", "name": "Public Domain"}],
"categories": [
{"id": 0, "name": "cat", "supercategory": "animals"},
],
"images": [
{
"id": 0,
"license": 1,
"file_name": img_path,
"height": 1024,
"width": 1792,
"date_captured": "2020-07-20T19:39:26+00:00",
}
],
"annotations": [
{
"id": 0,
"image_id": 0,
"category_id": 0,
"bbox": [45, 2, 85, 85],
"area": 7225,
"segmentation": [],
"iscrowd": 0,
},
{
"id": 1,
"image_id": 0,
"category_id": 0,
"bbox": [324, 29, 72, 81],
"area": 5832,
"segmentation": [],
"iscrowd": 0,
},
],
} |
Hey @LinasKo sure thing, I'll work on these soon and I guess we should also have automated tests right? (not sure if it was what you meant) |
Aye, I wasn't too clear. A Colab will be fine this time - no need for unit tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work spotting the missing replacings!
It's so tempting to revert back to ast.literal_eval
at this point - seems so much simpler. Following the threads though, I see ways of crashing ast.literal_eval
with just 301 symbols. If I knew whether the input is considered trusted, it would be an easy choice, but now I'd rather not take the chances.
I left one comment in the code. I think we're almost good to go.
roboflow/core/project.py
Outdated
if re.search(r"'\w+':", dict_part): | ||
dict_part = dict_part.replace("True", "true") | ||
dict_part = dict_part.replace("False", "false") | ||
dict_part = dict_part.replace("None", "null") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if re.search(r"'\w+':", dict_part)
is false?
We still wish to load from json, but since we didn't run dict_part = dict_part.replace("True", "true")
, it might not be valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess now I got it right
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much - looks good! 👍
Extra tests here show exactly what I'd expect: Colab
- Upload successful for new image + annotation
- Error saying annotations already exist, if you try again
- Incorrect annotation format error if wrong annotations are passed in.
@tonylampada, could you please have a look at this? I believe it is ready to merge. |
@LinasKo I think uploading with the CLI relies on the previous more lenient handling of annotation errors. Can you test that and see if it still has the same behaviour when importing datasets with the CLI? Also, I think the backend already provides valid json in case of error, it just seems weird parsing it using regex like that. |
@tonylampada, here's the
I very much agree it's weird! The correct solution would be to change wherever the json is produced. Alternatively, let's use the whole string. Also, we can avoid all of this with CLI test results coming soon. |
@tonylampada, I've tested:
However, I also labelled a keypoints dataset, created a version and attempted to download it, which failed. Could you test if the following works? roboflow download https://app.roboflow.com/linas-ws/keypooint-test/1 -l dataset-kp Error response
|
@tonylampada, responding to your Loom, I see the bigger picture now. We're both working towards the same goal of not hiding useful error messages. The problem is that dataset import captures the result of The correct solution would keep one of these approaches. Should we implement an equivalent of |
Aforementioned loom here just for reference (so we don't exclude @caiquejjx from the conversation :-)) |
@LinasKo here's what I'm thinking.
Then there's the question of what do we do when something goes wrong with one of those operations. Then, when should single_upload throw an error?
I think this kinda makes sense from the perspective of what error means. So I'm tending to think that a better answer is:
Now, duplicating |
@tonylampada thanks for the loom! one note is that |
Yup. That makes sense
I think it might be better to move the logging responsibility into single_upload. |
I'm writing some tests for my PR (that maintains the compatibility with CLI), and this PR breaks the upload parse in some cases, I'll describe it better below. In the line where the split is done by a colon and you access position 2, not all returns from the @caiquejjx @LinasKo @tonylampada This PR should fix that, and another things: |
Description
Fixes issue #235 by raising
UploadError
exceptions that happens insidesingle_upload
Type of change
Please delete options that are not relevant.
How has this change been tested, please provide a testcase or example of how you tested the change?
Same test as the issue
Any specific deployment considerations
For example, documentation changes, usability, usage/costs, secrets, etc.
Docs