core icon indicating copy to clipboard operation
core copied to clipboard

parameter_override_option: handle numeric strings

Open bertsky opened this issue 3 years ago • 2 comments

Suppose I want to have:

        "textequiv-index": {
          "type": "string",
          "enum": ["first", "last", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9"],
          "default": "first",
          "description": "Only extract lines with the specified TextEquiv/@index entries; 'first' and 'last' denote the first and last TextEquiv elements, regardless of their @index, respectively."
        }

Okay, let's run with ocrd-segment-extract-lines -P textequiv-index 0 then.

This is what happens:

Exception: Invalid parameters ["[textequiv-index] 0 is not of type 'string'", "[textequiv-index] 0 is not one of ['first', 'last', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9']"]

This is why: https://github.com/OCR-D/core/blob/e841ce8443ec7e0fa90e99796b3a947be09844d9/ocrd_utils/ocrd_utils/introspect.py#L10-L17

Apparently, the parser decodes the 0 as number, simply because it can. It does not bother to look into the tool json's type.

Is that really what we want?

(Workaround on the user side, of course, is: -P textequiv-index '"0"', but this is ugly and surprising. Surprising, because I can write -P textequiv-index first without the quotes.)

bertsky avatar Nov 10 '22 17:11 bertsky

Is that really what we want?

It's a tradeoff, it simplifies string values (no need for quoting) but does require quoting for strings that might be parsed as numbers or booleans.

-P textequiv-index '"0"'

Is obviously not ideal, to say the least.

simply because it can. It does not bother to look into the tool json's type.

If the type is string, then we don't need to (and should not) try to parse with JSON. That is probably the best solution here, right?

kba avatar Nov 14 '22 13:11 kba

If the type is string, then we don't need to (and should not) try to parse with JSON. That is probably the best solution here, right?

Yes, I think so.

bertsky avatar Nov 23 '22 18:11 bertsky