core parameter_override_option: handle numeric strings

Suppose I want to have:

        "textequiv-index": {
          "type": "string",
          "enum": ["first", "last", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9"],
          "default": "first",
          "description": "Only extract lines with the specified TextEquiv/@index entries; 'first' and 'last' denote the first and last TextEquiv elements, regardless of their @index, respectively."
        }

Okay, let's run with ocrd-segment-extract-lines -P textequiv-index 0 then.

This is what happens:

Exception: Invalid parameters ["[textequiv-index] 0 is not of type 'string'", "[textequiv-index] 0 is not one of ['first', 'last', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9']"]

This is why: https://github.com/OCR-D/core/blob/e841ce8443ec7e0fa90e99796b3a947be09844d9/ocrd_utils/ocrd_utils/introspect.py#L10-L17

Apparently, the parser decodes the 0 as number, simply because it can. It does not bother to look into the tool json's type.

Is that really what we want?

(Workaround on the user side, of course, is: -P textequiv-index '"0"', but this is ugly and surprising. Surprising, because I can write -P textequiv-index first without the quotes.)

Nov 10 '22 17:11 bertsky

Is that really what we want?

It's a tradeoff, it simplifies string values (no need for quoting) but does require quoting for strings that might be parsed as numbers or booleans.

-P textequiv-index '"0"'

Is obviously not ideal, to say the least.

simply because it can. It does not bother to look into the tool json's type.

If the type is string, then we don't need to (and should not) try to parse with JSON. That is probably the best solution here, right?

Nov 14 '22 13:11 kba

If the type is string, then we don't need to (and should not) try to parse with JSON. That is probably the best solution here, right?

Yes, I think so.

Nov 23 '22 18:11 bertsky