parameter_override_option: handle numeric strings
Suppose I want to have:
"textequiv-index": {
"type": "string",
"enum": ["first", "last", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9"],
"default": "first",
"description": "Only extract lines with the specified TextEquiv/@index entries; 'first' and 'last' denote the first and last TextEquiv elements, regardless of their @index, respectively."
}
Okay, let's run with ocrd-segment-extract-lines -P textequiv-index 0 then.
This is what happens:
Exception: Invalid parameters ["[textequiv-index] 0 is not of type 'string'", "[textequiv-index] 0 is not one of ['first', 'last', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9']"]
This is why: https://github.com/OCR-D/core/blob/e841ce8443ec7e0fa90e99796b3a947be09844d9/ocrd_utils/ocrd_utils/introspect.py#L10-L17
Apparently, the parser decodes the 0 as number, simply because it can. It does not bother to look into the tool json's type.
Is that really what we want?
(Workaround on the user side, of course, is: -P textequiv-index '"0"', but this is ugly and surprising. Surprising, because I can write -P textequiv-index first without the quotes.)
Is that really what we want?
It's a tradeoff, it simplifies string values (no need for quoting) but does require quoting for strings that might be parsed as numbers or booleans.
-P textequiv-index '"0"'
Is obviously not ideal, to say the least.
simply because it can. It does not bother to look into the tool json's
type.
If the type is string, then we don't need to (and should not) try to parse with JSON. That is probably the best solution here, right?
If the
typeis string, then we don't need to (and should not) try to parse with JSON. That is probably the best solution here, right?
Yes, I think so.