v.univar: add JSON support

Open kritibirda opened this issue 1 year ago • 14 comments

Use parson to add JSON output format support to the v.univar module.

Expected JSON schema: the root is a JSON object. The percentile option lets the user request a specific percentile, which is written to the JSON object under a percentile_%d key.

{
    "n": <int>,
    "missing": <int>,
    "nnull": <int>,
    "min": <double>,
    "max": <double>,
    "range": <double>,
    "sum": <double>,
    "mean": <double>,
    "mean_abs": <double>,
    "population_stddev": <double>,
    "population_variance": <double>,
    "population_coeff_variation": <double>,
    "sample_stddev": <double>,
    "sample_variance": <double>,
    "kurtosis": <double>,
    "skewness": <double>,
    "first_quartile": <double>,
    "median": <double>,
    "third_quartile": <double>,
    "percentile_90": <double>
}

kritibirda avatar Jun 07 '24 14:06 kritibirda

Specially named keys would be hard to use from another tool. A list might be better, but since we need both the key and the value, it would have to be a mapping, so something like this?

{
  ...
  "percentile": {
    "5":  0.536736,
    "90": 0.7237272,
    "98": 0.863662721,
    "99.5": 0.916363
  },
  ...
}
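
To make the difference concrete from a consumer's point of view, here is a minimal Python sketch. The sample documents, values, and helper names are hypothetical (they are not actual v.univar output), and the regular expression is only an assumption about what the flat keys would look like. With flat percentile_<number> keys the consumer has to pattern-match key names; with a nested mapping it can simply iterate.

```python
import json
import re

# Hypothetical sample documents; the values are made up for illustration.
flat_doc = json.loads('{"n": 55, "median": 7.777, "percentile_90": 6.7776}')
nested_doc = json.loads(
    '{"n": 55, "percentile": {"5": 0.536736, "99.5": 0.916363}}'
)


def percentiles_from_flat(doc):
    """Recover percentiles from keys shaped like percentile_<number>."""
    pattern = re.compile(r"^percentile_(\d+(?:\.\d+)?)$")
    result = {}
    for key, value in doc.items():
        match = pattern.match(key)
        if match:
            result[float(match.group(1))] = value
    return result


def percentiles_from_mapping(doc):
    """Read percentiles from a nested mapping; JSON object keys are strings."""
    return {float(key): value for key, value in doc["percentile"].items()}


print(percentiles_from_flat(flat_doc))
print(percentiles_from_mapping(nested_doc))
```

Either way the consumer ends up converting strings to numbers, but only the flat layout forces it to know the key naming convention in advance.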

echoix avatar Jun 07 '24 15:06 echoix

A good exercise to check whether the JSON output is really usable from a tool (that is, in an automated way) is to try to write a real JSON Schema for it. If the format is too hard to describe with a JSON Schema, it probably isn't usable, and it can't be validated against a schema.

echoix avatar Jun 07 '24 15:06 echoix

I tried an online tool, https://jsonschema.net/app/schemas/456991, to see what it looks like (by generating the skeleton of a JSON Schema). I first had to change your example into valid JSON; I used:


{
    "n": 55,
    "missing": 55,
    "nnull": 55,
    "min": 1.234,
    "max": 1.2345,
    "range": 1.23456,
    "sum": 1.234569,
    "mean": 1.234567,
    "mean_abs": 1.234568,
    "population_stddev": 2.3456,
    "population_variance": 2.34567,
    "population_coeff_variation": 2.345678,
    "sample_stddev": 4.5556,
    "sample_variance": 6.7777,
    "kurtosis": 3.4455,
    "skewness": 6.8888,
    "first_quartile": 8.8888,
    "median": 7.777,
    "third_quartile": 8.8882,
    "percentile_90": 6.7776
}

This gives the JSON Schema:

{
    "$schema": "https://json-schema.org/draft/2019-09/schema",
    "$id": "http://example.com/example.json",
    "type": "object",
    "default": {},
    "title": "Root Schema",
    "required": [
        "n",
        "missing",
        "nnull",
        "min",
        "max",
        "range",
        "sum",
        "mean",
        "mean_abs",
        "population_stddev",
        "population_variance",
        "population_coeff_variation",
        "sample_stddev",
        "sample_variance",
        "kurtosis",
        "skewness",
        "first_quartile",
        "median",
        "third_quartile",
        "percentile_90"
    ],
    "properties": {
        "n": {
            "type": "integer",
            "default": 0,
            "title": "The n Schema",
            "examples": [
                55
            ]
        },
        "missing": {
            "type": "integer",
            "default": 0,
            "title": "The missing Schema",
            "examples": [
                55
            ]
        },
        "nnull": {
            "type": "integer",
            "default": 0,
            "title": "The nnull Schema",
            "examples": [
                55
            ]
        },
        "min": {
            "type": "number",
            "default": 0.0,
            "title": "The min Schema",
            "examples": [
                1.234
            ]
        },
        "max": {
            "type": "number",
            "default": 0.0,
            "title": "The max Schema",
            "examples": [
                1.2345
            ]
        },
        "range": {
            "type": "number",
            "default": 0.0,
            "title": "The range Schema",
            "examples": [
                1.23456
            ]
        },
        "sum": {
            "type": "number",
            "default": 0.0,
            "title": "The sum Schema",
            "examples": [
                1.234569
            ]
        },
        "mean": {
            "type": "number",
            "default": 0.0,
            "title": "The mean Schema",
            "examples": [
                1.234567
            ]
        },
        "mean_abs": {
            "type": "number",
            "default": 0.0,
            "title": "The mean_abs Schema",
            "examples": [
                1.234568
            ]
        },
        "population_stddev": {
            "type": "number",
            "default": 0.0,
            "title": "The population_stddev Schema",
            "examples": [
                2.3456
            ]
        },
        "population_variance": {
            "type": "number",
            "default": 0.0,
            "title": "The population_variance Schema",
            "examples": [
                2.34567
            ]
        },
        "population_coeff_variation": {
            "type": "number",
            "default": 0.0,
            "title": "The population_coeff_variation Schema",
            "examples": [
                2.345678
            ]
        },
        "sample_stddev": {
            "type": "number",
            "default": 0.0,
            "title": "The sample_stddev Schema",
            "examples": [
                4.5556
            ]
        },
        "sample_variance": {
            "type": "number",
            "default": 0.0,
            "title": "The sample_variance Schema",
            "examples": [
                6.7777
            ]
        },
        "kurtosis": {
            "type": "number",
            "default": 0.0,
            "title": "The kurtosis Schema",
            "examples": [
                3.4455
            ]
        },
        "skewness": {
            "type": "number",
            "default": 0.0,
            "title": "The skewness Schema",
            "examples": [
                6.8888
            ]
        },
        "first_quartile": {
            "type": "number",
            "default": 0.0,
            "title": "The first_quartile Schema",
            "examples": [
                8.8888
            ]
        },
        "median": {
            "type": "number",
            "default": 0.0,
            "title": "The median Schema",
            "examples": [
                7.777
            ]
        },
        "third_quartile": {
            "type": "number",
            "default": 0.0,
            "title": "The third_quartile Schema",
            "examples": [
                8.8882
            ]
        },
        "percentile_90": {
            "type": "number",
            "default": 0.0,
            "title": "The percentile_90 Schema",
            "examples": [
                6.7776
            ]
        }
    },
    "examples": [{
        "n": 55,
        "missing": 55,
        "nnull": 55,
        "min": 1.234,
        "max": 1.2345,
        "range": 1.23456,
        "sum": 1.234569,
        "mean": 1.234567,
        "mean_abs": 1.234568,
        "population_stddev": 2.3456,
        "population_variance": 2.34567,
        "population_coeff_variation": 2.345678,
        "sample_stddev": 4.5556,
        "sample_variance": 6.7777,
        "kurtosis": 3.4455,
        "skewness": 6.8888,
        "first_quartile": 8.8888,
        "median": 7.777,
        "third_quartile": 8.8882,
        "percentile_90": 6.7776
    }]
}

From this, we see that hardcoded percentile keys probably aren't what we want. What about percentile 99.5 (a common percentile in environmental legislation)?
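
As a rough illustration of the problem, the Python sketch below mimics only the "required" check of a schema like the generated one. The key list is trimmed for brevity, and the key name percentile_99.5 and its value are hypothetical; a percentile_%d format could not even produce that key.

```python
# Trimmed copy of the "required" list from a schema with a hardcoded
# percentile key (assumption: only three keys kept, for brevity).
REQUIRED_KEYS = ["n", "median", "percentile_90"]


def missing_required(doc, required_keys=REQUIRED_KEYS):
    """Return the required keys that are absent from the document."""
    return [key for key in required_keys if key not in doc]


# Hypothetical outputs for two different percentile requests.
doc_p90 = {"n": 55, "median": 7.777, "percentile_90": 6.7776}
doc_p99_5 = {"n": 55, "median": 7.777, "percentile_99.5": 8.1234}

print(missing_required(doc_p90))    # nothing missing
print(missing_required(doc_p99_5))  # percentile_90 is reported as missing
```

A schema with fixed percentile keys is only valid for one particular request, so every different percentile value would need its own schema.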

echoix avatar Jun 07 '24 15:06 echoix

Special naming of keys would be hard to use from another tool. It might be better to have a list, but since we need the value and the key, it would need a mapping...

I was actually thinking I used two lists in db.univar, but it turns out I used whatever was in the shell script style format, so a list of keys which depends on the input. Since it depends on the input, you can figure the keys out; I think that was my logic there. It is terrible for writing a schema, and that's a valid criticism. I think we can change that in a way which would be backwards compatible and, at the same time, make it consistent across all percentile outputs if we decide to go in a different direction.

wenzeslaus avatar Jun 07 '24 20:06 wenzeslaus

Was the output of db.univar with the two lists released yet? If not, it's not a breaking change yet.

echoix avatar Jun 07 '24 20:06 echoix

Was the output of db.univar with the two lists released yet?

Yes, it was, in 8.3.0. https://github.com/OSGeo/grass/commit/120f198fca22a564334d7721fabbe8c6e299bf6b

I think the situation now is: 1) We should do whatever is right here not to spread a bad design from one tool to another. 2) We might be able to add the better percentile output to v.db.univar depending on the keys we choose here.

So, let's decide the percentiles here.

I guess the following is lengthy, but easy to express in a schema:

{
    "percentiles": [
        {"percentile": 95, "value": 200.03},
        {"percentile": 99.9, "value": 220.01}
    ]
}
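
For reference, a schema for this layout could look roughly like the fragment below, written here as a Python dict. This is a sketch written for illustration, not a schema taken from GRASS, and check_percentiles is a hand-rolled, hypothetical helper that mirrors only this fragment; it is not a general JSON Schema validator.

```python
import json

# Assumed schema fragment for the list-of-mappings layout (illustrative only).
PERCENTILES_SCHEMA = {
    "type": "object",
    "properties": {
        "percentiles": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["percentile", "value"],
                "properties": {
                    "percentile": {"type": "number"},
                    "value": {"type": "number"},
                },
            },
        }
    },
}


def is_number(value):
    """True for JSON numbers; bool is excluded because it subclasses int."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check_percentiles(doc):
    """Check the 'percentiles' member against the fragment above."""
    entries = doc.get("percentiles")
    if not isinstance(entries, list):
        return False
    for entry in entries:
        if not isinstance(entry, dict):
            return False
        if not is_number(entry.get("percentile")):
            return False
        if not is_number(entry.get("value")):
            return False
    return True


sample = json.loads(
    '{"percentiles": [{"percentile": 95, "value": 200.03},'
    ' {"percentile": 99.9, "value": 220.01}]}'
)
print(check_percentiles(sample))
```

The schema stays the same size no matter which or how many percentiles are requested.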

wenzeslaus avatar Jun 11 '24 00:06 wenzeslaus

It's lengthy, yes, but I like that a list is used here: you can repeat the same percentile if needed, and it keeps the percentiles in the order they were requested in the input. I didn't have an idea for how to keep the ordering in my first example.
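
A small Python sketch of that point, using a made-up request and made-up values: the list keeps the requested order and any repeats, while a mapping keyed by percentile collapses repeated entries.

```python
import json

# Hypothetical request: order and the repeated 95 are intentional.
requested = [99.9, 95, 95]
# Made-up computed values for illustration.
values = {95: 200.03, 99.9: 220.01}

# List of mappings: one entry per requested percentile, in request order.
as_list = [{"percentile": p, "value": values[p]} for p in requested]

# Mapping keyed by percentile: the repeated key is stored only once.
as_mapping = {str(p): values[p] for p in requested}

print(json.dumps({"percentiles": as_list}, indent=4))
print(json.dumps({"percentile": as_mapping}, indent=4))
```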

echoix avatar Jun 11 '24 00:06 echoix

I checked v.db.univar/db.univar again and I used percentiles as two lists, not a mapping (I must have looked at the wrong piece of code before), nested together with everything else under statistics. They give:

{
  "statistics": {
    "percentiles": [95, 99.9],
    "percentile_values": [200.03, 220.01]
  }
}

I still prefer to do it right rather than the same as in v.db.univar. Is the right solution one list of mappings rather than two lists? It seems that it is easier to just say there is a list rather than saying there are two lists of the same length.
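
To show what the two-list layout asks of a consumer, here is a minimal Python sketch. The function name pair_percentiles is hypothetical; the input keys follow the v.db.univar example above. The consumer has to check the "same length" rule itself before pairing the lists, whereas a single list of mappings carries each pair together.

```python
def pair_percentiles(statistics):
    """Convert two parallel lists into one list of mappings."""
    percentiles = statistics["percentiles"]
    values = statistics["percentile_values"]
    # The two-list layout relies on this invariant, which the layout
    # itself does not enforce.
    if len(percentiles) != len(values):
        raise ValueError("percentiles and percentile_values differ in length")
    return [
        {"percentile": p, "value": v} for p, v in zip(percentiles, values)
    ]


statistics = {
    "percentiles": [95, 99.9],
    "percentile_values": [200.03, 220.01],
}
print(pair_percentiles(statistics))
```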

wenzeslaus avatar Jun 11 '24 01:06 wenzeslaus

What do you think, @kritibirda26? Does the list of dictionaries look good to you?

wenzeslaus avatar Jun 11 '24 02:06 wenzeslaus

Hi @wenzeslaus and @echoix, Sorry for the delay in the response. The dictionaries for percentiles make sense to me. I'll update the format.

kritibirda avatar Jun 11 '24 20:06 kritibirda

Are you referring to the schema suggested by @echoix?

{
"percentiles": [
    {"percentile": 95, "value": 200.03},
    {"percentile": 99.9, "value": 220.01}
]
}

This is the approach I think we should take.

cwhite911 avatar Jun 15 '24 09:06 cwhite911

Yes, on it.

kritibirda avatar Jun 17 '24 14:06 kritibirda

@cwhite911 Should I make similar changes to the PR updating r.univar as well?

kritibirda avatar Jun 20 '24 18:06 kritibirda

@wenzeslaus Hi! Can you also review this PR so that it can be merged?

kritibirda avatar Jun 28 '24 15:06 kritibirda