RTX icon indicating copy to clipboard operation
RTX copied to clipboard

SyntaxError: JSON.parse: unexpected character

Open saramsey opened this issue 3 years ago • 23 comments

In today's Translator Question of the Month session, a query was run through ARAX (the second query that is listed in the GitHub issue hyperlinked as "Translator Question of the Month"),

{
  "edges": {
    "e0": {
      "object": "n0",
      "predicates": [
        "biolink:related_to"
      ],
      "subject": "n1"
    }
  },
  "nodes": {
    "n0": {
      "categories": [
        "biolink:Disease"
      ],
      "ids": [
        "MONDO:0005083"
      ]
    },
    "n1": {
      "categories": [
        "biolink:Disease"
      ]
    }
  }
}

that caused an error in ARAX:

An error was encountered while contacting the server (SyntaxError: JSON.parse: unexpected character at line 1 column 8834772 of the JSON data)
Screen Shot 2022-08-05 at 9 21 02 AM

can someone help with investigating this?

saramsey avatar Aug 05 '22 16:08 saramsey

Hi Luis, would you mind taking a look at this issue to see if it is reproducible and (if so) maybe try to determine if the bad JSON also comes back if we use cURL to post the query against the API?

saramsey avatar Aug 05 '22 16:08 saramsey

Hi @saramsey . It does look like it is bad JSON getting sent from the back end to the UI.

The culprit appears to be an unquoted string: image

~~Note that there might be other places in this large response that have this issue; this is the first one reported by the console.~~ Edit: there is only one instance of unquoted Infinity in that response.

One quicker way to repro is to pull up response id = 55232.

isbluis avatar Aug 05 '22 18:08 isbluis

So is this a bug in the Python JSON libraries or perhaps flask serializer? I think we have control over the serializer at some level if we tinker with the flask libraries (although replicating it to all ITRB instances may be irritating.

We may be better served by having some ARAX code that inspects the attributes and does what we want with Inf. Do we assume that the original value is a Python float Inf and the serializer is not doing the right thing?

edeutsch avatar Aug 06 '22 03:08 edeutsch

i.e. this is not an unquoted string, but rather started out as a float, which I think in Python (and other implementations) can be Inf or NaN. So what is the proper thing to do with a float Inf or NaN? Perhaps depends an official JSON specification.

edeutsch avatar Aug 06 '22 03:08 edeutsch

JSON does not have any special keywords for Inf/Infinity/NaN etc: https://stackoverflow.com/questions/1423081/json-left-out-infinity-and-nan-json-status-in-ecmascript

So any non-numeric values have to be quoted, or else set to some very large number for Inf (e.g. 1.0e+1024).

If we can trace the provenance of that attribute, we could warn the responsible KP to update their code. Or perhaps ours...?

isbluis avatar Aug 06 '22 06:08 isbluis

Hello Steve and Eric. Is this still a concern, or can we close this issue?

isbluis avatar Oct 12 '23 07:10 isbluis

okay to close from my perspective, but Steve may still want to check.

edeutsch avatar Oct 12 '23 23:10 edeutsch

Hi guys, thanks for the nudge. I'm slammed with background tasker issues at the moment. Can I look into this next week?

saramsey avatar Oct 19 '23 20:10 saramsey

Sure thing, @saramsey . I just wanted to close old/completed issues, but there is no other pressure on this otherwise. Thanks!

isbluis avatar Oct 19 '23 21:10 isbluis

Hi @isbluis and @edeutsch thank you for giving me a chance to have a think about this.

Here's my reprex of the issue, I think?

Last login: Fri Oct 20 15:38:16 on ttys008
sramsey-laptop:~ sramsey$ python3
Python 3.11.5 (main, Aug 24 2023, 15:09:45) [Clang 14.0.3 (clang-1403.0.22.14.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import json
>>> import math
>>> json.dumps(math.inf)
'Infinity'
>>>

So apparently math.inf gets serialized as the string Infinity. Evidently this is a well-known issue with python's json library, https://evanhahn.com/pythons-nonstandard-json-encoding/

There is a countermeasure! We simply need to pass allow_nan=False, and if there is a math.inf somewhere in the nested objects being serialized, we will get a ValueError which is better (IMO) than sending invalid JSON back to the client:

>>> json.dumps(math.inf, allow_nan=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/homebrew/Cellar/[email protected]/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
          ^^^^^^^^^^^
  File "/opt/homebrew/Cellar/[email protected]/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/encoder.py", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/[email protected]/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/encoder.py", line 258, in iterencode
    return _iterencode(o, 0)
           ^^^^^^^^^^^^^^^^^
ValueError: Out of range float values are not JSON compliant

I figure an exception leading to ERROR messages in the TRAPI message log is good because it will force someone to track down the specific issue and fix it. Having invalid JSON go back to the client seems like something we would want to avoid, even at the cost of occasionally having errors in the TRAPI message and no results.

If you approve, we could look into updating query_controller.py and asyncquery_controller.py to fix the issue (I think that is where the TRAPI response gets serialized, ultimately?).

saramsey avatar Oct 21 '23 04:10 saramsey

I think there is a custom Flask serializer that we could tweak. but it is pretty opaque to me..

Maybe this could be tweaked to handle inf:

https://github.com/RTXteam/RTX/blob/master/code/UI/OpenAPI/python-flask-server/openapi_server/encoder.py

edeutsch avatar Oct 23 '23 04:10 edeutsch

For non-streaming queries via the /query endpoint, it seems like we just use json.dumps, per this line of code: https://github.com/RTXteam/RTX/blob/fa575ade84dc4a792ba3eeca915acd38d765095f/code/UI/OpenAPI/python-flask-server/openapi_server/controllers/query_controller.py#L81

And for streaming queries via the /query endpoint, it seems like we also just use json.dumps: https://github.com/RTXteam/RTX/blob/fa575ade84dc4a792ba3eeca915acd38d765095f/code/ARAX/ARAXQuery/ARAX_query.py#L164

In both cases, couldn't we just pass allow_nan=False ?

saramsey avatar Oct 23 '23 19:10 saramsey

I am surprised that we use a json.dumps() for non-streaming queries. I am wondering if there is more than meets the eye.

For streaming queries, this yield() does seem like the right piece of code yes.

You may be right,.

edeutsch avatar Oct 23 '23 20:10 edeutsch

@edeutsch should we try adding allow_nan=False the the above line of code?

saramsey avatar Nov 30 '23 17:11 saramsey

sounds plausible, but I don't really understand the details/implications here.

edeutsch avatar Nov 30 '23 17:11 edeutsch

I've pushed a commit for this issue (hope it works) to the dev branch (commit 9d38702).

saramsey avatar Jan 03 '24 17:01 saramsey

Just added a commit for the KG2 query_controller.py as well (844d9a4), also in the dev branch.

saramsey avatar Jan 03 '24 17:01 saramsey

With the latest patch, ARAX more gracefully handles the error:

Screenshot 2024-01-03 at 1 40 37 PM

The cause of the error is a NGD value of math.inf, in the knowledge_graph:

            'knowledge_graph': {'edges': {'N1_0': {'attributes': [{'attribute_source': None,
                                                                    'attribute_type_id': 'EDAM-DATA:2526',
                                                                    'attributes': None,
                                                                    'description': I REMOVED SOME STUFF -- SAR,
                                                                    'original_attribute_name': 'normalized_google_distance',
                                                                    'value': 'inf',
                                                                    'value_type_id': None,
                                                                    'value_url': 'https://arax.ncats.io/api/rtx/v1/ui/#/PubmedMeshNgd'},
                                                                   {'attribute_source': None,
                                                                    'attribute_type_id': 'EDAM-OPERATION:0226',
                                                                    'attributes': None,
                                                                    'description': None,
                                                                    'original_attribute_name': 'virtual_relation_label',
                                                                    'value': 'N1',
                                                                    'value_type_id': None,
                                                                    'value_url': None},
                                                                   {'attribute_source': None,

As for why a value of math.inf is getting into an edge in the KnowledgeGraph, maybe @dkoslicki's team can help with that, since it appears to be an NGD issue.

saramsey avatar Jan 03 '24 21:01 saramsey

Ah, @saramsey the NGD value can be undefined. I see in the code that sometimes it will return a math.nan which will also likely give an "JSON out of range float values" error. So two things:

  1. @saramsey or @edeutsch do you know a JSON friendly way to communicate undefined/NaN?
  2. @kvnthomas98 can you take a look at why it's returning a math.inf when the NGD code it just has math.nan's? Eg. is it possible this line doesn't raise a value error but does return an inf value?

dkoslicki avatar Jan 03 '24 22:01 dkoslicki

By the way, while we're at it, we should update https://arax.ncats.io/api/rtx/v1/ui/#/PubmedMeshNgd to https://arax.ncats.io/api/arax/v1.4/ui/#/PubmedMeshNgd/pubmed_mesh_ngd

dkoslicki avatar Jan 03 '24 22:01 dkoslicki

I do not know a JSON friendly way to communicate undefined/NaN if a field has to be numerical. If it can also be a string, then you could use "NaN" or "inf" or something. Or possible null.

If may be safer all around to use an obviously bogus and high number like 99999 instead.

edeutsch avatar Jan 03 '24 22:01 edeutsch

We get inf values because here if we return nan, we assign the ngd value to default_value and default_value is set to inf if not provided in our query here.

>>> import numpy as np
>>> import math
>>> np.isfinite(math.nan)
False

@dkoslicki , So when we return NaN, should we simply return a bogus small number(eg -99999)?

kvnthomas98 avatar Jan 04 '24 12:01 kvnthomas98

Ah, thanks @kvnthomas98 , let's go with a bogus number, but make it large, like Eric's suggestion. If we make it negative, then value comparisons when we do ranking might get messed up

dkoslicki avatar Jan 04 '24 12:01 dkoslicki

Hi all. Is this good to close?

isbluis avatar Jun 12 '24 20:06 isbluis

I think yes, thanks!

edeutsch avatar Jun 13 '24 02:06 edeutsch