SyntaxError: JSON.parse: unexpected character
In today's Translator Question of the Month session, a query was run through ARAX (the second query that is listed in the GitHub issue hyperlinked as "Translator Question of the Month"),
{
"edges": {
"e0": {
"object": "n0",
"predicates": [
"biolink:related_to"
],
"subject": "n1"
}
},
"nodes": {
"n0": {
"categories": [
"biolink:Disease"
],
"ids": [
"MONDO:0005083"
]
},
"n1": {
"categories": [
"biolink:Disease"
]
}
}
}
that caused an error in ARAX:
An error was encountered while contacting the server (SyntaxError: JSON.parse: unexpected character at line 1 column 8834772 of the JSON data)
can someone help with investigating this?
Hi Luis, would you mind taking a look at this issue to see if it is reproducible and (if so) maybe try to determine if the bad JSON also comes back if we use cURL to post the query against the API?
Hi @saramsey . It does look like it is bad JSON getting sent from the back end to the UI.
The culprit appears to be an unquoted string:

~~Note that there might be other places in this large response that have this issue; this is the first one reported by the console.~~
Edit: there is only one instance of unquoted Infinity in that response.
One quicker way to repro is to pull up response id = 55232.
So is this a bug in the Python JSON libraries or perhaps flask serializer? I think we have control over the serializer at some level if we tinker with the flask libraries (although replicating it to all ITRB instances may be irritating.
We may be better served by having some ARAX code that inspects the attributes and does what we want with Inf. Do we assume that the original value is a Python float Inf and the serializer is not doing the right thing?
i.e. this is not an unquoted string, but rather started out as a float, which I think in Python (and other implementations) can be Inf or NaN. So what is the proper thing to do with a float Inf or NaN? Perhaps depends an official JSON specification.
JSON does not have any special keywords for Inf/Infinity/NaN etc: https://stackoverflow.com/questions/1423081/json-left-out-infinity-and-nan-json-status-in-ecmascript
So any non-numeric values have to be quoted, or else set to some very large number for Inf (e.g. 1.0e+1024).
If we can trace the provenance of that attribute, we could warn the responsible KP to update their code. Or perhaps ours...?
Hello Steve and Eric. Is this still a concern, or can we close this issue?
okay to close from my perspective, but Steve may still want to check.
Hi guys, thanks for the nudge. I'm slammed with background tasker issues at the moment. Can I look into this next week?
Sure thing, @saramsey . I just wanted to close old/completed issues, but there is no other pressure on this otherwise. Thanks!
Hi @isbluis and @edeutsch thank you for giving me a chance to have a think about this.
Here's my reprex of the issue, I think?
Last login: Fri Oct 20 15:38:16 on ttys008
sramsey-laptop:~ sramsey$ python3
Python 3.11.5 (main, Aug 24 2023, 15:09:45) [Clang 14.0.3 (clang-1403.0.22.14.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import json
>>> import math
>>> json.dumps(math.inf)
'Infinity'
>>>
So apparently math.inf gets serialized as the string Infinity. Evidently this is a well-known issue with python's json library, https://evanhahn.com/pythons-nonstandard-json-encoding/
There is a countermeasure! We simply need to pass allow_nan=False, and if there is a math.inf somewhere in the nested objects being serialized, we will get a ValueError which is better (IMO) than sending invalid JSON back to the client:
>>> json.dumps(math.inf, allow_nan=False)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/homebrew/Cellar/[email protected]/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/__init__.py", line 238, in dumps
**kw).encode(obj)
^^^^^^^^^^^
File "/opt/homebrew/Cellar/[email protected]/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/encoder.py", line 200, in encode
chunks = self.iterencode(o, _one_shot=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/[email protected]/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/encoder.py", line 258, in iterencode
return _iterencode(o, 0)
^^^^^^^^^^^^^^^^^
ValueError: Out of range float values are not JSON compliant
I figure an exception leading to ERROR messages in the TRAPI message log is good because it will force someone to track down the specific issue and fix it. Having invalid JSON go back to the client seems like something we would want to avoid, even at the cost of occasionally having errors in the TRAPI message and no results.
If you approve, we could look into updating query_controller.py and asyncquery_controller.py to fix the issue (I think that is where the TRAPI response gets serialized, ultimately?).
I think there is a custom Flask serializer that we could tweak. but it is pretty opaque to me..
Maybe this could be tweaked to handle inf:
https://github.com/RTXteam/RTX/blob/master/code/UI/OpenAPI/python-flask-server/openapi_server/encoder.py
For non-streaming queries via the /query endpoint, it seems like we just use json.dumps, per this line of code:
https://github.com/RTXteam/RTX/blob/fa575ade84dc4a792ba3eeca915acd38d765095f/code/UI/OpenAPI/python-flask-server/openapi_server/controllers/query_controller.py#L81
And for streaming queries via the /query endpoint, it seems like we also just use json.dumps:
https://github.com/RTXteam/RTX/blob/fa575ade84dc4a792ba3eeca915acd38d765095f/code/ARAX/ARAXQuery/ARAX_query.py#L164
In both cases, couldn't we just pass allow_nan=False ?
I am surprised that we use a json.dumps() for non-streaming queries. I am wondering if there is more than meets the eye.
For streaming queries, this yield() does seem like the right piece of code yes.
You may be right,.
@edeutsch should we try adding allow_nan=False the the above line of code?
sounds plausible, but I don't really understand the details/implications here.
I've pushed a commit for this issue (hope it works) to the dev branch (commit 9d38702).
Just added a commit for the KG2 query_controller.py as well (844d9a4), also in the dev branch.
With the latest patch, ARAX more gracefully handles the error:
The cause of the error is a NGD value of math.inf, in the knowledge_graph:
'knowledge_graph': {'edges': {'N1_0': {'attributes': [{'attribute_source': None,
'attribute_type_id': 'EDAM-DATA:2526',
'attributes': None,
'description': I REMOVED SOME STUFF -- SAR,
'original_attribute_name': 'normalized_google_distance',
'value': 'inf',
'value_type_id': None,
'value_url': 'https://arax.ncats.io/api/rtx/v1/ui/#/PubmedMeshNgd'},
{'attribute_source': None,
'attribute_type_id': 'EDAM-OPERATION:0226',
'attributes': None,
'description': None,
'original_attribute_name': 'virtual_relation_label',
'value': 'N1',
'value_type_id': None,
'value_url': None},
{'attribute_source': None,
As for why a value of math.inf is getting into an edge in the KnowledgeGraph, maybe @dkoslicki's team can help with that, since it appears to be an NGD issue.
Ah, @saramsey the NGD value can be undefined. I see in the code that sometimes it will return a math.nan which will also likely give an "JSON out of range float values" error. So two things:
- @saramsey or @edeutsch do you know a JSON friendly way to communicate undefined/NaN?
- @kvnthomas98 can you take a look at why it's returning a
math.infwhen the NGD code it just hasmath.nan's? Eg. is it possible this line doesn't raise a value error but does return aninfvalue?
By the way, while we're at it, we should update https://arax.ncats.io/api/rtx/v1/ui/#/PubmedMeshNgd to https://arax.ncats.io/api/arax/v1.4/ui/#/PubmedMeshNgd/pubmed_mesh_ngd
I do not know a JSON friendly way to communicate undefined/NaN if a field has to be numerical. If it can also be a string, then you could use "NaN" or "inf" or something. Or possible null.
If may be safer all around to use an obviously bogus and high number like 99999 instead.
We get inf values because here if we return nan, we assign the ngd value to default_value and default_value is set to inf if not provided in our query here.
>>> import numpy as np
>>> import math
>>> np.isfinite(math.nan)
False
@dkoslicki , So when we return NaN, should we simply return a bogus small number(eg -99999)?
Ah, thanks @kvnthomas98 , let's go with a bogus number, but make it large, like Eric's suggestion. If we make it negative, then value comparisons when we do ranking might get messed up
Hi all. Is this good to close?
I think yes, thanks!