fix: use pessimistic json encoder in SQL Lab
SUMMARY
An issue reported in https://github.com/apache/superset/issues/28001 shows SQL Lab failing hard when a database driver returns a type that can't be decoded for serialization. Switching to this pessimistic helper function for json serialization will prevent a hard failure, but will obfuscate the data for the specific cells where things can't be serialized.
Not being in a situation where I can reproduce the issue, I'm operating with the stacktrace, meaning I can't really handle the specific cell, but can prevent SQL Lab from failing hard on a single cell issue.
Few related notes:
- maybe there's something around the driver being in utf8 mode of some kind. In theory we could say that driver / superset should agree on utf8 for things to work
- driver should probably giving us a
bytestype from aVARBINARYand from the stacktrace it appears to be something else, presumablystr, though that's unclear - could be good to refactor out everything json-related out of superset/utils/core.py and into a new superset/utils/json.py, and make sure we always use thesame utility functions throughout the codebase as opposed to raw
import jsonacross the codebase. From my understanding there's some magic around simplejson injecting itself everywhere that happens too.
@mistercrunch Should we show some kind of warning to the users when we obfuscate data? I'm trying to avoid a scenario where data is missing because of quality issues and users are not aware of the problem.
+1 totally agree on the unit test + logging something to the user and probably logging something to the administrator if/when it's truly an unexpected error. Marking this as DRAFT as it needs more work, but want to share potential solution with the person who reported the issue.
Digging a bit it appears the pessimistic serializer would fill the unserializable cell with f"Unserializable [{type(obj)}]" https://github.com/apache/superset/blob/master/superset/utils/core.py#L508-L510 , which seems reasonable.
Though maybe catching TypeError is not broad enough here too as I think given the diversity and maturity spectrum of drivers we deal with, we should assume that it's likely we might get objects we cannot serialize and should fail gracefully (let bot user know and leave a good trace in the logs)
So left TODO here:
- try and get more info from the issue reported as to what type isn't handed,
- handle that specific type as gracefully as possible
- write a unit test specific to that type
- assume we may get ANY object from a driver
- write a unit test representing a fully "unhandlable" type
- make sure we log something useful
- fill in the cell something that makes some sense to the user
I have the same problem encoder here #29457 but with TIMESTAMP datatype.
Version 4.0.1
@Habeeb556 looks like 4.0.2 brings this cheery in if that helps.