superset icon indicating copy to clipboard operation
superset copied to clipboard

fix: use pessimistic json encoder in SQL Lab

Open mistercrunch opened this issue 1 year ago • 3 comments

SUMMARY

An issue reported in https://github.com/apache/superset/issues/28001 shows SQL Lab failing hard when a database driver returns a type that can't be decoded for serialization. Switching to this pessimistic helper function for json serialization will prevent a hard failure, but will obfuscate the data for the specific cells where things can't be serialized.

Not being in a situation where I can reproduce the issue, I'm operating with the stacktrace, meaning I can't really handle the specific cell, but can prevent SQL Lab from failing hard on a single cell issue.

Few related notes:

  • maybe there's something around the driver being in utf8 mode of some kind. In theory we could say that driver / superset should agree on utf8 for things to work
  • driver should probably giving us a bytes type from a VARBINARY and from the stacktrace it appears to be something else, presumably str, though that's unclear
  • could be good to refactor out everything json-related out of superset/utils/core.py and into a new superset/utils/json.py, and make sure we always use thesame utility functions throughout the codebase as opposed to raw import json across the codebase. From my understanding there's some magic around simplejson injecting itself everywhere that happens too.

mistercrunch avatar Apr 29 '24 16:04 mistercrunch

@mistercrunch Should we show some kind of warning to the users when we obfuscate data? I'm trying to avoid a scenario where data is missing because of quality issues and users are not aware of the problem.

michael-s-molina avatar Apr 29 '24 17:04 michael-s-molina

+1 totally agree on the unit test + logging something to the user and probably logging something to the administrator if/when it's truly an unexpected error. Marking this as DRAFT as it needs more work, but want to share potential solution with the person who reported the issue.

mistercrunch avatar Apr 29 '24 18:04 mistercrunch

Digging a bit it appears the pessimistic serializer would fill the unserializable cell with f"Unserializable [{type(obj)}]" https://github.com/apache/superset/blob/master/superset/utils/core.py#L508-L510 , which seems reasonable.

Though maybe catching TypeError is not broad enough here too as I think given the diversity and maturity spectrum of drivers we deal with, we should assume that it's likely we might get objects we cannot serialize and should fail gracefully (let bot user know and leave a good trace in the logs)

So left TODO here:

  • try and get more info from the issue reported as to what type isn't handed,
    • handle that specific type as gracefully as possible
    • write a unit test specific to that type
  • assume we may get ANY object from a driver
    • write a unit test representing a fully "unhandlable" type
    • make sure we log something useful
    • fill in the cell something that makes some sense to the user

mistercrunch avatar Apr 29 '24 18:04 mistercrunch

I have the same problem encoder here #29457 but with TIMESTAMP datatype.

Version 4.0.1

Habeeb556 avatar Jul 02 '24 20:07 Habeeb556

@Habeeb556 looks like 4.0.2 brings this cheery in if that helps.

rusackas avatar Jul 03 '24 16:07 rusackas