blueprint icon indicating copy to clipboard operation
blueprint copied to clipboard

Fix crash when marshalling lists, and add support for 'object' and 'any' types

Open ericvergnaud opened this issue 1 year ago • 5 comments

Our current implementation crashes when marshalling/unmarshalling generic lists, such as list[some_class].

This PR fixes that crash.

Our current implementation also does not support marshalling/unmarshalling generics with 'object' or 'Any' as type parameters, such as:

@dataclass
class SomeClass:
  field_a: list[object]
  field_b: dict[str, Any]

This PR fixes that by allowing developers to marshall/unmarshall any type, thus making the storage layer more robust.

Required for https://github.com/databrickslabs/remorph/pull/1488

ericvergnaud avatar Mar 13 '25 15:03 ericvergnaud

✅ 40/40 passed, 2 skipped, 1m58s total

Running from acceptance #299

github-actions[bot] avatar Mar 13 '25 15:03 github-actions[bot]

@reviewer please note that the no-cheat check needs to be acceped:

  • cheats were added to existing code in paths.py, so that formatting doesn't yell
  • the complexity of _marshal is inherent to the problem being solved

ericvergnaud avatar Mar 13 '25 15:03 ericvergnaud

@asnare @sundarshankar89 any chance you can review this PR ?

ericvergnaud avatar Mar 20 '25 08:03 ericvergnaud

With respect to the purpose of this PR, I'm not sure we should be should be supporting object or Any: these go against the spirit and purpose of type annotations, and to an extent I'm missing the reason we want to support these. In a few places the existing code is a bit lazy and uses Any, but in general it shouldn't be necessary for users of this class to use it. In particular: a) we don't allow any value; b) it should be possible to express (via a union and/or type alias) arbitrary combinations (including nesting) of the types that we do support.

I'm not sure where we need this, but think a stronger justification is needed to do this. The purpose of this package is partly to enforce standards across our projects, and one of those standards is proper type hinting. This PR significantly undermines that goal.

That notwithstanding, there are a few nits that are fixed by this PR and that's great, as is the expanded test coverage for how we handle (properly typed) collections.

For our own code, if one serializes for example a list[some_data_class] using a list or a list[Any] type hint, they won't get a list[some_data_class] when unmarshalling, a bug that they will need to fix asap (by providing the correct type hint). So not sure this PR undermines the stated goal re code that we produce. However, we sometimes need to store structured data in raw format for which we have no type hint, notably nested structures provided by 3rd parties without type hints. In that case, we just act as a pass trough.

(In this PR, support for this feature comes on top of proper testing + bug fixes also in this PR).

ericvergnaud avatar Apr 03 '25 18:04 ericvergnaud

This debate is philosophical, and I don't think a PR is the right place for having it. Even in Morpheus, written in a strongly typed language, with all the code under our control, there are 26 occurrences of asInstanceOf i.e. places where we're not able to accurately specify a type. So you think developers can always be explicit, but reality says otherwise, even in the above ideal situation. The current code crashes. This PR fixes that.

ericvergnaud avatar Apr 15 '25 11:04 ericvergnaud

Superseded by #226 & #241.

asnare avatar Jun 16 '25 16:06 asnare