Support for diffing and patching data holder objects
I need my diffing to go beyond dict/set/list data structures and cover other objects as well.
The main target is data holder objects:
types.SimpleNamespace, dataclasses, and pydantic models.
This would be an incremental change that does not alter how diffing and patching work for dicts/lists/sets today.
If you want to jump into code, I have a first tested implementation available here.
Key parts of the proposed change:
1. diff and patch objects
Treat objects as if they were the dict held in their __dict__ attribute:
first = SimpleNamespace(a=1)
second = SimpleNamespace(a=2)
delta = diff(first, second)
assert patch(delta, first).a == 2
Here I decided to include '__dict__' in the path of each change:
first = SimpleNamespace(a=1)
second = SimpleNamespace(a=2)
changes = list(diff(first, second))
assert changes == [('change', ['__dict__', 'a'], (1, 2))]
The alternative would be to leave '__dict__' out of the path, essentially treating the object transparently as a dict, but I opted for being explicit here.
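To make the idea concrete without depending on the linked implementation, here is a minimal, self-contained sketch of diffing and patching through `__dict__`. The helper names `diff_attrs` and `patch_attrs` are hypothetical and only handle flat attributes; they are not the proposed API, just an illustration of the path convention above.

```python
import copy
from types import SimpleNamespace


def diff_attrs(first, second):
    """Yield (action, path, value) tuples describing attribute differences."""
    a, b = vars(first), vars(second)
    # Attributes present on both sides but with different values.
    for key in sorted(a.keys() & b.keys()):
        if a[key] != b[key]:
            yield ('change', ['__dict__', key], (a[key], b[key]))
    # Attributes only on the second object.
    for key in sorted(b.keys() - a.keys()):
        yield ('add', ['__dict__'], [(key, b[key])])
    # Attributes only on the first object.
    for key in sorted(a.keys() - b.keys()):
        yield ('remove', ['__dict__'], [(key, a[key])])


def patch_attrs(changes, obj):
    """Return a patched deep copy of obj; the original is left untouched."""
    result = copy.deepcopy(obj)
    for action, path, value in changes:
        if action == 'change':
            setattr(result, path[-1], value[1])
        elif action == 'add':
            for key, item in value:
                setattr(result, key, item)
        elif action == 'remove':
            for key, _ in value:
                delattr(result, key)
    return result


first = SimpleNamespace(a=1, b=2)
second = SimpleNamespace(a=2, c=3)
changes = list(diff_attrs(first, second))
patched = patch_attrs(changes, first)
```

Keeping `'__dict__'` in the path means a consumer of the diff can tell an attribute change apart from a change to a dict key of the same name.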
2. Represent objects in diffs
Today, an object like decimal.Decimal is included in the diff information as the object itself.
This does not work if we want to store the diffs in a file or a database, and get them back later.
Looking at two types of objects:
Specific Python objects
E.g. converting Decimal('1.23') to '1.23' when diffing and back to Decimal('1.23') when patching.
Here I implemented support for:
- datetime.datetime
- datetime.date
- datetime.time
- datetime.timedelta
- decimal.Decimal
- enum.Enum
- pathlib.Path
- uuid.UUID
(For now, enums are only converted to their value when diffing, e.g. Color.RED becomes 2, and are
not converted back when patching.)
I added a method that lets a developer register rules for other round-trip transformations as required.
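One way such a rule registry could look is sketched below. This is an assumption about the general shape, not the method in the linked implementation: `register`, `to_plain` and `from_plain` are hypothetical names, and each rule is simply a pair of functions that convert a value to a plain, storable form and back.

```python
import datetime
import decimal
import uuid

# Hypothetical registry: "module.ClassName" -> (class, encoder, decoder).
_CONVERTERS = {}


def register(cls, encode, decode):
    """Register a round-trip rule for exact instances of cls."""
    name = f"{cls.__module__}.{cls.__qualname__}"
    _CONVERTERS[name] = (cls, encode, decode)


def to_plain(value):
    """Return (type name, plain value) for a registered type."""
    for name, (cls, encode, _decode) in _CONVERTERS.items():
        if type(value) is cls:
            return name, encode(value)
    raise TypeError(f"no rule registered for {type(value)!r}")


def from_plain(name, plain):
    """Rebuild the original value from its type name and plain form."""
    _cls, _encode, decode = _CONVERTERS[name]
    return decode(plain)


register(decimal.Decimal, str, decimal.Decimal)
register(uuid.UUID, str, uuid.UUID)
register(datetime.date, datetime.date.isoformat, datetime.date.fromisoformat)

name, plain = to_plain(decimal.Decimal('1.23'))
restored = from_plain(name, plain)
```

Matching on the exact type rather than with `isinstance` avoids, for example, a `datetime.datetime` being handled by the `datetime.date` rule.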
Generic objects
Converting an object with __dict__ to just the __dict__ value when diffing and back to the
original object when patching.
To handle the security concern of creating objects when the diff data might come from an untrusted source, I am adding an "allow list" that restricts which modules a class may come from.
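A minimal sketch of such a check follows, assuming the allow list is a plain set of module names that is consulted before anything is imported. The names `ALLOWED_MODULES` and `resolve_class` are hypothetical and not taken from the linked implementation.

```python
import importlib
from types import SimpleNamespace

# Hypothetical allow list: only classes from these modules may be recreated.
ALLOWED_MODULES = {'types'}


def resolve_class(module_name, class_name, allowed=ALLOWED_MODULES):
    """Look up a class by name, refusing modules that are not allowed."""
    # Check before importing, so untrusted data cannot trigger an import.
    if module_name not in allowed:
        raise ValueError(f"module {module_name!r} is not on the allow list")
    module = importlib.import_module(module_name)
    cls = getattr(module, class_name)
    if not isinstance(cls, type):
        raise ValueError(f"{module_name}.{class_name} is not a class")
    return cls


cls = resolve_class('types', 'SimpleNamespace')
```

Rejecting the module name before calling `importlib.import_module` matters because importing a module can itself run arbitrary code.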
"Serialized" format
Objects are included in diffs as dicts with a special key. For example:
first = []
second = [SimpleNamespace(a=1)]
changes = list(diff(first, second))
assert changes == [('add', '', [(0, {
VALUE_KEY: {'module': 'types', 'name': 'SimpleNamespace', 'value': {'a': 1}}}
)])]
... where the VALUE_KEY is '_dictdiffer_value_key'.
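For illustration, producing and consuming this format could be sketched as follows. The functions `encode` and `decode` are hypothetical, and recreating the object with `__new__` and filling its `__dict__` (which skips `__init__`) is an assumption made to keep the sketch short, not necessarily what the linked implementation does.

```python
import importlib
from types import SimpleNamespace

VALUE_KEY = '_dictdiffer_value_key'


def encode(obj):
    """Represent an object as a plain dict under the special key."""
    cls = type(obj)
    return {VALUE_KEY: {
        'module': cls.__module__,
        'name': cls.__qualname__,
        'value': dict(vars(obj)),
    }}


def decode(data, allowed_modules=frozenset({'types'})):
    """Recreate the object, refusing modules outside the allow list."""
    payload = data[VALUE_KEY]
    if payload['module'] not in allowed_modules:
        raise ValueError(f"module {payload['module']!r} is not allowed")
    module = importlib.import_module(payload['module'])
    cls = getattr(module, payload['name'])
    # Assumption: bypass __init__ and restore the attributes directly.
    obj = cls.__new__(cls)
    obj.__dict__.update(payload['value'])
    return obj


original = SimpleNamespace(a=1)
encoded = encode(original)
decoded = decode(encoded)
```

Since the encoded form contains only dicts, strings and the attribute values themselves, it can be written to JSON as long as those values are themselves serializable.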
My questions
- Would the maintainers be willing to merge something like what is described here?
- Any improvement ideas on the design?