Wrong diff in case of removing a non last index of an array
Suppose we have an array
old = ['a','b','c']
new = ['b','c']
# In this case I figured from the documentation that you should expect
# the diff to be ('remove','',[(0,'a')])
# but instead I'm getting:
[('change', [0], ('a', 'b')), ('change', [1], ('b', 'c')), ('remove', '', [(2, 'c')])]
Is that an error or did I misinterpret the usage of the library?
@Glignos the current algorithm for comparing lists is very simple. In your case you would get a better result if you use sets instead of lists.
You can propose a change of following lines https://github.com/inveniosoftware/dictdiffer/blob/efc1b73742481b57a12833a2b0dfb3b32262c304/dictdiffer/init.py#L159-L161
I would propose to use difflib.SequenceMatcher that could do the job quite well.
I'm having the same issue, except with more complex data (dictionaries nested in dictionaries). I obviously have no idea what kind of fix this would be, but would love to see the functionality!
Hey everyone! I found this thread when looking for a solution to calculate diffs between JSON documents. My schema contains a large list of objects at the top of the tree, so the subject issue is a deal breaker. I've prototyped my solution based on some of dictdiffer's code and on difflib.SequenceMatcher, as proposed by @jirikuncar.
The main difficulty is in patch: I store all indices as they appear in the source array, and as I perform the patching I need to keep track of accumulating offset (e.g. if I delete a range, all subsequent indices need to be reduced by the length of deleted segment). Also, I don't need to invert diffs, so I didn't implement this functionality.
Here's the link to my code and here's a little demo. My code uses a different format (structured dicts instead of tuples) but the idea is the same:
a
['a', 'b', 'c', 'y', 'e', 'f', 'x']
b
['b', 'c', 'x', 'e', 'f']
diff
{'path': [], 'action': 'remove_range', 'start': 0, 'length': 1}
{'path': [3], 'action': 'change', 'new': 'x'}
{'path': [], 'action': 'remove_range', 'start': 6, 'length': 1}
Works with documents as well
a
['a', 'b', 'c', {'nested': [{'value': 'y', 'other': 123}]}, 'e', 'f']
b
['extra', 'a', 'b', 'c', {'nested': [{'value': 'x'}]}, 'e', 'f']
diff
{'path': [], 'action': 'add_range', 'start': 0, 'values': ['extra']}
{'path': [3, 'nested', 0, 'value'], 'action': 'change', 'new': 'x'}
{'path': [3, 'nested', 0], 'action': 'remove', 'keys': ['other']}
One difference is that my diffs are not reversible (but I don't think it'll be too hard to add this in case I need it).