json-diff icon indicating copy to clipboard operation
json-diff copied to clipboard

feat: optimize json array partial matcher for big json file

Open deblockt opened this issue 1 year ago • 2 comments

LenientJsonArrayPartialMatcher performs a comparison of each element in the expected array node with each element in the actual array node, resulting in n^2 complexity for calculating the similarity score before identifying the best matching pairs for comparison. Here is the code link.

I identified an opportunity for optimization in this process. By filtering out identical elements code link from both the expected and actual arrays before applying the n^2 similarity score calculation, we can significantly reduce the complexity. In scenarios where there are only a few mismatches between the expected and actual arrays, this optimization ensures that the n^2 complexity is only applied to those differing elements. The worst-case scenario of n^2 for all elements in the array occurs only if none of the elements match.

For smaller JSONs, the implementation may not exhibit a noticeable difference. However, during testing with larger JSONs containing hundreds of array elements, a significant performance improvement becomes apparent. I have tested this enhancement and created a pull request. I would appreciate it if you could review the pull request and share your thoughts!

The base of this MR come from @aymar99 and the PR #18.

deblockt avatar Apr 21 '24 12:04 deblockt

It's nice !

I've a question: Is the LenientJsonArrayPartialMatcher is 'strict' for value comparaison but taking any order (vs the StrictJsonArrayPartialMatcher) ?

tokazio avatar May 19 '24 00:05 tokazio

It's nice !

I've a question: Is the LenientJsonArrayPartialMatcher is 'strict' for value comparaison but taking any order (vs the StrictJsonArrayPartialMatcher) ?

The LenientJsonArrayPartialMatcher use global configuration to compare each array items.

In this example:

final var jsonMatcher = new CompositeJsonMatcher(
    new LenientJsonArrayPartialMatcher(), // comparing array using lenient mode (ignore array order and extra items)
    new LenientJsonObjectPartialMatcher(), // comparing object using lenient mode (ignoring extra properties)
    new LenientNumberPrimitivePartialMatcher(new StrictPrimitivePartialMatcher()) // comparing primitive types and manage numbers (100.00 == 100)
);

array items are compared using LenientJsonObjectPartialMatcher so value are compared using a 'lenient' comparator.

In this example:

final var jsonMatcher = new CompositeJsonMatcher(
    new LenientJsonArrayPartialMatcher(), // comparing array using lenient mode (ignore array order and extra items)
    new StrictJsonObjectPartialMatcher(), // comparing object using strict mode
    new LenientNumberPrimitivePartialMatcher(new StrictPrimitivePartialMatcher()) // comparing primitive types and manage numbers (100.00 == 100)
);

array items are compared using StrictJsonObjectPartialMatcher so value are compared using a 'strict' comparator.

deblockt avatar May 20 '24 14:05 deblockt