Major performance regression caused by unevaluatedProperties
Hi,
I've seen an internal customer project being on an ancient version (0.1.3) and wanted to upgrade things. After the update to 1.0.69 (latest at the point of writing) things still seem to work, but show a major performance regression.
A little microbenchmark that validates a typical json for us shows the following numbers: 1.0.69
Benchmark Mode Cnt Score Error Units
MyBenchmark.testValidate avgt 10 46884,301 ± 764,721 ns/op
1.0.67
Benchmark Mode Cnt Score Error Units
MyBenchmark.testValidate avgt 10 26214,556 ± 425,732 ns/op
0.1.3
Benchmark Mode Cnt Score Error Units
MyBenchmark.testValidate avgt 10 14705,189 ± 986,189 ns/op
The regression between 1.0.67 and 1.0.69 can be largely traced back to the work on unevaluated properties in #534 & #544 , I think. Ideally, when the schema doesn't contain this keyword - like in our case - this overhead should be avoided.
The regression between 0.1.3 and 1.0.67 is probably caused by numerous things, but one bigger portion is likely driven by the CollectorContext infrastructure and its repeated Map.get calls. This is also why the unevaluated properties is so expensive I think.
The purple highlights in the following (reversed) flamegraph of 1.0.69 show the CollectorContext being one of the main drivers.

For 1.0.67 it shows the following - less but still a significant time is spent on the CollectorContext.

Cheers, Christoph
Just out of curiousity I fired our example against Everit 1.14.1 (latest at the time of writing) vs. Networknt (1.0.67).
Benchmark Mode Cnt Score Error Units
MyBenchmark.testValidateEverit avgt 10 17213,067 ± 2434,057 ns/op
MyBenchmark.testValidateNetworkNt avgt 10 24811,879 ± 2222,995 ns/op
At least for our very simple example, the regressions apparently put this library into a position where it's not the fastest one anymore and invalidate the 3 year old claim in the README.
It is the fastest Java JSON Schema Validator as far as I know
With 1.0.69 even by quite a margin.
Don't get me wrong. I think Networknt is still great and its API is more intuitive, but I would appreciate if you could spent a few dev-cycles on the validation performance to reclaim the # 1 position.
@dreis2211 I have rerun the performance test and successfully reproduced what you have observed. Thanks a lot for raising it and I will try to get it fixed.
Hi @dreis2211 thank you for bringing this up I will take a look into this. I think we can avoid multiple calls to the CollectorContext by a simple check like hasEvaluatedProperties, this check is similar to hasRequiredProperties method in JsonSchema.java.
@stevehu is there anyway I can run the performance test setup you already have. I want to check the perf tests after I have fixed this issue.
Thanks in advance.
As shown above the unevaluatedProperties optimization only reduces things, but networknt has not the fastest schema validator anymore even if this is fixed. Feel free to repurpose this ticket to apply some more general performance work or create a new ticket for further work and keep this focussed at the unevaluated properties stuff.
@prashanthjos I guess @stevehu just ran https://github.com/networknt/json-schema-validator-perftest . Although it's not a perfect benchmark it shows the regressions as well.
we have a performance test repo that I am currently working on to upgrade to the latest and hooked with JProfiler. Please be aware that the test case was focused on V4 only as other libraries won't support the latest version of the specifications.
I am going to remove the Fge as it is not active anymore and the developer has given up the repo. Also, I will upgrade the test to V7 as the everit supports up to V7.
Once we finish this upgrade, we might add other active libraries.
https://github.com/networknt/json-schema-validator-perftest
I have updated the perftest repo and rerun the tests in both IDE and java -jar. The following is the link to the test result.
https://github.com/networknt/json-schema-validator-perftest#results-2022-05-02
I have no idea why the networknt validator is twice as fast as running with java -jar.
I am going to remove the Fge as it is not active anymore and the developer has given up the repo. Also, I will upgrade the test to V7 as the everit supports up to V7.
Hopefully it does not sound too blunt, but what is the purpose of comparing performance for v7? There are 2 drafts after that, which puts (imo) v7 in the realm of legacy. I understand advertisement, I guess, but if that is the reason I would focus on supporting draft2020 and be the first one in java (might be a weaker claim, since other libs can catch up, but for people like me who look at the latest it could mean I know where to look when the next json schema drafts come out)
@dreis2211 Please try 1.0.81 and let us know how we are doing.