json-schema-validator icon indicating copy to clipboard operation
json-schema-validator copied to clipboard

Major performance regression caused by unevaluatedProperties

Open dreis2211 opened this issue 4 years ago • 8 comments

Hi,

I've seen an internal customer project being on an ancient version (0.1.3) and wanted to upgrade things. After the update to 1.0.69 (latest at the point of writing) things still seem to work, but show a major performance regression.

A little microbenchmark that validates a typical json for us shows the following numbers: 1.0.69

Benchmark                         Mode  Cnt      Score     Error  Units
MyBenchmark.testValidate          avgt   10  46884,301 ± 764,721  ns/op

1.0.67

Benchmark                         Mode  Cnt      Score     Error  Units
MyBenchmark.testValidate          avgt   10  26214,556 ± 425,732  ns/op

0.1.3

Benchmark                         Mode  Cnt      Score     Error  Units
MyBenchmark.testValidate         avgt   10  14705,189 ± 986,189  ns/op

The regression between 1.0.67 and 1.0.69 can be largely traced back to the work on unevaluated properties in #534 & #544 , I think. Ideally, when the schema doesn't contain this keyword - like in our case - this overhead should be avoided.

The regression between 0.1.3 and 1.0.67 is probably caused by numerous things, but one bigger portion is likely driven by the CollectorContext infrastructure and its repeated Map.get calls. This is also why the unevaluated properties is so expensive I think.

The purple highlights in the following (reversed) flamegraph of 1.0.69 show the CollectorContext being one of the main drivers. image

For 1.0.67 it shows the following - less but still a significant time is spent on the CollectorContext. image

Cheers, Christoph

dreis2211 avatar Apr 29 '22 07:04 dreis2211

Just out of curiousity I fired our example against Everit 1.14.1 (latest at the time of writing) vs. Networknt (1.0.67).

Benchmark                                 Mode  Cnt      Score      Error  Units
MyBenchmark.testValidateEverit            avgt   10  17213,067 ± 2434,057  ns/op
MyBenchmark.testValidateNetworkNt         avgt   10  24811,879 ± 2222,995  ns/op

At least for our very simple example, the regressions apparently put this library into a position where it's not the fastest one anymore and invalidate the 3 year old claim in the README.

It is the fastest Java JSON Schema Validator as far as I know

With 1.0.69 even by quite a margin.

Don't get me wrong. I think Networknt is still great and its API is more intuitive, but I would appreciate if you could spent a few dev-cycles on the validation performance to reclaim the # 1 position.

dreis2211 avatar Apr 29 '22 10:04 dreis2211

@dreis2211 I have rerun the performance test and successfully reproduced what you have observed. Thanks a lot for raising it and I will try to get it fixed.

stevehu avatar Apr 30 '22 01:04 stevehu

Hi @dreis2211 thank you for bringing this up I will take a look into this. I think we can avoid multiple calls to the CollectorContext by a simple check like hasEvaluatedProperties, this check is similar to hasRequiredProperties method in JsonSchema.java.

@stevehu is there anyway I can run the performance test setup you already have. I want to check the perf tests after I have fixed this issue.

Thanks in advance.

prashanthjos avatar May 02 '22 11:05 prashanthjos

As shown above the unevaluatedProperties optimization only reduces things, but networknt has not the fastest schema validator anymore even if this is fixed. Feel free to repurpose this ticket to apply some more general performance work or create a new ticket for further work and keep this focussed at the unevaluated properties stuff.

dreis2211 avatar May 02 '22 11:05 dreis2211

@prashanthjos I guess @stevehu just ran https://github.com/networknt/json-schema-validator-perftest . Although it's not a perfect benchmark it shows the regressions as well.

dreis2211 avatar May 02 '22 12:05 dreis2211

we have a performance test repo that I am currently working on to upgrade to the latest and hooked with JProfiler. Please be aware that the test case was focused on V4 only as other libraries won't support the latest version of the specifications.

I am going to remove the Fge as it is not active anymore and the developer has given up the repo. Also, I will upgrade the test to V7 as the everit supports up to V7.

Once we finish this upgrade, we might add other active libraries.

https://github.com/networknt/json-schema-validator-perftest

stevehu avatar May 02 '22 13:05 stevehu

I have updated the perftest repo and rerun the tests in both IDE and java -jar. The following is the link to the test result.

https://github.com/networknt/json-schema-validator-perftest#results-2022-05-02

I have no idea why the networknt validator is twice as fast as running with java -jar.

stevehu avatar May 03 '22 23:05 stevehu

I am going to remove the Fge as it is not active anymore and the developer has given up the repo. Also, I will upgrade the test to V7 as the everit supports up to V7.

Hopefully it does not sound too blunt, but what is the purpose of comparing performance for v7? There are 2 drafts after that, which puts (imo) v7 in the realm of legacy. I understand advertisement, I guess, but if that is the reason I would focus on supporting draft2020 and be the first one in java (might be a weaker claim, since other libs can catch up, but for people like me who look at the latest it could mean I know where to look when the next json schema drafts come out)

ThanksForAllTheFish avatar May 27 '22 06:05 ThanksForAllTheFish

@dreis2211 Please try 1.0.81 and let us know how we are doing.

fdutton avatar Apr 30 '23 16:04 fdutton