Drop optimization
Created a new MultiVertexQuery drop API that efficiently drops many vertices.
Local benchmark execution:
Benchmark (isMultiDrop) (verticesAmount) Mode Cnt Score Error Units
MultiDropBenchmark.dropVertices true 5000 avgt 5 71.137 ± 2.007 ms/op
MultiDropBenchmark.dropVertices false 5000 avgt 5 900.047 ± 53.045 ms/op
Conclusion: locally deleting vertices via multiQuery performed >12 times faster than via Gremlin API.
I would prefer adding a simple unit test also which tests the new API
tx.multiQuery(vertices).drop()for better test coverage. Otherwise this PR looks good to me. After this PR is merged I think we could also overwrite TinkerPop'sDropStepto leverage this new optimization in Gremlin queries without the need to usingmultuQuerydirectly (i.e. similar as we did withJanusGraphHasStepas it is also afilterstep).
I'm not sure it will work the same way on individual DropStep, since the benefit of multi-query that it constructs a query once and applies to all vertices.
For individual DropStep, it constructs a query every time for each individual vertex to fetch relations (system + standard), and that takes so much time.
I would prefer adding a simple unit test also which tests the new API
tx.multiQuery(vertices).drop()for better test coverage. Otherwise this PR looks good to me. After this PR is merged I think we could also overwrite TinkerPop'sDropStepto leverage this new optimization in Gremlin queries without the need to usingmultuQuerydirectly (i.e. similar as we did withJanusGraphHasStepas it is also afilterstep).I'm not sure it will work the same way on individual
DropStep, since the benefit of multi-query that it constructs a query once and applies to all vertices. For individualDropStep, it constructs a query every time for each individual vertex to fetch relations (system + standard), and that takes so much time.
Yes, the default DropStep processes vertices one by one, but our multiquariable custom JanusGraph steps which overwrite TinkerPop steps extend the interface with additional methods to "pre-register vertices which will be processed later". Thus, the overwritten version of the DropStep will simply execute multiQuery for a batch of vertices and the vertices which are already processed will be skipped until a new batch is registered.
We use this technique for many custom steps (JanusGraphHasStep, JanusGraphPropertiesStep, JanusGraphVertexStep, etc.). After this PR is merged I can implement JanusGraphDropStep as it is quite easy to do if we already have the necessary multiQuery API and I already done that for multiple steps. So, I think it shouldn't be a problem unless I am missing something.