Inlining vertex properties into a CompositeIndex structure
Overview
Inlining vertex properties into a search index structure (CompositeIndex) can offer significant performance and efficiency benefits.
-
Performance Improvements Faster Querying: Inlining vertex properties directly within the index allows the search engine to retrieve all relevant data from the index itself. This means queries don’t need to make additional calls to data stores to fetch full vertex information, significantly reducing lookup time.
-
Data Locality In distributed storages, having inlined properties ensures that more complete data exists within individual partitions or shards. This reduces cross-node network calls and improves the overall query performance by ensuring data is more local to the request being processed.
-
Cost of Indexing vs. Storage Trade-off While inlining properties increases the size of the index (potentially leading to more extensive index storage requirements), it is often a worthwhile trade-off for performance, mainly when query speed is critical. This is a typical pattern in systems optimized for read-heavy workloads.
Usage
In order to take advantage of the inlined properties feature, JanusGraph Transaction should be set to use .propertyPrefetching(false)
Example:
//Build index
mgmt.buildIndex("composite", Vertex.class)
.addKey(idKey)
.addInlinePropertyKey(nameKey)
.buildCompositeIndex();
//Query
tx = graph.buildTransaction()
.propertyPrefetching(false) //this is important
.start();
tx.traversal().V().has("id", 100).next().value("name");
Thank you @ntisseyre for this great contribution! I left some general comments below. However, I have some questions / thoughts.
- What happens if the user uses a composite index with inlined properties but requests properties which are not covered by the inline composite index? Will we fetch only vertex id from the composite index or will we fetch those inline properties from the index as well?
- It would be really good to add documentation about this new functionality. I think somewhere between
Label ConstraintandComposite versus Mixed Indexestopics should be a good place for the doc, but I'm good with other places as well: https://github.com/JanusGraph/janusgraph/blob/master/docs/schema/index-management/index-performance.md#composite-versus-mixed-indexes- A benchmark test would be really beneficial to actually see some performance differences between inline indices and normal composite indices.
Thank you, @porunov, for the feedback!
- If the user is requesting properties that are not covered by composite index inlined keys, it will fall back to standard logic to fetch all properties for vertex from
edgeStoretable. - Will do;
- Will create a benchmark test
Benchmark test executed locally:
Benchmark (isInlined) (verticesAmount) Mode Cnt Score Error Units
CQLCompositeIndexInlinePropBenchmark.searchVertices true 5000 avgt 5 80.373 ± 14.164 ms/op
CQLCompositeIndexInlinePropBenchmark.searchVertices false 5000 avgt 5 1778.890 ± 296.721 ms/op
Results: Inline indices properties fetching is ~22 times faster for this test executed locally compared to normal properties fetching logic.
P.S. I didn't expect such a huge difference to be fair, but looks amazing :eyes:
Benchmark test: LGTM.
Added testIndexInlinePropertiesReindex test seems to fail for some index backends.
Benchmark test: LGTM. Added
testIndexInlinePropertiesReindextest seems to fail for some index backends.
When I run this test locally, it also returns unstable results: sometimes it succeeds, and sometimes it doesn't see values being inlined in the index structure. I suspect that the reIndexing/data propagation is not finished or that there is some cache, but I couldn't find where.
LGTM. Thank you for this amazing feature @ntisseyre !
@porunov, I have updated the documentation in one of the commits. Please take a look at the Important Notes on Compatibility section and let me know what you think and how such changes should generally be approached during the release cycle.
LGTM. Thank you for this amazing feature @ntisseyre !
@porunov, I have updated the documentation in one of the commits. Please take a look at the
Important Notes on Compatibilitysection and let me know what you think and how such changes should generally be approached during the release cycle.
We actually describe all upgrade instructions and breaking changes in docs/changelog.md in the release the feature lands into. As your contributed feature will land into 1.1.0 version, could you please add something similar to the following after Assets under release 1.1.0 (i.e. right before release 1.0.1 starts)?:
#### Upgrade Instructions
##### Inlining vertex properties into a Composite Index
Inlining vertex properties into a Composite Index structure can offer significant performance and efficiency benefits.
See [documentation](./schema/index-management/index-performance.md#inlining-vertex-properties-into-a-composite-index) on how to inline vertex properties into a composite index.
**Important Notes on Compatibility**
1. **Backward Incompatibility**
Once a JanusGraph instance adopts this new schema feature, it cannot be rolled back to a prior version of JanusGraph.
The changes in the schema structure are not compatible with earlier versions of the system.
2. **Migration Considerations**
It is critical that users carefully plan their migration to this new version, as there is no automated or manual rollback process
to revert to an older version of JanusGraph once this feature is used.
In that case you can remove Important Notes on Compatibility from the schema/index-management/index-performance.md because usually it's irrelevant to the new users and previous users should always look into upgrade instructions.
@ntisseyre do you want to merge it or not ready yet? Just asking because I wanted to rebase #4695 after you merge this PR to add inline index functionality to the JSON schema importer.
@ntisseyre do you want to merge it or not ready yet? Just asking because I wanted to rebase #4695 after you merge this PR to add inline index functionality to the JSON schema importer.
I was waiting to see if anyone had any more comments, but I'm good to merge it.
Sorry I just saw this. This is awesome!!!! I didn't take a look at implementation detail, but we are essentially implementing covering index, right?
Sorry I just saw this. This is awesome!!!! I didn't take a look at implementation detail, but we are essentially implementing covering index, right?
That's right, it is like MySQL covering index