janusgraph icon indicating copy to clipboard operation
janusgraph copied to clipboard

Inlining vertex properties into a CompositeIndex structure

Open ntisseyre opened this issue 1 year ago • 6 comments

Overview

Inlining vertex properties into a search index structure (CompositeIndex) can offer significant performance and efficiency benefits.

  1. Performance Improvements Faster Querying: Inlining vertex properties directly within the index allows the search engine to retrieve all relevant data from the index itself. This means queries don’t need to make additional calls to data stores to fetch full vertex information, significantly reducing lookup time.

  2. Data Locality In distributed storages, having inlined properties ensures that more complete data exists within individual partitions or shards. This reduces cross-node network calls and improves the overall query performance by ensuring data is more local to the request being processed.

  3. Cost of Indexing vs. Storage Trade-off While inlining properties increases the size of the index (potentially leading to more extensive index storage requirements), it is often a worthwhile trade-off for performance, mainly when query speed is critical. This is a typical pattern in systems optimized for read-heavy workloads.

Usage

In order to take advantage of the inlined properties feature, JanusGraph Transaction should be set to use .propertyPrefetching(false)

Example:

//Build index
 mgmt.buildIndex("composite", Vertex.class)
            .addKey(idKey)
            .addInlinePropertyKey(nameKey)
            .buildCompositeIndex();

//Query
tx = graph.buildTransaction()
            .propertyPrefetching(false) //this is important
            .start();
tx.traversal().V().has("id", 100).next().value("name");

ntisseyre avatar Oct 10 '24 19:10 ntisseyre

Thank you @ntisseyre for this great contribution! I left some general comments below. However, I have some questions / thoughts.

  1. What happens if the user uses a composite index with inlined properties but requests properties which are not covered by the inline composite index? Will we fetch only vertex id from the composite index or will we fetch those inline properties from the index as well?
  2. It would be really good to add documentation about this new functionality. I think somewhere between Label Constraint and Composite versus Mixed Indexes topics should be a good place for the doc, but I'm good with other places as well: https://github.com/JanusGraph/janusgraph/blob/master/docs/schema/index-management/index-performance.md#composite-versus-mixed-indexes
  3. A benchmark test would be really beneficial to actually see some performance differences between inline indices and normal composite indices.

Thank you, @porunov, for the feedback!

  1. If the user is requesting properties that are not covered by composite index inlined keys, it will fall back to standard logic to fetch all properties for vertex from edgeStore table.
  2. Will do;
  3. Will create a benchmark test

ntisseyre avatar Oct 16 '24 23:10 ntisseyre

Benchmark test executed locally:

Benchmark                                            (isInlined)  (verticesAmount)  Mode  Cnt     Score     Error  Units
CQLCompositeIndexInlinePropBenchmark.searchVertices         true              5000  avgt    5    80.373 ±  14.164  ms/op
CQLCompositeIndexInlinePropBenchmark.searchVertices        false              5000  avgt    5  1778.890 ± 296.721  ms/op

Results: Inline indices properties fetching is ~22 times faster for this test executed locally compared to normal properties fetching logic.

P.S. I didn't expect such a huge difference to be fair, but looks amazing :eyes:

porunov avatar Oct 17 '24 13:10 porunov

Benchmark test: LGTM. Added testIndexInlinePropertiesReindex test seems to fail for some index backends.

porunov avatar Oct 17 '24 13:10 porunov

Benchmark test: LGTM. Added testIndexInlinePropertiesReindex test seems to fail for some index backends.

When I run this test locally, it also returns unstable results: sometimes it succeeds, and sometimes it doesn't see values being inlined in the index structure. I suspect that the reIndexing/data propagation is not finished or that there is some cache, but I couldn't find where.

ntisseyre avatar Oct 17 '24 13:10 ntisseyre

LGTM. Thank you for this amazing feature @ntisseyre !

@porunov, I have updated the documentation in one of the commits. Please take a look at the Important Notes on Compatibility section and let me know what you think and how such changes should generally be approached during the release cycle.

ntisseyre avatar Oct 18 '24 15:10 ntisseyre

LGTM. Thank you for this amazing feature @ntisseyre !

@porunov, I have updated the documentation in one of the commits. Please take a look at the Important Notes on Compatibility section and let me know what you think and how such changes should generally be approached during the release cycle.

We actually describe all upgrade instructions and breaking changes in docs/changelog.md in the release the feature lands into. As your contributed feature will land into 1.1.0 version, could you please add something similar to the following after Assets under release 1.1.0 (i.e. right before release 1.0.1 starts)?:

#### Upgrade Instructions

##### Inlining vertex properties into a Composite Index

Inlining vertex properties into a Composite Index structure can offer significant performance and efficiency benefits. 
See [documentation](./schema/index-management/index-performance.md#inlining-vertex-properties-into-a-composite-index) on how to inline vertex properties into a composite index.

**Important Notes on Compatibility**

1. **Backward Incompatibility**
Once a JanusGraph instance adopts this new schema feature, it cannot be rolled back to a prior version of JanusGraph. 
The changes in the schema structure are not compatible with earlier versions of the system.

2. **Migration Considerations**
It is critical that users carefully plan their migration to this new version, as there is no automated or manual rollback process 
to revert to an older version of JanusGraph once this feature is used.

In that case you can remove Important Notes on Compatibility from the schema/index-management/index-performance.md because usually it's irrelevant to the new users and previous users should always look into upgrade instructions.

porunov avatar Oct 18 '24 16:10 porunov

@ntisseyre do you want to merge it or not ready yet? Just asking because I wanted to rebase #4695 after you merge this PR to add inline index functionality to the JSON schema importer.

porunov avatar Oct 27 '24 14:10 porunov

@ntisseyre do you want to merge it or not ready yet? Just asking because I wanted to rebase #4695 after you merge this PR to add inline index functionality to the JSON schema importer.

I was waiting to see if anyone had any more comments, but I'm good to merge it.

ntisseyre avatar Oct 27 '24 16:10 ntisseyre

Sorry I just saw this. This is awesome!!!! I didn't take a look at implementation detail, but we are essentially implementing covering index, right?

li-boxuan avatar Nov 22 '24 07:11 li-boxuan

Sorry I just saw this. This is awesome!!!! I didn't take a look at implementation detail, but we are essentially implementing covering index, right?

That's right, it is like MySQL covering index

ntisseyre avatar Nov 22 '24 22:11 ntisseyre