Index by label
In Janus I can create a composite index for a single property. Why then can I not create a composite index for a vertex label without a property?
My use case is that we're using vertex labels to identify the type of the vertex (as suggested in Janus docs). We have a CRUD UI that allows users to browse objects of a given type, so we now want to show a paged list of people:
g.V().hasLabel("person").range(0, 10)
This results in a 5-second query, because it's doing a full graph scan. If I introduce a redundant property "mytype"="person", I can place an index on "mytype" only for label "person" and write a query like this:
g.V().has(Key[String]("mytype"), "person").hasLabel("person").range(0, 10)
And it no longer performs a full graph scan: ~100ms.
Is it possible to trick Janus into using an existing index I have on property "name" for label "person" by changing the query? Both these queries still result in a full graph scan:
g.V().has(Key[String]("name")).hasLabel("person").range(0, 10)
g.V().has(Key[String]("mytype")).hasLabel("person").range(0, 10)
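For what it's worth, my understanding is that a composite index only serves equality lookups where a value is given for every indexed key, which would explain why a bare has(key) existence check falls back to a scan. Below is a rough stdlib-only Java model of that idea. It is not JanusGraph code: CompositeIndexModel and its methods are hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of a single-key composite index: value -> vertex ids.
final class CompositeIndexModel {
    private final Map<String, List<Long>> byValue = new HashMap<>();

    // Record that the vertex with this id carries the given property value.
    void put(String value, long vertexId) {
        byValue.computeIfAbsent(value, v -> new ArrayList<>()).add(vertexId);
    }

    // Equality lookup: a single keyed read, no scan over all vertices.
    List<Long> lookup(String value) {
        return byValue.getOrDefault(value, Collections.emptyList());
    }
}
```

In this model has("mytype", "person") corresponds to lookup("person"). There is no equivalent single read for "any vertex that has the key at all", because the value is part of the lookup key.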
This proposed feature has the potential to introduce a hot key in graphindex for large graphs that have many vertices of a particular type. To implement this feature correctly, graphindex would also need to be partitioned (with the number of partitions defined by cluster.max-partitions). Otherwise, you will see latency increases in Cassandra/HBase and throttling on DynamoDB while using a vertex label index.
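To make the partitioning idea concrete, here is a minimal stdlib-only Java sketch of spreading one label's index entries across a fixed number of buckets. It is an illustration under assumptions, not how JanusGraph lays out graphindex: the LabelBuckets class, the "#" separator, and deriving the bucket from the vertex id are all hypothetical, and partitions merely stands in for a setting such as cluster.max-partitions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: spreads a label's index entries over several row keys.
final class LabelBuckets {
    private LabelBuckets() {}

    // Pick a bucket for a vertex; floorMod keeps the result in [0, partitions).
    static int bucketOf(long vertexId, int partitions) {
        if (partitions <= 0) {
            throw new IllegalArgumentException("partitions must be positive");
        }
        return (int) Math.floorMod(vertexId, (long) partitions);
    }

    // Row key a write for this vertex would go to, e.g. "person#2".
    static String indexKey(String label, long vertexId, int partitions) {
        return label + "#" + bucketOf(vertexId, partitions);
    }

    // A label lookup has to read every bucket and merge the results.
    static List<String> keysToRead(String label, int partitions) {
        List<String> keys = new ArrayList<>(partitions);
        for (int bucket = 0; bucket < partitions; bucket++) {
            keys.add(label + "#" + bucket);
        }
        return keys;
    }
}
```

The trade-off in this sketch is that writes for one label no longer land on a single row, while every label lookup fans out to partitions reads.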
@amcp The hot key problem already exists in plenty of schemas. Right now we use edges or property indices for types and other high-cardinality indices, and have to wade through all the problems that arise. I would love to see this issue get worked on, if only to spur better support for hot keys.
Does anyone have a resolution or workaround for the performance of g.V().hasLabel("person").range(0, 10) in the meantime?
Just want to share what we have been doing: we created an additional property called "type" that stores the vertex label, and then created a composite index on it. This ran well for years, until the number of nodes under some hot labels grew too large and caused the supernode problem. We recently retired the composite index and switched to a mixed index instead. Performance did drop a little, but it is still at an acceptable level. So far, so good.
My feeling is that JanusGraph should provide an option to let users decide if they want to index their labels, and if so, whether they want a composite index or a mixed index.
Is there any work happening on this issue? We are hitting a roadblock because we have very few labels across millions of vertices. My setup is JanusGraph with an HBase backend. We are unable to create edges, because the query that creates an edge based on matches seems to run forever.
After finally getting my test data loaded into JanusGraph, imagine my surprise when g.V().hasLabel("mylabel").count() took ~10 seconds on a database of only 407,839 nodes :(