Index by label

Open s-nel opened this issue 8 years ago • 6 comments

In Janus I can create a composite index for a single property. Why then can I not create a composite index for a vertex label without a property?

My use case is that we're using vertex labels to identify the type of the vertex (as suggested in Janus docs). We have a CRUD UI that allows users to browse objects of a given type, so we now want to show a paged list of people:

g.V().hasLabel("person").range(0, 10)

This query takes about 5 seconds because it does a full graph scan. If I introduce a redundant property "mytype"="person", I can place an index on "mytype" restricted to label "person" and write a query like this:

g.V().has(Key[String]("mytype"), "person").hasLabel("person").range(0, 10)

And it no longer performs a full graph scan: ~100ms.
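To make the difference concrete, here is a minimal sketch in plain Python. It is a toy in-memory model, not JanusGraph code: the `ToyGraph` class and its method names are hypothetical, and it does not mirror JanusGraph's storage layout. It only shows why a label filter without an index has to inspect vertices one by one, while a lookup on an indexed redundant property reads the page from a single index entry.

```python
from collections import defaultdict


class ToyGraph:
    """Hypothetical in-memory model; it does not mirror JanusGraph's storage layout."""

    def __init__(self, indexed_key):
        self.vertices = {}                # vertex id -> (label, properties)
        self.indexed_key = indexed_key    # the single property key covered by the index
        self.index = defaultdict(list)    # (label, value) -> vertex ids

    def add_vertex(self, vid, label, **props):
        self.vertices[vid] = (label, props)
        if self.indexed_key in props:
            self.index[(label, props[self.indexed_key])].append(vid)

    def page_by_label(self, label, limit):
        """No label index: inspect vertices one by one until the page is full."""
        page, inspected = [], 0
        for vid, (vlabel, _) in self.vertices.items():
            if len(page) == limit:
                break
            inspected += 1
            if vlabel == label:
                page.append(vid)
        return page, inspected

    def page_by_index(self, label, value, limit):
        """Index on the redundant property: read the page from one index entry."""
        return self.index[(label, value)][:limit]


if __name__ == "__main__":
    g = ToyGraph(indexed_key="mytype")
    for i in range(10_000):
        g.add_vertex(i, "company", mytype="company")
    for i in range(10_000, 10_020):
        g.add_vertex(i, "person", mytype="person")

    page, inspected = g.page_by_label("person", 10)
    print("label scan inspected", inspected, "vertices")
    print("index lookup returned", g.page_by_index("person", "person", 10))
```

In this model the label scan has to walk past every non-person vertex before it can fill the page, whereas the index lookup never touches them.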

Is it possible to trick Janus into using an existing index I have on property "name" for label "person" by changing the query? Both these queries still result in a full graph scan:

g.V().has(Key[String]("name")).hasLabel("person").range(0, 10)
g.V().has(Key[String]("mytype")).hasLabel("person").range(0, 10)

s-nel avatar May 26 '17 03:05 s-nel

This proposed feature has the potential to introduce a hot key in graphindex for large graphs that have many vertices of a particular type. To implement this feature correctly, graphindex would also need to be partitioned (with the number of partitions defined by cluster.max-partitions). Otherwise, you will see latency increases in Cassandra/HBase and throttling on DynamoDB while using a vertex label index.
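The partitioning idea can be sketched roughly as follows. This is a plain-Python toy, not JanusGraph internals: the `PartitionedLabelIndex` class is hypothetical, `num_partitions` merely stands in for cluster.max-partitions, and assigning partitions by vertex id modulo the partition count is an assumption made for illustration. Each `(label, partition)` tuple stands for one storage row key.

```python
class PartitionedLabelIndex:
    """Hypothetical sketch: spreads one label's index entries over several row keys."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self.rows = {}  # (label, partition) -> vertex ids stored under that row key

    def _partition(self, vertex_id):
        # Assumed placement rule for illustration only.
        return vertex_id % self.num_partitions

    def add(self, label, vertex_id):
        key = (label, self._partition(vertex_id))
        self.rows.setdefault(key, []).append(vertex_id)

    def lookup(self, label):
        """A read has to fan out to every partition of the label."""
        ids = []
        for partition in range(self.num_partitions):
            ids.extend(self.rows.get((label, partition), []))
        return ids

    def largest_row(self, label):
        """Size of the biggest single row for a label, a rough proxy for key hotness."""
        sizes = [len(ids) for (lbl, _), ids in self.rows.items() if lbl == label]
        return max(sizes, default=0)


if __name__ == "__main__":
    for partitions in (1, 8):
        index = PartitionedLabelIndex(partitions)
        for vid in range(1000):
            index.add("person", vid)
        print(partitions, "partition(s): largest row holds", index.largest_row("person"))
```

The trade-off in this sketch is that writes for a popular label no longer pile onto one row, but every lookup by label must read all partitions and merge the results.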

amcp avatar May 26 '17 15:05 amcp

@amcp The hot key problem already exists in plenty of schemas. Right now we use edges or property indices for types and other high-cardinality indices, and have to wade through all the problems that arise. I would love to see this issue get worked on, if only to spur better support for hot keys.

sharpau avatar Jun 20 '17 19:06 sharpau

Does anyone have a resolution or workaround for the performance issue with g.V().hasLabel("person").range(0, 10)?

rrmerugu avatar May 06 '21 12:05 rrmerugu

Just want to share what we have been doing: we created an additional property called "type", which stores the vertex label, and then created a composite index for it. This ran well for years, until the number of nodes under some hot labels grew too large and caused the super node problem. We recently retired the composite index and switched to a mixed index instead. The performance does drop a little, but it is still at an acceptable level. So far, so good.

li-boxuan avatar May 06 '21 13:05 li-boxuan

My feeling is that JanusGraph should provide an option to let users decide if they want to index their labels, and if so, whether they want a composite index or a mixed index.

li-boxuan avatar Jan 15 '22 02:01 li-boxuan

Is there any work happening on this issue? We are hitting a roadblock because we have very few labels across millions of vertices. My setup is JanusGraph with an HBase backend. I am unable to create edges because the query that creates an edge based on matches seems to run forever.

nikita15p avatar Sep 12 '22 10:09 nikita15p

After finally getting my test data loaded into JanusGraph, imagine my surprise when g.V().hasLabel("mylabel").count() took ~10 seconds on a database of only 407,839 nodes :(

baughmann avatar Nov 15 '22 00:11 baughmann