GraphScope

[BUG] Some basic gremlin queries are slow.

Open siyuan0322 opened this issue 3 years ago • 2 comments

Describe the bug: Some Gremlin queries that should be very fast are very slow.

  • [x] g.V().limit(1) # Sometimes 1 min, sometimes < 1s.
  • [ ] g.V().count() # times out after 10 min on 900 million vertices
  • [ ] g.E().count() # takes 56 s on 30 million edges
  • [x] g.E().hasLabel(%s).has('id', %s) # slow; does not seem to use the primary key (pk)

Endpoints in E results may be incorrect, i.e. an edge may reference a source or destination vertex that does not exist.

  • [x] g.V(%s).bothE() returns [eid][srcid -> label -> dstid], but g.V(srcid) or g.V(dstid) may then be empty.
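One way to check for the endpoint problem above offline is to compare edge endpoints against the set of known vertex ids. The following is only a minimal sketch in plain Python over data already fetched from the graph; the `Edge` tuple layout and `find_dangling_edges` are hypothetical names that mirror the `[eid][srcid -> label -> dstid]` shape and are not part of GraphScope or any Gremlin client.

```python
from typing import Iterable, List, Tuple

# Hypothetical layout mirroring [eid][srcid -> label -> dstid].
Edge = Tuple[int, int, str, int]  # (eid, src_id, label, dst_id)


def find_dangling_edges(
    vertex_ids: Iterable[int], edges: Iterable[Edge]
) -> List[Tuple[int, List[int]]]:
    """Return (eid, missing_endpoint_ids) for every edge whose source or
    destination id is not among the known vertex ids."""
    known = set(vertex_ids)
    dangling = []
    for eid, src, _label, dst in edges:
        missing = [v for v in (src, dst) if v not in known]
        if missing:
            dangling.append((eid, missing))
    return dangling


if __name__ == "__main__":
    # Assumed sample data: vertex 3 is referenced by edge 11 but does not exist.
    vertices = [1, 2]
    edges = [(10, 1, "knows", 2), (11, 1, "knows", 3)]
    print(find_dangling_edges(vertices, edges))
```

Any edge reported here corresponds to a case where g.V(srcid) or g.V(dstid) would come back empty.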

siyuan0322, Jul 19 '22 07:07

limit seems to have been fixed by some recent commits; fixing count may require refactoring the GlobalGraphQuery implementation in the store.

siyuan0322, Jul 22 '22 03:07

count may need:

  1. groot: maintain the number of vertices and the number of edges in the store, rather than counting them on every query;
  2. ir-core: fuse source and count in the ir-plan;
  3. ir-runtime: support executing the fused source+count operator;
  4. ReadGraph interface: provide scan_count()-related APIs and implement them for the different storages.
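The four items above can be illustrated with a toy model. This is a sketch under stated assumptions, not the groot or ir-core implementation: `ToyStore`, `fuse_source_count`, `execute`, and the string-based plan format are all hypothetical, and only the name `scan_count` comes from the list. The store keeps a running vertex count (item 1), the planner rewrites an adjacent scan + count pair into a single operator (item 2), the executor understands that operator (item 3), and the store answers it without scanning (item 4).

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class ToyStore:
    """Hypothetical in-memory stand-in for a graph store."""
    vertices: Dict[int, str] = field(default_factory=dict)  # vertex id -> label
    vertex_count: int = 0  # item 1: maintained on every write

    def add_vertex(self, vid: int, label: str) -> None:
        if vid not in self.vertices:
            self.vertex_count += 1
        self.vertices[vid] = label

    def remove_vertex(self, vid: int) -> None:
        if vid in self.vertices:
            del self.vertices[vid]
            self.vertex_count -= 1

    def scan(self) -> Iterator[int]:
        """Full scan: yields every vertex id."""
        return iter(self.vertices)

    def scan_count(self) -> int:
        """Item 4: answer the count from the maintained counter, no scan."""
        return self.vertex_count


def fuse_source_count(plan: List[str]) -> List[str]:
    """Item 2: rewrite each adjacent ("scan", "count") pair into "scan_count"."""
    fused: List[str] = []
    i = 0
    while i < len(plan):
        if plan[i] == "scan" and i + 1 < len(plan) and plan[i + 1] == "count":
            fused.append("scan_count")
            i += 2
        else:
            fused.append(plan[i])
            i += 1
    return fused


def execute(plan: List[str], store: ToyStore):
    """Item 3: a tiny executor that supports the fused operator."""
    result = None
    for op in plan:
        if op == "scan":
            result = store.scan()
        elif op == "count":
            result = sum(1 for _ in result)  # walks every element
        elif op == "scan_count":
            result = store.scan_count()  # reads the counter only
        else:
            raise ValueError(f"unknown operator: {op}")
    return result


if __name__ == "__main__":
    store = ToyStore()
    for vid in range(3):
        store.add_vertex(vid, "person")
    store.remove_vertex(0)
    plan = ["scan", "count"]  # roughly what g.V().count() would compile to
    print(fuse_source_count(plan), execute(fuse_source_count(plan), store))
```

In this model the unfused plan walks every vertex, while the fused plan reads a single integer; a real store would also have to keep the counter consistent across concurrent writes and snapshots, which the sketch ignores.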

BingqingLyu, Jul 26 '22 05:07

Fixed by a flurry of PRs. Waiting for confirmation from real-life workloads.

siyuan0322, Nov 06 '23 09:11