GraphScope
[BUG] Some basic gremlin queries are slow.
Describe the bug
Some Gremlin queries that are supposed to be very fast are very slow.
- [x] g.V().limit(1) # Sometimes 1 min, sometimes < 1s.
- [ ] g.V().count() # timeout after 10 min, 900 million vertices
- [ ] g.E().count() # takes 56s, 30 million edges
- [x] g.E().hasLabel(%s).has('id', %s) # slow, does not seem to use the primary key.
Endpoints in E results may be incorrect, i.e. an edge may reference a vertex that does not exist.
- [x] g.V(%s).bothE() -> [eid][srcid -> label -> dstid], then g.V(srcid/dstid) may be empty.
limit seems to have been fixed by some recent commits;
count may require refactoring the GlobalGraphQuery implementation in the store.
count may need:
- groot: maintain the number of vertices and the number of edges in the store, rather than counting each time;
- ir-core: fuse `source` and `count` in ir-plan;
- ir-runtime: support queries for the fusion of source+count;
- `ReadGraph` interface: provide scan_count() related APIs, and implement them for the different storages.
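To make the idea in the list above concrete, here is a minimal sketch of how a count API could fall back to a full scan by default while letting a store that maintains counters answer in constant time. Everything in it is hypothetical and for illustration only: `ReadGraphSketch`, `ScanOnlyStore`, `CountingStore`, and `add_vertex` are made-up names, and the real `ReadGraph` trait and groot store will differ.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for a read-only graph interface.
/// Not the real GraphScope `ReadGraph` trait.
pub trait ReadGraphSketch {
    /// Yields the label of every vertex, one item per vertex.
    fn scan_vertex_labels(&self) -> Box<dyn Iterator<Item = &str> + '_>;

    /// Counts vertices, optionally restricted to one label.
    /// Default: a full scan, linear in the number of vertices.
    fn scan_count(&self, label: Option<&str>) -> u64 {
        self.scan_vertex_labels()
            .filter(|l| label.map_or(true, |want| *l == want))
            .count() as u64
    }
}

/// A store with no statistics: it relies on the default full scan.
#[derive(Default)]
pub struct ScanOnlyStore {
    pub labels: Vec<String>,
}

impl ReadGraphSketch for ScanOnlyStore {
    fn scan_vertex_labels(&self) -> Box<dyn Iterator<Item = &str> + '_> {
        Box::new(self.labels.iter().map(|s| s.as_str()))
    }
}

/// A store that updates counters on every write, so counting
/// does not need to touch the vertex data at all.
#[derive(Default)]
pub struct CountingStore {
    labels: Vec<String>,
    per_label: HashMap<String, u64>,
    total: u64,
}

impl CountingStore {
    /// Inserts a vertex and keeps the counters in sync.
    pub fn add_vertex(&mut self, label: &str) {
        self.labels.push(label.to_string());
        *self.per_label.entry(label.to_string()).or_insert(0) += 1;
        self.total += 1;
    }
}

impl ReadGraphSketch for CountingStore {
    fn scan_vertex_labels(&self) -> Box<dyn Iterator<Item = &str> + '_> {
        Box::new(self.labels.iter().map(|s| s.as_str()))
    }

    /// Overrides the default scan with a lookup in the maintained counters.
    fn scan_count(&self, label: Option<&str>) -> u64 {
        match label {
            None => self.total,
            Some(l) => self.per_label.get(l).copied().unwrap_or(0),
        }
    }
}
```

With this shape, a fused source+count operator in the runtime would only need to call `scan_count` and each storage could decide how cheaply it can answer. A real store would also have to keep the counters consistent under deletes and concurrent writes, which this sketch ignores.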
Fixed by a flurry of PRs. Waiting to be confirmed by real-life workloads.