Indexes not complete
I am seeing incomplete mixed indexes. I tried tracing down the cause, and it looks like it only occurs when performing threaded bulk transactions. Although the transactions complete successfully, either at once or after retries, the indexing does not always succeed (I do not get any errors, but the vertex is never indexed). The indexes are created up front and then I import data. I verified the completeness of the index with the following queries:
gremlin> g.V().has("startDate").count()
WARN org.janusgraph.graphdb.transaction.StandardJanusGraphTx - Query requires iterating over all vertices [()]. For better performance, use indexes
==>2593
gremlin> g.V().has("startDate", P.lte(new java.util.Date())).count()
==>1398
gremlin> g.V().has("startDate", P.gt(new java.util.Date())).count()
==>0
So the full scan finds 2593 vertices with a startDate, while the two range queries together return only 1398.
I tested this with a build that was a few weeks old (+Cassandra 3.x and ES 5.5.2) and today with the latest master (+Cassandra 3.x and ES 5.6.2).
The commit result of a transaction clearly does not depend on whether the indexing has succeeded, so how can I detect/handle failed indexing transactions?
I read #483, which mentions that indexing with mixed indexes on ES is much slower than with non-mixed indexes. I suspect the indexing on ES is not able to keep up with the number of commits to JanusGraph, and so transactions are dropped?
There is a refresh interval involved. Are you issuing those queries immediately after loading the data?
@pluradj no, I am just checking them manually. I did some more research, and it looks like uniqueness was not actually being enforced on an id-property for which I had created a composite index with .unique(). As it turned out, this was causing duplicates when vertices were inserted/committed concurrently. So I first tried inserting the data non-concurrently (but still in batches), and then the indexing was complete and no duplicates were present. I then added mgmt.setConsistency(index, ConsistencyModifier.LOCK) when creating the index and went back to multi-threaded transactions, and now the index was also complete.
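For readers following along, the schema change described above might look roughly like this in the Gremlin console. This is a minimal sketch, not the poster's actual schema: the property key name `externalId` and the index name `byExternalId` are hypothetical, and it assumes an open JanusGraph instance bound to `graph`.

```groovy
// Sketch only: 'externalId' and 'byExternalId' are hypothetical names.
mgmt = graph.openManagement()
key = mgmt.makePropertyKey('externalId').dataType(String.class).make()
index = mgmt.buildIndex('byExternalId', Vertex.class).addKey(key).unique().buildCompositeIndex()
// Without an explicit LOCK consistency modifier, concurrent transactions on an
// eventually consistent backend such as Cassandra can each insert the same value.
mgmt.setConsistency(index, ConsistencyModifier.LOCK)
mgmt.commit()
```

Locking adds extra backend round trips at commit time, which matches the slowdown reported later in this thread.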
A unique-constraint only ensures consistency within a single transaction space, right?
So having a unique-constraint on an index without explicitly setting consistency (for Cassandra) results, with concurrent transactions, in multiple vertices having the same id-property. Strangely, however, the indexing of other properties stalls at arbitrary moments. How so? If I manually run a REINDEX job, the index is completed as expected.
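A manual REINDEX job of the kind mentioned above can be triggered through the management API. The following is a sketch under assumptions: the index name `myMixedIndex` is hypothetical, and `graph` is assumed to be an open JanusGraph instance in the Gremlin console.

```groovy
// Sketch only: 'myMixedIndex' is a hypothetical index name.
mgmt = graph.openManagement()
index = mgmt.getGraphIndex('myMixedIndex')
// updateIndex returns a future; get() blocks until the reindex job has finished.
mgmt.updateIndex(index, SchemaAction.REINDEX).get()
mgmt.commit()
```

This repairs the index after the fact but does not explain why the index mutations were lost in the first place.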
Could be related to #281 if some index mutations are failing but the tx commit still succeeds.
I have the same problem, but ConsistencyModifier.LOCK is too slow.
When many unique vertices are inserted concurrently (by threads or processes), the Elasticsearch mixed index seems to end up incomplete.
Currently my workaround is to use only one thread for one vertex label.
With the in-memory backend, data is also not visible across different threads. Is there any configuration for this? My steps are quite simple:
- At the very beginning, create 5 indexes: id, label, code, [id, label], [code, label]
- Add data in a single thread; everything goes well
- When new data arrives, delete the existing data by dropping all V and E, and commit the tx
- Add the new data as in step 2
- Reindex with management
Then, in certain cases, a search returns an unexpected result:
1. e.g. has(id, 'xxx') -> V[1] but has(code, "xxx") -> V[2], which are supposed to be the same vertex
2. A search with id alone is OK, but a search with id and label returns no result
The index type is composite...