JanusGraph: mixed indexes not complete

I am seeing incomplete mixed indexes. I tried to trace the cause, and it appears to occur only when running bulk transactions from multiple threads. Although the transactions complete successfully, either at once or after retries, indexing does not always succeed: I get no errors, but some vertices are never indexed. The indexes are created up front, and then I import the data. I verified the completeness of the index with the following queries:

gremlin> g.V().has("startDate").count()
WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [()]. For better performance, use indexes
==>2593
gremlin> g.V().has("startDate", P.lte(new java.util.Date())).count()
==>1398
gremlin> g.V().has("startDate", P.gt(new java.util.Date())).count()
==>0

I tested this with a build that was a few weeks old (with Cassandra 3.x and ES 5.5.2) and today with the latest master (with Cassandra 3.x and ES 5.6.2).

The commit result of a transaction clearly does not depend on whether indexing succeeded, so how can I detect and handle failed indexing?

I read #483, which mentions that mixed indexing on ES is much slower than non-mixed indexing. I suspect that indexing on ES cannot keep up with the number of commits to JanusGraph, so index updates are being dropped?

Oct 04 '17 15:10 ThijsBroersen

There is a refresh interval involved. Are you issuing those queries immediately after loading the data?

Oct 04 '17 16:10 pluradj

@pluradj no, I was just checking them manually. I did some more research, and it looks like I had not actually enforced uniqueness on an id property for which I created a composite index with .unique(). As it turned out, this was causing duplicates when vertices were inserted and committed concurrently. I first tried inserting the data non-concurrently (but still in batches); the index was then complete and no duplicates were present. I then added mgmt.setConsistency(index, ConsistencyModifier.LOCK) when creating the index and went back to multi-threaded transactions, and the index was complete there as well.
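A minimal Gremlin Console sketch of such an index definition, assuming an open JanusGraph instance bound to `graph`. The property key name `externalId` and the index name `byExternalIdUnique` are placeholders for illustration, not the schema used in this thread:

```groovy
// Sketch only: 'externalId' and 'byExternalIdUnique' are assumed names.
mgmt = graph.openManagement()
idKey = mgmt.makePropertyKey('externalId').dataType(String.class).make()
// Composite index with a uniqueness constraint on the key.
index = mgmt.buildIndex('byExternalIdUnique', Vertex.class).addKey(idKey).unique().buildCompositeIndex()
// Ask JanusGraph to acquire locks for this index so that concurrent
// transactions cannot both commit the same value.
mgmt.setConsistency(index, ConsistencyModifier.LOCK)
mgmt.commit()
```

The locking adds a cost to every commit that touches the index, so it is probably worth limiting it to keys that really need graph-wide uniqueness.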

A unique constraint only ensures consistency within a single transaction space, right?

So a unique constraint on an index without explicitly set consistency (for Cassandra) lets concurrent transactions create multiple vertices with the same id property. Strangely, however, the indexing of other properties stalls at arbitrary moments. Why is that? If I manually run a REINDEX job, the index completes as expected.
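For anyone wanting to repeat the manual repair, a REINDEX job can be triggered from the Gremlin Console roughly as follows. This is a sketch assuming an open graph bound to `graph`; the index name `mixedStartDate` is a placeholder, not the actual index name from this setup:

```groovy
// Sketch only: 'mixedStartDate' is an assumed index name.
mgmt = graph.openManagement()
// Rebuild the index from the data stored in the graph; get() blocks until the job finishes.
mgmt.updateIndex(mgmt.getGraphIndex('mixedStartDate'), SchemaAction.REINDEX).get()
mgmt.commit()
```

This only repairs the index after the fact. It does not explain why index updates went missing during the concurrent import.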

Oct 04 '17 19:10 ThijsBroersen

Could be related to #281 if some index mutations are failing but the tx commit still succeeds.

Dec 05 '17 21:12 sharpau

I have the same problem, but ConsistencyModifier.LOCK is too slow.

Dec 05 '17 23:12 takaomag

When many unique vertices are inserted concurrently (by threads or processes), the Elasticsearch mixed index appears to end up incomplete.

My current workaround is to use only one thread per vertex label.
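One possible way to arrange that workaround, as a rough Gremlin Console sketch rather than the exact code used here. It assumes an open JanusGraph instance bound to `graph`; `writers` and `writeBatch` are hypothetical names, and each row is assumed to be a map of property key to value:

```groovy
import java.util.concurrent.Callable
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors

// One single-threaded executor per vertex label (hypothetical helper state).
writers = new ConcurrentHashMap<String, ExecutorService>()

// Hypothetical helper: queue one batch of rows for the given label.
writeBatch = { String label, List<Map<String, Object>> rows ->
    // All batches for a label share one writer thread, so vertices with
    // the same label are never committed concurrently.
    def writer = writers.computeIfAbsent(label) { Executors.newSingleThreadExecutor() }
    writer.submit({
        def tx = graph.newTransaction()
        try {
            rows.each { row ->
                def v = tx.addVertex(label)
                row.each { k, val -> v.property(k, val) }
            }
            tx.commit()
        } catch (Exception e) {
            tx.rollback()
            throw e
        }
    } as Callable)
}
```

Different labels are still written in parallel, so throughput depends on how evenly the data is spread across labels.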

Dec 10 '17 15:12 takaomag

With the in-memory backend, data is also not visible across different threads. Is there any configuration for this? My steps are quite simple:

1. At the very beginning, create 5 indexes, for id, label, code, [id, label] and [code, label].
2. Add data in a single thread; everything goes well.
3. When new data comes in, delete the existing data by dropping all V and E, and commit the tx.
4. Add the new data as in step 2.
5. Reindex with management.

Then, in certain cases, a search returns unexpected results:

1. e.g. has(id, 'xxx') -> V[1] and has(code, "xxx") -> V[2], which are supposed to be the same vertex.
2. Searching by id works, but searching by id and label returns no result.

The index type is composite...

May 23 '23 14:05 mumutu66