
[Bug] Hugegraph isn't responding after Cassandra restarted.

Open mkj-git opened this issue 1 year ago • 4 comments

Bug Type

server status (startup/runtime exception)

Before submit

  • [x] I have searched the existing Issues and FAQ and confirmed that there is no identical or duplicate problem.

Environment

  • Server Version: Latest Code Build
  • Backend: Cassandra
  • OS: Ubuntu
  • Data Size: Only a small amount of data

Expected & Actual behavior

I performed the following steps:

  1. Started Cassandra (v5.0.3)
  2. Started HugeGraph
./start-hugegraph.sh 
Starting HugeGraphServer in daemon mode...
Connecting to HugeGraphServer (http://127.0.0.1:8080/graphs)....OK
Started [pid 1252693]

  3. Created one vertex from the Swagger UI; it was created successfully.
  4. Stopped Cassandra.
  5. Tried again from the Swagger UI, which returned an error:
Failed to fetch.
Possible Reasons:

CORS
Network Failure
URL scheme must be "http" or "https" for CORS request.
  6. Restarted Cassandra and tried again from the Swagger UI, but hit the same issue:
incubator-hugegraph-master/target/apache-hugegraph-incubating-1.5.0/apache-hugegraph-server-incubating-1.5.0/bin$ ./start-hugegraph.sh 
Starting HugeGraphServer in daemon mode...
Connecting to HugeGraphServer (http://127.0.0.1:8080/graphs)....OK
Started [pid 1254569]
incubator-hugegraph-master/target/apache-hugegraph-incubating-1.5.0/apache-hugegraph-server-incubating-1.5.0/bin$ ./stop-hugegraph.sh 
no crontab for manish
The HugeGraphServer monitor has been closed
Dev/incubator-hugegraph-master/target/apache-hugegraph-incubating-1.5.0/apache-hugegraph-server-incubating-1.5.0/bin/util.sh: line 375: kill: (1254569) - No such process
Killing HugeGraphServer(pid 1254569).OK

I don't see anything relevant in the log. Here are the last few lines:

2025-02-28 14:43:04 [main] [INFO] o.a.t.g.s.AbstractChannelizer - Configured application/vnd.gremlin-v2.0+json with org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV2d0
2025-02-28 14:43:04 [main] [INFO] o.a.t.g.s.AbstractChannelizer - application/json already has org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0 configured - it will not be replaced by org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV2d0, change order of serialization configuration if this is not desired.
2025-02-28 14:43:04 [main] [INFO] o.a.t.g.s.AbstractChannelizer - Configured application/vnd.gremlin-v3.0+json with org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0
2025-02-28 14:43:04 [main] [INFO] o.a.t.g.s.AbstractChannelizer - application/json already has org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0 configured - it will not be replaced by org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0, change order of serialization configuration if this is not desired.
2025-02-28 14:43:04 [main] [INFO] o.a.t.g.s.AbstractChannelizer - Configured application/vnd.graphbinary-v1.0 with org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1
2025-02-28 14:43:04 [main] [INFO] o.a.t.g.s.AbstractChannelizer - Configured application/json with org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0
2025-02-28 14:43:04 [main] [INFO] o.a.t.g.s.AbstractChannelizer - Configured application/vnd.gremlin-v2.0+json with org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV2d0
2025-02-28 14:43:04 [main] [INFO] o.a.t.g.s.AbstractChannelizer - application/json already has org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0 configured - it will not be replaced by org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV2d0, change order of serialization configuration if this is not desired.
2025-02-28 14:43:04 [main] [INFO] o.a.t.g.s.AbstractChannelizer - Configured application/vnd.gremlin-v3.0+json with org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0
2025-02-28 14:43:04 [main] [INFO] o.a.t.g.s.AbstractChannelizer - application/json already has org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0 configured - it will not be replaced by org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0, change order of serialization configuration if this is not desired.
2025-02-28 14:43:04 [gremlin-server-boss-1] [INFO] o.a.t.g.s.GremlinServer - Gremlin Server configured with worker thread pool of 1, gremlin pool of 8 and boss thread pool of 1.
2025-02-28 14:43:04 [gremlin-server-boss-1] [INFO] o.a.t.g.s.GremlinServer - Channel started at port 8182.
2025-02-28 14:43:04 [main] [INFO] o.a.h.d.MemoryMonitor - Memory monitoring started.

Vertex/Edge example


Schema [VertexLabel, EdgeLabel, IndexLabel]


mkj-git • Feb 28 '25 09:02

I found a similar closed issue related to HugeGraph not responding after a Cassandra restart, but no solution was found in the comments [1].

Regarding your issue, it seems that HugeGraph is not handling the reconnection to Cassandra after it has been restarted. This is a known challenge, as HugeGraph may not automatically reconnect to the backend once the connection is lost.

Here are a few things you might consider checking or trying:

  1. Logs for Connection Errors: Check the logs for any specific error messages related to the connection to Cassandra. Look for messages like "Failed to connect keyspace" or NoHostAvailableException, which indicate connection issues [2].

  2. Configuration for Reconnection: Ensure that your HugeGraph configuration is set up to handle reconnections. While I couldn't find specific configurations in the repository, you might want to check if there are any settings related to connection retries or timeouts.

  3. Restart HugeGraph: If HugeGraph does not automatically reconnect, you might need to restart the HugeGraph server after Cassandra is back online. This can help re-establish the connection.

  4. Graceful Handling of Restarts: HugeGraph has mechanisms to handle backend restarts gracefully, such as the PdConfigureListener, which listens for restart messages and updates configurations [3]. Ensure that your setup is using the latest configurations.

If these steps do not resolve the issue, you might need to look into more detailed logs or consider reaching out to other developers who have faced similar issues for potential workarounds.
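Step 3 above (restarting HugeGraph once Cassandra is back online) could be scripted roughly as follows. This is only a sketch under stated assumptions: Cassandra is assumed to listen on its default native transport port 9042 on 127.0.0.1, the helper names `port_open` and `wait_for_port` are hypothetical, and the probe relies on bash's `/dev/tcp` pseudo-device, so it needs bash rather than a plain POSIX shell.

```shell
#!/usr/bin/env bash
# Sketch: block until a TCP port accepts connections, then restart HugeGraph.
# Helper names are hypothetical; they are not part of HugeGraph or Cassandra.

# Return 0 if HOST:PORT accepts a TCP connection, non-zero otherwise.
port_open() {
  local host=$1 port=$2
  # The subshell opens the connection on fd 3 and closes it again on exit.
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# Poll once per second until the port is open or TIMEOUT seconds have passed.
wait_for_port() {
  local host=$1 port=$2 timeout=$3
  local waited=0
  until port_open "$host" "$port"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for ${host}:${port}" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Example usage from the server's bin directory (not executed here):
#   wait_for_port 127.0.0.1 9042 120 && ./stop-hugegraph.sh && ./start-hugegraph.sh
```

An open port only shows that Cassandra accepts connections again; it does not prove that the keyspace is ready, so the HugeGraph log should still be checked after the restart.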

To continue talking to Dosu, mention @dosu.



dosubot[bot] • Feb 28 '25 09:02

I found the problem after enabling foreground mode: once Cassandra goes down, the server itself terminates. This is bad; it should retry and reconnect. Does HugeGraph have anything like this?

./start-hugegraph.sh 
Starting HugeGraphServer in foreground mode...
./start-hugegraph.sh: line 113: 1271256 Killed                  "${BIN}"/hugegraph-server.sh "${CONF}/${GREMLIN_SERVER_CONF}" "${CONF}"/rest-server.properties "${OPEN_SECURITY_CHECK}" "${USER_OPTION}" "${GC_OPTION}" "${OPEN_TELEMETRY}" >> "${LOGS}"/hugegraph-server.log 2>&1
Connecting to http://127.0.0.1:8080/graphs (30).error: process ID list syntax error

Usage:
 ps [options]

 Try 'ps --help <simple|list|output|threads|misc|all>'
  or 'ps --help <s|l|o|t|m|a>'
 for additional help text.

For more details see ps(1).
Starting http://127.0.0.1:8080/graphs failed
See /home/manish/Dev/incubator-hugegraph-master/target/apache-hugegraph-incubating-1.5.0/apache-hugegraph-server-incubating-1.5.0/logs/hugegraph-server.log for HugeGraphServer log output.

mkj-git • Feb 28 '25 14:02

HugeGraph has a mechanism to automatically retry and reconnect to Cassandra when it goes down. The InitStore class in the HugeGraph server code specifies retry logic with up to 10 retries and a retry interval of 5000 milliseconds. This is designed to handle exceptions like OperationTimedOutException and NoHostAvailableException, which occur when Cassandra is temporarily unavailable [1].

If the server is terminating instead of retrying, ensure that your configuration is correctly set up to utilize this retry mechanism. You might also want to check the logs for any specific error messages that could provide more insight into why the server is terminating.
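The retry policy described above (a bounded number of attempts with a fixed pause between them) can be illustrated generically. The sketch below is not HugeGraph's `InitStore` code; `retry_fixed` is a hypothetical helper, and the values 10 and 5 in the usage comment simply mirror the retry count and 5000 ms interval quoted above.

```shell
#!/usr/bin/env bash
# Sketch: run a command up to MAX_ATTEMPTS times with a fixed delay between tries.
# Usage: retry_fixed MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_fixed() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt=1
  while true; do
    # Success on any attempt ends the loop immediately.
    if "$@"; then
      return 0
    fi
    # Stop once the attempt budget is used up.
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed, retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}

# Example usage (not executed here): up to 10 tries, 5 seconds apart.
#   retry_fixed 10 5 some_backend_probe
```

A fixed interval keeps the total wait predictable (at most attempts minus one, times the delay), which is simpler to reason about than exponential backoff when the backend is expected to return within a minute or so.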


dosubot[bot] • Feb 28 '25 14:02

This looks like a real problem: if Cassandra goes down, or is started after HugeGraph, HugeGraph does not work.

{
  "exception": "class org.apache.hugegraph.exception.NotFoundException",
  "message": "Graph 'hugegraph' does not exist",
  "cause": ""
}

I get this error when I start Cassandra after HugeGraph has already started.

mkj-git • Mar 01 '25 06:03