Observed "Failed to close connection for host" errors when one of the servers in a cluster is down
Hi, I built a Dynomite cluster. When one of the nodes goes down, I see the errors below in my log file. Please suggest a solution.
ERROR Consumer_metric_data-cassandra_metric_worker_0 12/Apr/2019 10:25:27 com.netflix.dyno.connectionpool.impl.HostConnectionPoolImpl - Failed to close connection for host: Host [hostname=RedisMaster, ipAddress=192.168.56.221, port=8102, rack: dc1a, datacenter: dc1, status: Up, hashtag=null, password=null] java.net.SocketException: Broken pipe (Write failed)
ERROR Consumer_metric_data-cassandra_metric_worker_0 12/Apr/2019 10:25:27 com.netflix.dyno.connectionpool.impl.HostConnectionPoolImpl - Failed to close connection for host: Host [hostname=RedisMaster, ipAddress=192.168.56.221, port=8102, rack: dc1a, datacenter: dc1, status: Up, hashtag=null, password=null] java.net.SocketException: Broken pipe (Write failed)
ERROR Consumer_dist_jobs-dist_jobs_worker_0 12/Apr/2019 10:30:00 com.netflix.dyno.connectionpool.impl.HostConnectionPoolImpl - Failed to close connection for host: Host [hostname=RedisMaster, ipAddress=192.168.56.221, port=8102, rack: dc1a, datacenter: dc1, status: Up, hashtag=null, password=null] java.net.SocketException: Broken pipe (Write failed)
ERROR Consumer_dist_jobs-dist_jobs_worker-23-thread-1 12/Apr/2019 10:30:00 com.netflix.dyno.connectionpool.impl.HostConnectionPoolImpl - Failed to close connection for host: Host [hostname=RedisMaster, ipAddress=192.168.56.221, port=8102, rack: dc1a, datacenter: dc1, status: Up, hashtag=null, password=null] Unexpected end of stream.
ERROR Consumer_dist_jobs-dist_jobs_worker_0 12/Apr/2019 10:30:00 com.netflix.dyno.connectionpool.impl.HostConnectionPoolImpl - Failed to close connection for host: Host [hostname=RedisMaster, ipAddress=192.168.56.221, port=8102, rack: dc1a, datacenter: dc1, status: Up, hashtag=null, password=null] java.net.SocketException: Broken pipe (Write failed)
ERROR Consumer_metric_data-cassandra_metric_worker_0 12/Apr/2019 10:30:28 com.netflix.dyno.connectionpool.impl.HostConnectionPoolImpl - Failed to close connection for host: Host [hostname=RedisMaster, ipAddress=192.168.56.221, port=8102, rack: dc1a, datacenter: dc1, status: Up, hashtag=null, password=null] java.net.SocketException: Broken pipe (Write failed)
ERROR Consumer_metric_data-cassandra_metric_worker_0 12/Apr/2019 10:30:28 com.netflix.dyno.connectionpool.impl.HostConnectionPoolImpl - Failed to close connection for host: Host [hostname=RedisMaster, ipAddress=192.168.56.221, port=8102, rack: dc1a, datacenter: dc1, status: Up, hashtag=null, password=null] java.net.SocketException: Broken pipe (Write failed)
ERROR Consumer_metric_data-cassandra_metric_worker_0 12/Apr/2019 10:35:29 com.netflix.dyno.connectionpool.impl.HostConnectionPoolImpl - Failed to close connection for host: Host [hostname=RedisMaster, ipAddress=192.168.56.221, port=8102, rack: dc1a, datacenter: dc1, status: Up, hashtag=null, password=null] Unexpected end of stream.
ERROR Consumer_metric_data-cassandra_metric_worker_0 12/Apr/2019 10:35:29 com.netflix.dyno.connectionpool.impl.HostConnectionPoolImpl - Failed to close connection for host: Host [hostname=RedisMaster, ipAddress=192.168.56.221, port=8102, rack: dc1a, datacenter: dc1, status: Up, hashtag=null, password=null] java.net.SocketException: Broken pipe (Write failed)
ERROR Consumer_metric_data-cassandra_metric_worker_0 12/Apr/2019 10:40:30 com.netflix.dyno.connectionpool.impl.health.ConnectionPoolHealthTracker - FAIL: Attempting to reconnect pool due to exceptions =>FatalConnectionException: [host=Host [hostname=RedisMaster, ipAddress=192.168.56.221, port=8102, rack: dc1a, datacenter: dc1, status: Up, hashtag=null, password=null], latency=0(0), attempts=1]redis.clients.jedis.exceptions.JedisConnectionException: Unexpected end of stream.
ERROR Consumer_metric_data-cassandra_metric_worker_0 12/Apr/2019 10:40:30 com.netflix.dyno.connectionpool.impl.health.ConnectionPoolHealthTracker - Enqueueing host cp for recycling due to too many errors: HostConnectionPool: [Host: Host [hostname=RedisMaster, ipAddress=192.168.56.221, port=8102, rack: dc1a, datacenter: dc1, status: Up, hashtag=null, password=null], Pool active: true]
More information in #166
To prevent things from blowing up (not sure if this is the best way), I added the allowFallback param to the RetryPolicyFactory on my connection pool config. If there is another host/node in the pool, the client falls back to a different one instead of failing outright.
ConnectionPoolConfigurationImpl cp = new ConnectionPoolConfigurationImpl("bbs")
    .setFailOnStartupIfNoHosts(true)
    .setRetryPolicyFactory(new RetryNTimes.RetryFactory(2, true))
    .setLocalRack(localRack)
    .withTokenSupplier(tokenMapSupplier);
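To make the fallback idea concrete, here is a minimal, self-contained sketch of what "retry with fallback to another host" means in general. It is an illustration only and not Dyno's actual RetryNTimes implementation: the class name, the `execute` helper, the host names `node-a` and `node-b`, and the way attempts are counted are all assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Illustration only: a generic retry-with-fallback loop, not Dyno's code. */
final class RetryWithFallbackSketch {

    /**
     * Runs the operation up to (1 + maxRetries) times.
     * With allowFallback, each retry moves on to the next host in the list;
     * without it, every attempt goes to the first host.
     */
    static <T> T execute(List<String> hosts, int maxRetries, boolean allowFallback,
                         Function<String, T> operation) {
        if (hosts.isEmpty() || maxRetries < 0) {
            throw new IllegalArgumentException("need at least one host and maxRetries >= 0");
        }
        RuntimeException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            String host = allowFallback ? hosts.get(attempt % hosts.size()) : hosts.get(0);
            try {
                return operation.apply(host);
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // every attempt failed
    }

    public static void main(String[] args) {
        // Hypothetical host names; node-a simulates the node that is down.
        List<String> hosts = Arrays.asList("node-a", "node-b");
        Function<String, String> get = host -> {
            if (host.equals("node-a")) {
                throw new IllegalStateException("Broken pipe: " + host);
            }
            return "value from " + host;
        };
        System.out.println(execute(hosts, 2, true, get));
    }
}
```

With fallback disabled, the same call keeps hitting the dead node and eventually rethrows the last error, which is roughly the "exploding" behaviour described above.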
Specifically refer to this comment https://github.com/Netflix/dyno/issues/166#issuecomment-303919345 and the following one about the LOCAL_RACK env variable
PS. Please provide some code of your implementation in the future. It will help others understand your issue better and give them an idea of what might specifically be wrong with your implementation
@chrisbendel I am also using the same properties:
new DynoJedisClient.Builder()
    .withApplicationName(clientName)
    .withDynomiteClusterName(clusterName)
    .withCPConfig(new ConnectionPoolConfiguration(clientName)
        .withTokenSupplier(TokenMapSupplierHelper.toTokenMapSupplier(nodes))
        .setMaxConnsPerHost(maxConnections)
        .setConnectTimeout(maxTimeOut)
        .setRetryPolicyFactory(new RetryNTimes.RetryFactory(retryCount, true))
        .setMaxTimeoutWhenExhausted(maxTimeOutExhausted)
        .setLocalRack(localRack)
        .setLocalDataCenter(localDc)
        .setMaxFailoverCount(maxFailOverCount)
    )
    .withHostSupplier(TokenMapSupplierHelper.toHostSupplier(nodes))
    .build();
@chrisbendel But I still get the issue when one of the nodes in the cluster is stopped.
I observed a similar issue when setting the localDataCenter on the connection pool.
Can you please provide the code you have for your hostsupplier and tokenmapsupplier?
In the future, you can also use triple backticks around your code to format it properly.
@chrisbendel sorry for the late response
public static TokenMapSupplier toTokenMapSupplier(List<DynomiteNodeInfo> nodes) {
    // Build the topology JSON array by hand, one object per node.
    StringBuilder jsonSB = new StringBuilder("[");
    int count = 0;
    for (DynomiteNodeInfo node : nodes) {
        jsonSB.append(" {\"token\":\"" + node.getTokens()
                + "\",\"hostname\":\"" + node.getHostname()
                + "\",\"ip\":\"" + node.getIpaddress()
                + "\",\"zone\":\"" + node.getRack()
                + "\",\"rack\":\"" + node.getRack()
                + "\",\"dc\":\"" + node.getDc()
                + "\"} ");
        count++;
        if (count < nodes.size())
            jsonSB.append(" , ");
    }
    // Close the array (no stray trailing quote after the bracket).
    jsonSB.append(" ]");
    final String json = jsonSB.toString();
    TokenMapSupplier testTokenMapSupplier = new AbstractTokenMapSupplier(8102) {
        @Override
        public String getTopologyJsonPayload(String hostname) {
            return json;
        }

        @Override
        public String getTopologyJsonPayload(java.util.Set<Host> activeHosts) {
            return json;
        }
    };
    return testTokenMapSupplier;
}
public static HostSupplier toHostSupplier(List<DynomiteNodeInfo> nodes) {
    final Collection<Host> hosts = new ArrayList<Host>();
    for (DynomiteNodeInfo node : nodes) {
        hosts.add(buildHost(node));
    }
    final HostSupplier customHostSupplier = new HostSupplier() {
        @Override
        public List<Host> getHosts() {
            return (List<Host>) hosts;
        }
    };
    return customHostSupplier;
}
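Since the topology JSON in toTokenMapSupplier is assembled by hand, it may help to check the payload in isolation before handing it to Dyno. The sketch below is a standalone version with no Dyno dependency: the `Node` class is a hypothetical stand-in for DynomiteNodeInfo, the token value `100` is made up, and only the field names (token, hostname, ip, zone, rack, dc) and the host details from the log above come from this thread.

```java
import java.util.Arrays;
import java.util.List;

/** Illustration only: builds the same kind of topology JSON without Dyno. */
final class TopologyJsonSketch {

    /** Hypothetical stand-in for DynomiteNodeInfo. */
    static final class Node {
        final String token, hostname, ip, rack, dc;

        Node(String token, String hostname, String ip, String rack, String dc) {
            this.token = token;
            this.hostname = hostname;
            this.ip = ip;
            this.rack = rack;
            this.dc = dc;
        }
    }

    /** Wraps a value in double quotes, escaping backslashes and quotes. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Returns a JSON array with one object per node. */
    static String toTopologyJson(List<Node> nodes) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < nodes.size(); i++) {
            Node n = nodes.get(i);
            if (i > 0) {
                sb.append(",");
            }
            sb.append("{\"token\":").append(quote(n.token))
              .append(",\"hostname\":").append(quote(n.hostname))
              .append(",\"ip\":").append(quote(n.ip))
              .append(",\"zone\":").append(quote(n.rack)) // zone mirrors rack, as in the code above
              .append(",\"rack\":").append(quote(n.rack))
              .append(",\"dc\":").append(quote(n.dc))
              .append("}");
        }
        return sb.append("]").toString();
    }

    public static void main(String[] args) {
        // Token "100" is a made-up value for illustration.
        List<Node> nodes = Arrays.asList(
                new Node("100", "RedisMaster", "192.168.56.221", "dc1a", "dc1"));
        System.out.println(toTopologyJson(nodes));
    }
}
```

Printing the payload this way makes it easy to confirm that every node, including the one that is down, appears with the expected rack and dc values.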