
Observed "Failed to close connection for host" errors when one of the servers in a cluster is down

Open ghost opened this issue 6 years ago • 5 comments

Hi, I built a Dynomite cluster. When one of the nodes goes down, I see the errors below in my log file. Please suggest a solution.

ERROR Consumer_metric_data-cassandra_metric_worker_0 12/Apr/2019 10:25:27 com.netflix.dyno.connectionpool.impl.HostConnectionPoolImpl - Failed to close connection for host: Host [hostname=RedisMaster, ipAddress=192.168.56.221, port=8102, rack: dc1a, datacenter: dc1, status: Up, hashtag=null, password=null] java.net.SocketException: Broken pipe (Write failed)
ERROR Consumer_metric_data-cassandra_metric_worker_0 12/Apr/2019 10:25:27 com.netflix.dyno.connectionpool.impl.HostConnectionPoolImpl - Failed to close connection for host: Host [hostname=RedisMaster, ipAddress=192.168.56.221, port=8102, rack: dc1a, datacenter: dc1, status: Up, hashtag=null, password=null] java.net.SocketException: Broken pipe (Write failed)
ERROR Consumer_dist_jobs-dist_jobs_worker_0 12/Apr/2019 10:30:00 com.netflix.dyno.connectionpool.impl.HostConnectionPoolImpl - Failed to close connection for host: Host [hostname=RedisMaster, ipAddress=192.168.56.221, port=8102, rack: dc1a, datacenter: dc1, status: Up, hashtag=null, password=null] java.net.SocketException: Broken pipe (Write failed)
ERROR Consumer_dist_jobs-dist_jobs_worker-23-thread-1 12/Apr/2019 10:30:00 com.netflix.dyno.connectionpool.impl.HostConnectionPoolImpl - Failed to close connection for host: Host [hostname=RedisMaster, ipAddress=192.168.56.221, port=8102, rack: dc1a, datacenter: dc1, status: Up, hashtag=null, password=null] Unexpected end of stream.
ERROR Consumer_dist_jobs-dist_jobs_worker_0 12/Apr/2019 10:30:00 com.netflix.dyno.connectionpool.impl.HostConnectionPoolImpl - Failed to close connection for host: Host [hostname=RedisMaster, ipAddress=192.168.56.221, port=8102, rack: dc1a, datacenter: dc1, status: Up, hashtag=null, password=null] java.net.SocketException: Broken pipe (Write failed)
ERROR Consumer_metric_data-cassandra_metric_worker_0 12/Apr/2019 10:30:28 com.netflix.dyno.connectionpool.impl.HostConnectionPoolImpl - Failed to close connection for host: Host [hostname=RedisMaster, ipAddress=192.168.56.221, port=8102, rack: dc1a, datacenter: dc1, status: Up, hashtag=null, password=null] java.net.SocketException: Broken pipe (Write failed)
ERROR Consumer_metric_data-cassandra_metric_worker_0 12/Apr/2019 10:30:28 com.netflix.dyno.connectionpool.impl.HostConnectionPoolImpl - Failed to close connection for host: Host [hostname=RedisMaster, ipAddress=192.168.56.221, port=8102, rack: dc1a, datacenter: dc1, status: Up, hashtag=null, password=null] java.net.SocketException: Broken pipe (Write failed)
ERROR Consumer_metric_data-cassandra_metric_worker_0 12/Apr/2019 10:35:29 com.netflix.dyno.connectionpool.impl.HostConnectionPoolImpl - Failed to close connection for host: Host [hostname=RedisMaster, ipAddress=192.168.56.221, port=8102, rack: dc1a, datacenter: dc1, status: Up, hashtag=null, password=null] Unexpected end of stream.
ERROR Consumer_metric_data-cassandra_metric_worker_0 12/Apr/2019 10:35:29 com.netflix.dyno.connectionpool.impl.HostConnectionPoolImpl - Failed to close connection for host: Host [hostname=RedisMaster, ipAddress=192.168.56.221, port=8102, rack: dc1a, datacenter: dc1, status: Up, hashtag=null, password=null] java.net.SocketException: Broken pipe (Write failed)
ERROR Consumer_metric_data-cassandra_metric_worker_0 12/Apr/2019 10:40:30 com.netflix.dyno.connectionpool.impl.health.ConnectionPoolHealthTracker - FAIL: Attempting to reconnect pool due to exceptions =>FatalConnectionException: [host=Host [hostname=RedisMaster, ipAddress=192.168.56.221, port=8102, rack: dc1a, datacenter: dc1, status: Up, hashtag=null, password=null], latency=0(0), attempts=1]redis.clients.jedis.exceptions.JedisConnectionException: Unexpected end of stream.
ERROR Consumer_metric_data-cassandra_metric_worker_0 12/Apr/2019 10:40:30 com.netflix.dyno.connectionpool.impl.health.ConnectionPoolHealthTracker - Enqueueing host cp for recycling due to too many errors: HostConnectionPool: [Host: Host [hostname=RedisMaster, ipAddress=192.168.56.221, port=8102, rack: dc1a, datacenter: dc1, status: Up, hashtag=null, password=null], Pool active: true]

ghost avatar Apr 12 '19 11:04 ghost

More information in #166

To prevent things from blowing up (not sure if this is the best way), I added the allowFallback param to the retry policy factory on my connection pool config. If there is another host/node in the pool, it will fall back to a different one as opposed to exploding.

ConnectionPoolConfigurationImpl cp = new ConnectionPoolConfigurationImpl("bbs")
	.setFailOnStartupIfNoHosts(true)
	.setRetryPolicyFactory(new RetryNTimes.RetryFactory(2, true))
	.setLocalRack(localRack)
	.withTokenSupplier(tokenMapSupplier);
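For readers unfamiliar with the idea, here is a minimal, self-contained sketch of what "retry with fallback" means in general. It is not Dyno's implementation: `FallbackSketch`, `execute`, `call`, and the host names `node-a`/`node-b` are all hypothetical and exist only to illustrate the behaviour described above.

```java
import java.util.List;
import java.util.function.Function;

public class FallbackSketch {

    // Hypothetical helper, not part of Dyno. Makes up to maxAttempts calls.
    // With allowFallback, each failed attempt moves on to the next host;
    // without it, every attempt goes back to the first (possibly dead) host.
    static <R> R execute(List<String> hosts, int maxAttempts, boolean allowFallback,
                         Function<String, R> op) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("no hosts supplied");
        }
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            int index = allowFallback ? attempt % hosts.size() : 0;
            try {
                return op.apply(hosts.get(index));
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw new IllegalStateException("all " + maxAttempts + " attempts failed", last);
    }

    // Simulated operation: "node-a" is down, any other host answers.
    static String call(String host) {
        if (host.equals("node-a")) {
            throw new RuntimeException("Broken pipe (simulated)");
        }
        return "OK from " + host;
    }

    public static void main(String[] args) {
        List<String> hosts = List.of("node-a", "node-b");
        // With fallback the second attempt reaches the healthy node.
        System.out.println(execute(hosts, 2, true, FallbackSketch::call));
        try {
            // Without fallback both attempts hit the dead node.
            execute(hosts, 2, false, FallbackSketch::call);
        } catch (IllegalStateException e) {
            System.out.println("without fallback: " + e.getMessage());
        }
    }
}
```

Note that fallback of this kind only keeps requests succeeding; it does not by itself stop errors from being logged for the host that is down.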

Specifically refer to this comment https://github.com/Netflix/dyno/issues/166#issuecomment-303919345 and the following one about the LOCAL_RACK env variable

PS. Please provide some code of your implementation in the future. It will help others understand your issue better and give them an idea of what might specifically be wrong with your implementation

chrisbendel avatar Apr 12 '19 13:04 chrisbendel

@chrisbendel I am also using the same properties:

new DynoJedisClient.Builder()
	.withApplicationName(clientName)
	.withDynomiteClusterName(clusterName)
	.withCPConfig(new ConnectionPoolConfiguration(clientName)
		.withTokenSupplier(TokenMapSupplierHelper.toTokenMapSupplier(nodes))
		.setMaxConnsPerHost(maxConnections)
		.setConnectTimeout(maxTimeOut)
		.setRetryPolicyFactory(new RetryNTimes.RetryFactory(retryCount,true))
		.setMaxTimeoutWhenExhausted(maxTimeOutExhausted)
		.setLocalRack(localRack)
		.setLocalDataCenter(localDc)
		.setMaxFailoverCount(maxFailOverCount)
	)
	.withHostSupplier(TokenMapSupplierHelper.toHostSupplier(nodes))
	.build();

ghost avatar Apr 15 '19 07:04 ghost

@chrisbendel But I still get the issue when one of the nodes in the cluster is stopped.

ghost avatar Apr 15 '19 07:04 ghost

I observed a similar issue when setting the localDataCenter on the connection pool.

Can you please provide the code you have for your hostsupplier and tokenmapsupplier?

In the future, you can also use triple backticks around your code to format it properly.

chrisbendel avatar Apr 15 '19 11:04 chrisbendel

@chrisbendel sorry for the late response

public static TokenMapSupplier toTokenMapSupplier(List<DynomiteNodeInfo> nodes){
	StringBuilder jsonSB = new StringBuilder("[");
	int count = 0;
	for(DynomiteNodeInfo node: nodes){
		jsonSB.append(" {\"token\":\""+ node.getTokens() + "\",\"hostname\":\"" + node.getHostname() + "\",\"ip\":\"" + node.getIpaddress()
			+ "\",\"zone\":\"" + node.getRack() + "\",\"rack\":\"" + node.getRack() + "\",\"dc\":\"" + node.getDc() + "\"} ");
		count++;
		if (count < nodes.size())
			jsonSB.append(" , ");
	}
	jsonSB.append(" ]\"");
	final String json = jsonSB.toString();
	TokenMapSupplier testTokenMapSupplier = new AbstractTokenMapSupplier(8102) {
		@Override
		public String getTopologyJsonPayload(String hostname) {
			return json;
		}
		@Override
		public String getTopologyJsonPayload(java.util.Set<Host> activeHosts) {
			return json;
		}
	};
	return testTokenMapSupplier;
}

public static HostSupplier toHostSupplier(List<DynomiteNodeInfo> nodes){
	final Collection<Host> hosts = new ArrayList<Host>();
	
	for(DynomiteNodeInfo node: nodes){
		hosts.add(buildHost(node));
	}
	
	final HostSupplier customHostSupplier = new HostSupplier() {
	 @Override
	   public List<Host> getHosts() {
		   return (List<Host>) hosts;
	   }
	};
	return customHostSupplier;
}
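
One thing that stands out in the token map code above: the final `append` looks like it adds a quote character after the closing bracket, so the payload may not be strictly valid JSON. Whether Dyno's parser tolerates that, or whether it is related to the errors, is not established in this thread. As a hedged sketch, the same payload could be assembled without hand-placed separators as below. `TopologyJsonSketch` and `Node` are hypothetical stand-ins (the real `DynomiteNodeInfo` is not shown here), the token value is made up, and the sketch assumes field values contain no quotes or backslashes.

```java
import java.util.List;
import java.util.stream.Collectors;

public class TopologyJsonSketch {

    // Hypothetical stand-in for DynomiteNodeInfo, which the thread does not show.
    static final class Node {
        final String token, hostname, ip, rack, dc;

        Node(String token, String hostname, String ip, String rack, String dc) {
            this.token = token;
            this.hostname = hostname;
            this.ip = ip;
            this.rack = rack;
            this.dc = dc;
        }
    }

    // One JSON object per node, using the same keys as the snippet above.
    // "zone" and "rack" both carry the rack value, as in the original code.
    // Assumes values need no JSON escaping.
    static String toJson(Node n) {
        return String.format(
            "{\"token\":\"%s\",\"hostname\":\"%s\",\"ip\":\"%s\","
                + "\"zone\":\"%s\",\"rack\":\"%s\",\"dc\":\"%s\"}",
            n.token, n.hostname, n.ip, n.rack, n.rack, n.dc);
    }

    // Collectors.joining supplies the commas and the surrounding brackets,
    // so there is no manual counter and no trailing character to get wrong.
    static String toTopologyJson(List<Node> nodes) {
        return nodes.stream()
                .map(TopologyJsonSketch::toJson)
                .collect(Collectors.joining(",", "[", "]"));
    }

    public static void main(String[] args) {
        // Token value is made up for illustration.
        Node node = new Node("1383429731", "RedisMaster", "192.168.56.221", "dc1a", "dc1");
        System.out.println(toTopologyJson(List.of(node)));
    }
}
```

The resulting string could then be returned from both `getTopologyJsonPayload` overrides in place of the hand-built one.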

ramuvistara avatar May 16 '19 05:05 ramuvistara