Request to improve HAClient failover
The HAClient could be improved slightly to handle the following real-world situation:
Time 00:00 - create HAClient & namenode-2 is Active
Time 03:00 - namenode-2 fails, namenode-1 takes over as Active
From this point on, all requests fail with RequestError: org.apache.hadoop.ipc.RpcServerException.
HAClient could be improved to go back to the top of its list of namenodes at Time 03:00, so it discovers that namenode-1 is now Active instead of failing.
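For concreteness, here is a minimal sketch of the setup, assuming the standard snakebite HAClient/Namenode API; the hostnames and the ls_with_retry wrapper are hypothetical, illustrating the behaviour this request asks HAClient to handle internally:

from snakebite.client import HAClient
from snakebite.errors import RequestError
from snakebite.namenode import Namenode

# Time 00:00 - both namenodes are configured; namenode-2 happens to be Active.
namenodes = [Namenode("namenode-1.example.com", 8020),   # hypothetical hosts
             Namenode("namenode-2.example.com", 8020)]
client = HAClient(namenodes)

def ls_with_retry(client, path):
    # Hypothetical workaround: after the Time 03:00 failover the first call
    # can raise RequestError (org.apache.hadoop.ipc.RpcServerException);
    # retrying gives the client another chance to walk its namenode list
    # from the top and find the new Active namenode, which is what HAClient
    # could do by itself.
    try:
        return list(client.ls([path]))
    except RequestError:
        return list(client.ls([path]))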
I'm not completely sure what goes wrong here; snakebite should cycle through the namenodes. Unfortunately, the order is fixed by the configuration right now, so if the first namenode becomes unresponsive, snakebite may time out before it tries the second one. I think this could be solved by letting snakebite talk to ZooKeeper and determine the active namenode from there.
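A rough sketch of that ZooKeeper idea, assuming the kazoo library and the default Hadoop HA layout in which the failover controller keeps a /hadoop-ha/<nameservice>/ActiveBreadCrumb znode; the payload is really a serialized ActiveNodeInfo protobuf, so the string extraction below is only a placeholder for proper decoding:

from kazoo.client import KazooClient

def guess_active_namenode(zk_hosts, nameservice):
    # Read the znode written by the Hadoop ActiveStandbyElector for this
    # nameservice and crudely pull out the printable part (the hostname and
    # port live inside an ActiveNodeInfo protobuf message).
    zk = KazooClient(hosts=zk_hosts)
    zk.start()
    try:
        data, _stat = zk.get("/hadoop-ha/%s/ActiveBreadCrumb" % nameservice)
        return "".join(chr(b) for b in bytearray(data) if 32 <= b < 127)
    finally:
        zk.stop()

Snakebite could use something like this to put the active namenode first in its list before issuing requests.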
The problem for us is that snakebite does not understand nameservices, so it does not map HA HDFS URLs to the correct namenodes. For example, an hdfs-site.xml configured with nameservice IDs looks something like this:
<property>
  <name>dfs.nameservices</name>
  <value>opal-ha</value>
</property>
<property>
  <name>dfs.ha.namenodes.opal-ha</name>
  <value>nn,nn-2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.opal-ha.nn</name>
  <value>nn.some.host.name:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.opal-ha.nn-2</name>
  <value>nn-2.some.host.name:8020</value>
</property>
Ideally, the HAClient would know the nameservice-to-namenode mapping, so that URLs like hdfs://opal-ha/ work in snakebite the way they do with the official Hadoop client tools.
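As a sketch of what that could look like, here is a hypothetical helper (not part of snakebite) that resolves a nameservice ID to Namenode objects by reading the hdfs-site.xml properties shown above; the config file path is just an example location:

import xml.etree.ElementTree as ET

from snakebite.client import HAClient
from snakebite.namenode import Namenode

def namenodes_for_nameservice(hdfs_site_path, nameservice):
    # Collect all <property> name/value pairs from hdfs-site.xml.
    conf = {}
    for prop in ET.parse(hdfs_site_path).getroot().findall("property"):
        conf[prop.findtext("name")] = prop.findtext("value")
    # dfs.ha.namenodes.<nameservice> lists the namenode IDs (e.g. "nn,nn-2"),
    # and dfs.namenode.rpc-address.<nameservice>.<id> gives host:port for each.
    namenodes = []
    for nn_id in conf["dfs.ha.namenodes.%s" % nameservice].split(","):
        host, port = conf["dfs.namenode.rpc-address.%s.%s"
                          % (nameservice, nn_id.strip())].split(":")
        namenodes.append(Namenode(host, int(port)))
    return namenodes

# With the configuration above, hdfs://opal-ha/ would map to
# nn.some.host.name:8020 and nn-2.some.host.name:8020.
client = HAClient(namenodes_for_nameservice("/etc/hadoop/conf/hdfs-site.xml", "opal-ha"))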