
One worker on a Node being removed while others are working properly

Open bamedro opened this issue 8 years ago • 0 comments

While deploying 19,840 nodes across 640 hosts (32 nodes per host), a single node failed to register, with the errors below. Notably, the other nodes on the same host were working properly.

On Server side:

[2018-01-15 19:41:26,235 thread-290 WARN                o.o.p.r.c.RMCore] Cannot set node as available, the node is unknown: pnp://172.16.2.36:49558/ns2033-8ea6dfd9-59c8-4eaf-a4ad-ec15634e4688_12
[2018-01-15 19:41:26,620 38/RM_NODE WARN    o.o.p.r.n.RMNodeConfigurator] Cannot properly configure the node pnp://172.16.2.36:49558/ns2033-8ea6dfd9-59c8-4eaf-a4ad-ec15634e4688_12 because of an error during configuration phase
java.io.IOException: remote object pnp://172.16.2.36:49558/ns2033-8ea6dfd9-59c8-4eaf-a4ad-ec15634e4688_12 not found. Message method=getLocalNodeProperty, sender=null, sequenceNumber=0 cannot be processed
        at org.objectweb.proactive.extensions.pnp.PNPROMessageRequest.processMessage(PNPROMessageRequest.java:88)
        at org.objectweb.proactive.extensions.pnp.PNPServerHandler$RequestExecutor.run(PNPServerHandler.java:296)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)
[2018-01-15 19:41:26,709 38/RM_NODE INFO                o.o.p.r.c.RMCore] The node pnp://172.16.2.36:49558/ns2033-8ea6dfd9-59c8-4eaf-a4ad-ec15634e4688_12 provided by "admin" (pnp://172.16.2.36:49558/HalfbodiesNode_748472782/HalfBody_pa.stub.org.ow2.proactive.resourcemanager.nodesource.dataspace._StubDataSpaceNodeConfigurationAgent#configureNode_52009) is down

On Node Side:

Jan 15 19:41:19 debian java[1411]: Adding node ns2033-8ea6dfd9-59c8-4eaf-a4ad-ec15634e4688_12 to Resource Manager.
Jan 15 19:41:19 debian java[1411]: Adding node ns2033-8ea6dfd9-59c8-4eaf-a4ad-ec15634e4688_12 to Resource Manager.
Jan 15 19:41:26 debian java[1411]: Node pnp://172.16.2.36:49558/ns2033-8ea6dfd9-59c8-4eaf-a4ad-ec15634e4688_12 added.
Jan 15 19:41:26 debian java[1411]: Node pnp://172.16.2.36:49558/ns2033-8ea6dfd9-59c8-4eaf-a4ad-ec15634e4688_12 added.
Jan 15 19:41:26 debian java[1411]: Connected to the resource manager at pnp://172.16.2.115:64738/
Jan 15 19:41:26 debian java[1411]: Connected to the resource manager at pnp://172.16.2.115:64738/
Jan 15 19:41:26 debian java[1411]: Node pnp://172.16.2.36:49558/ns2033-8ea6dfd9-59c8-4eaf-a4ad-ec15634e4688_12 has been removed
Jan 15 19:41:26 debian java[1411]: Node pnp://172.16.2.36:49558/ns2033-8ea6dfd9-59c8-4eaf-a4ad-ec15634e4688_12 has been removed

Note that the message "Node pnp://172.16.2.36:49558/ns2033-8ea6dfd9-59c8-4eaf-a4ad-ec15634e4688_12 has been removed" appears after "Connected to the resource manager at pnp://172.16.2.115:64738/".
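
To look for the same ordering in the node-side logs of other hosts, something like the following Python sketch might help. It is only an illustration: the regular expression and the message prefixes are assumptions taken from the log lines quoted above, not from the ProActive source, and the names `classify`, `event_sequence` and `removed_after_connected` are hypothetical helpers.

```python
import re

# Assumed syslog-style layout, inferred from the node-side lines in this report:
# "<Mon> <day> <hh:mm:ss> <host> java[<pid>]: <message>"
EVENT_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ java\[\d+\]: (?P<msg>.*)$"
)


def classify(msg):
    """Map a node-side message to a coarse event name, or None if unrecognised."""
    if msg.startswith("Adding node "):
        return "adding"
    if msg.startswith("Node ") and msg.endswith(" added."):
        return "added"
    if msg.startswith("Connected to the resource manager"):
        return "connected"
    if msg.startswith("Node ") and msg.endswith(" has been removed"):
        return "removed"
    return None


def event_sequence(lines):
    """Return the ordered event names, collapsing consecutive repeats."""
    events = []
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not follow the assumed layout
        kind = classify(match.group("msg"))
        if kind and (not events or events[-1] != kind):
            events.append(kind)
    return events


def removed_after_connected(lines):
    """True if a 'removed' event is logged after a 'connected' event."""
    seq = event_sequence(lines)
    return (
        "connected" in seq
        and "removed" in seq
        and seq.index("removed") > seq.index("connected")
    )
```

Feeding it the node-side lines for one JVM (for example, the output of `journalctl` filtered on the Java process) gives a quick yes/no for whether that node shows the removal after the connection message.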

bamedro, Jan 16 '18 02:01