protocol "grpc+verbs", the PS node hit a core dump.
System information
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux CentOS 7.6
TensorFlow installed from (source or binary): source
TensorFlow version: 1.15.0
Python version: 2.7
Installed using virtualenv? pip? conda?: No
CUDA/cuDNN version: No
GPU model and memory: No
Worker job number: 50
Ps job number: 10
chief job number: 1

Describe the problem
I ran an experiment using TensorFlow with Verbs for communication across multiple workers, i.e. with the protocol "grpc+verbs". The cluster consists of 50 worker nodes, 10 ps nodes, and 1 chief node. At the end of training, all 50 workers stopped normally, but one of the 10 ps nodes crashed with a core dump. The other 9 ps nodes and the chief node stopped normally.
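For readers trying to reproduce the setup, here is a minimal sketch of the cluster layout described above. The hostnames, the port 2222, and the helper name build_cluster are hypothetical placeholders, not values from the actual job, and the real launch script may differ.

```python
def build_cluster(num_workers=50, num_ps=10, num_chief=1, base_port=2222):
    """Return a dict mapping each job name to its list of "host:port" addresses."""
    def addresses(job, count):
        # Placeholder hostnames such as "ps-3.example"; substitute the real ones.
        return ["%s-%d.example:%d" % (job, i, base_port) for i in range(count)]

    return {
        "chief": addresses("chief", num_chief),
        "ps": addresses("ps", num_ps),
        "worker": addresses("worker", num_workers),
    }


cluster = build_cluster()

# With TensorFlow 1.x built with verbs support, each task would then
# typically start its server along these lines (left as a comment so the
# sketch runs without TensorFlow installed):
#   server = tf.train.Server(tf.train.ClusterSpec(cluster),
#                            job_name="ps", task_index=0,
#                            protocol="grpc+verbs")
```

The dict has the shape that tf.train.ClusterSpec accepts; only the protocol argument distinguishes a "grpc+verbs" server from a plain "grpc" one.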
Using gdb to print the backtrace (bt) of the core dump file from that ps node gives the following output.
