AMUSE hangs when a system has multiple network interfaces

Open Sbte opened this issue 5 years ago • 8 comments

Describe the bug: If a system has multiple network interfaces, like ethernet and wifi, or ethernet and docker, AMUSE hangs in seemingly random places. I am confused that it DOES work with LXC, since that also gives me an extra network interface.

To Reproduce: Create an extra network interface and run test_c_implementation.py

Expected behavior: I expect it to not hang.

Logs: Container network interfaces (docker0 causes problems, lxcbr0 does not):

3: lxcbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
12: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 

Workaround: For me it worked after setting

export OMPI_MCA_btl_tcp_if_include="enp7s0"

which is my ethernet interface. It seems like multiple people run into this problem (also Merijn), so it would be nice if AMUSE at least detected this.
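The same workaround can also be applied from inside a Python script instead of the shell. The following is a minimal sketch, not part of AMUSE: the helper name `restrict_openmpi_tcp` is hypothetical, and it assumes the variable is set before MPI is initialized and before any worker processes are started, so that they inherit it.

```python
import os


def restrict_openmpi_tcp(interface, env=None):
    """Return a copy of `env` with Open MPI's TCP BTL pinned to one interface.

    Hypothetical helper; `env` defaults to the current process environment.
    """
    env = dict(os.environ if env is None else env)
    # Same variable as in the shell workaround above.
    env["OMPI_MCA_btl_tcp_if_include"] = interface
    return env
```

Calling `os.environ.update(restrict_openmpi_tcp("enp7s0"))` at the very top of a script, before importing anything that initializes MPI, should have the same effect as the export. "enp7s0" is the interface name from this report; substitute your own.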

This is probably also related to #128

Sbte • Jun 16 '20 14:06

Thanks for reporting, it's a revelation to me that having multiple network interfaces could be the source of this problem. There may be several related issues, indeed like #128 and perhaps some of the tests that seem to hang in Travis and github testing.

rieder • Jun 16 '20 14:06

There are other variables that can influence this: OMPI_MCA_oob_tcp_if_include=.. and maybe the "exclude" variants.
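A sketch of how the include and exclude variants fit together, assuming Open MPI's standard MCA parameter names (btl_tcp_if_include, btl_tcp_if_exclude, oob_tcp_if_include, oob_tcp_if_exclude). The helper `openmpi_interface_env` is hypothetical. For a given component, include and exclude should not be set at the same time, and an explicit exclude list replaces Open MPI's default one, so the loopback interface has to be listed again.

```python
def openmpi_interface_env(include=None, exclude=None):
    """Build the OMPI_MCA_* variables for an include list OR an exclude list.

    Hypothetical helper: `include` and `exclude` are lists of interface names.
    """
    if include and exclude:
        raise ValueError("set either include or exclude, not both")
    if not include and not exclude:
        return {}
    mode, names = ("include", include) if include else ("exclude", exclude)
    value = ",".join(names)
    # Apply the same list to the data (btl) and out-of-band (oob) TCP layers.
    return {
        f"OMPI_MCA_{component}_tcp_if_{mode}": value
        for component in ("btl", "oob")
    }
```

For the machine in this report, `openmpi_interface_env(exclude=["lo", "docker0"])` would produce the two exclude variables. Whether excluding docker0 is enough on a given system is untested here.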

ipelupessy • Jun 16 '20 17:06

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 28 days if no further activity occurs. Thank you for your contributions.

stale[bot] • Mar 04 '22 16:03

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 28 days if no further activity occurs. Thank you for your contributions.

stale[bot] • May 10 '22 13:05

keep open

ipelupessy • May 10 '22 15:05

@rieder @Sbte ok, so I have been thinking of reporting the network interfaces or checking for potential problems. There is no default way of getting the network interfaces in Python without an external library. E.g. psutil has utility functions for this, but that would entail an extra prerequisite. Is it important enough for this?
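If only the interface names are needed, the standard library's `socket.if_nameindex()` may be enough (available on Unix, and on Windows from Python 3.8). It does not report addresses or link state, for which something like psutil's `net_if_addrs()` and `net_if_stats()` would still be required. A minimal sketch of a name-based check follows; the function names are hypothetical, and the loopback names "lo" and "lo0" are an assumption about Linux and macOS naming.

```python
import socket

# Assumed loopback interface names (Linux: "lo", macOS/BSD: "lo0").
LOOPBACK_NAMES = ("lo", "lo0")


def list_interface_names():
    """Return the names of all network interfaces known to the OS."""
    return [name for _index, name in socket.if_nameindex()]


def multiple_interface_warning(names):
    """Return a warning string if more than one non-loopback interface exists.

    Returns None when there is at most one candidate interface.
    """
    candidates = [n for n in names if n not in LOOPBACK_NAMES]
    if len(candidates) <= 1:
        return None
    return (
        "multiple network interfaces found (" + ", ".join(candidates) + "); "
        "MPI may hang, consider setting OMPI_MCA_btl_tcp_if_include"
    )
```

`multiple_interface_warning(list_interface_names())` could be called at startup and the result printed or logged. Note that this check would also fire for lxcbr0, which did not cause problems in the original report, so it can only serve as a hint.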

ipelupessy • Jul 01 '22 10:07

Still running into this (on a different machine again), but this time I'm also unable to solve it using the workaround described above. Only redirection="none" solved anything.

Can't you just check the communication between nodes at startup and if it doesn't work throw a warning and fall back to a different redirection method or something?
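One way such a startup check could look, as a rough sketch only: run a small probe command in a subprocess with a timeout, and fall back when it hangs or fails. Nothing here is existing AMUSE behavior. The function names are hypothetical, the probe command is left to the caller (for example a tiny MPI test program), and `preferred` stands for whatever redirection setting would normally be used. Only the fallback value "none" comes from this thread.

```python
import subprocess
import warnings


def probe_succeeds(command, timeout):
    """Run `command`; return True only if it exits with status 0 in time."""
    try:
        result = subprocess.run(
            command,
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process when the timeout expires.
        return False
    return result.returncode == 0


def choose_redirection(command, preferred, fallback="none", timeout=10.0):
    """Return `preferred` if the probe works, else warn and return `fallback`."""
    if probe_succeeds(command, timeout):
        return preferred
    warnings.warn(
        "communication probe failed or timed out; "
        f"falling back to redirection={fallback!r}"
    )
    return fallback
```

The returned value could then be passed as the `redirection` argument when creating a code instance. A hanging probe costs up to `timeout` seconds at startup, which is the main trade-off of this approach.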

Sbte • Nov 15 '22 17:11