micropython-esp32

Guru Meditation Error during socket.connect()

Open nickzoic opened this issue 8 years ago • 6 comments

If you try to socket.connect() to an unreachable TCP/IP address, it eventually (after ~15 seconds) fails with OSError: [Errno 113] EHOSTUNREACH

However, if you Ctrl-C during this time, the exception is immediately followed by a crash:

>>> s.connect(('10.107.1.6', 9999))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 113] EHOSTUNREACH
>>> NLR jump failed, val=0x3ffda814
Guru Meditation Error of type IllegalInstruction occurred on core  0. Exception was unhandled.

This occurs with network.WLAN() or the new network.LAN() adaptor.

nickzoic Oct 19 '17 05:10

(yes, I'll have a look at this but I'm adding it here so I don't forget)

nickzoic Oct 19 '17 05:10

Well, that explains recent problems I've had with WLAN.

Side note: even machine.reset() gives a Guru Meditation (ah, my Amiga days), so there's something weird with the current IDF being used.

MrSurly Oct 19 '17 05:10

It looks like the ctrl-C that you press during connect() is being buffered. Then the EHOSTUNREACH exception is being raised, but the ctrl-C is still pending. The ctrl-C is then raised in some strange location which leads to the crash.

Apart from this being a bug (which may be difficult to track down the reason for), to fix connect() so that you can do ctrl-C to break out of it would require setting the socket to be non-blocking at the start, then do a loop polling for the connect() to complete. In that loop you can check for ctrl-C explicitly (by calling mp_handle_pending()).

Note: this stuff is already handled in esp8266 because it uses extmod/modlwip.c which wraps the lwIP stack at a lower level. And I don't think it's possible to hook into the esp32 lwIP stack at such a level, because it's probably not exposed and also there are multi-core issues to consider.

dpgeorge Oct 19 '17 05:10

Same behaviour in v1.9.2-279-g090b6b80. Similar in v1.9.2-225-g75ead22c (no "Guru" message, but same "NLR jump failed").

@dpgeorge yeah, I was thinking that, we do similar things elsewhere in that library to "fake" timeouts.

nickzoic Oct 19 '17 05:10

> Apart from this being a bug (which may be difficult to track down the reason for), to fix connect() so that you can do ctrl-C to break out of it would require setting the socket to be non-blocking at the start, then do a loop polling for the connect() to complete. In that loop you can check for ctrl-C explicitly (by calling mp_handle_pending()).

All of the other socket stuff implements blocking/timeout with a loop, for this same reason. I think connect doesn't do this, because it doesn't seem LWIP allows you to set the connect timeout.

In <IDF>/components/lwip/include/lwip/lwip/sockets.h

#define SO_CONTIMEO    0x1009 /* Unimplemented: connect timeout */

... and it's not implemented in the API, either. =(
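
For readers unfamiliar with the "timeout as a loop" pattern mentioned above, here is a rough desktop-Python illustration of the idea. It is not the port's actual C code: `recv_with_timeout` and its `slice_s` parameter are hypothetical names, and the sketch assumes a POSIX-style `select()`. The point is only that the wait is cut into short slices, so control returns to the interpreter between slices (which is where the C code could call mp_handle_pending()).

```python
import errno
import select
import socket
import time


def recv_with_timeout(sock, nbytes, timeout_s, slice_s=0.1):
    """Emulate a receive timeout by polling in short slices.

    Hypothetical helper for illustration only.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise OSError(errno.ETIMEDOUT, "timed out")
        # Wait at most one slice, never past the overall deadline.
        readable, _, _ = select.select([sock], [], [], min(slice_s, remaining))
        if readable:
            return sock.recv(nbytes)
        # Nothing yet: loop again. Each pass gives the runtime a chance
        # to deliver a pending Ctrl-C instead of leaving it buffered.
```

A shorter `slice_s` makes Ctrl-C more responsive at the cost of more wake-ups; the value used here is arbitrary.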

MrSurly Oct 19 '17 06:10

Looks like we should be able to do something along these lines: https://github.com/dreamcat4/lwip/blob/master/contrib/apps/socket_examples/socket_examples.c

  • Do a non-blocking connect
  • Get an EINPROGRESS
  • Loop, doing a select waiting for the socket to be writeable
  • If we get a Ctrl-C, process it.
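
The steps above can be sketched at the Python level as follows. This is only an illustration on desktop Python with POSIX-style sockets, not the eventual fix (which would live in the port's C socket module); `interruptible_connect`, `timeout_s` and `slice_s` are made-up names, and the 15-second default simply mirrors the delay reported at the top of this issue.

```python
import errno
import select
import socket
import time


def interruptible_connect(sock, addr, timeout_s=15.0, slice_s=0.1):
    """Non-blocking connect, then poll for writability in short slices.

    Hypothetical helper for illustration only.
    """
    sock.setblocking(False)
    err = sock.connect_ex(addr)  # returns an errno value instead of raising
    if err not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
        raise OSError(err, "connect failed")
    deadline = time.monotonic() + timeout_s
    while err != 0:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise OSError(errno.ETIMEDOUT, "connect timed out")
        # Writable (or exceptional) means the connect attempt has finished.
        _, writable, failed = select.select(
            [], [sock], [sock], min(slice_s, remaining))
        if writable or failed:
            # SO_ERROR says whether it finished with success (0) or an error.
            err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
            if err != 0:
                raise OSError(err, "connect failed")
        # Otherwise loop: a pending Ctrl-C can be handled between slices.
    sock.setblocking(True)
```

Called as `interruptible_connect(s, ('10.107.1.6', 9999))`, this would still fail for an unreachable host, but the wait is split into short polls rather than one long blocking call.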

I'll try and get something in for this ASAP.

nickzoic Oct 19 '17 06:10