websockets icon indicating copy to clipboard operation
websockets copied to clipboard

About Log Message Formatting on Reconnect

Open jbdyn opened this issue 1 year ago • 2 comments

Hey Aymeric,

I noticed that by this line

https://github.com/python-websockets/websockets/blob/98f236f89529d317628fe8ee3d4d0564e675f68d/src/websockets/asyncio/client.py#L525

the full traceback of the current Exception is included in the log message, which looks confusing to me.

Example
[2024-09-13 11:18:03] [INFO] ! connect failed; reconnecting in 0.8 seconds
Traceback (most recent call last):
  File "/▉▉▉/.venv/lib/python3.12/site-packages/websockets/asyncio/client.py", line 411, in __aiter__
    async with self as protocol:
  File "/▉▉▉/.venv/lib/python3.12/site-packages/websockets/asyncio/client.py", line 395, in __aenter__
    return await self
           ^^^^^^^^^^
  File "/▉▉▉/.venv/lib/python3.12/site-packages/websockets/asyncio/client.py", line 376, in __await_impl__
    _transport, self.connection = await self.create_connection()
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/base_events.py", line 1129, in create_connection
    raise OSError('Multiple exceptions: {}'.format(
OSError: Multiple exceptions: [Errno 111] Connect call failed ('::1', 8001, 0, 0), [Errno 111] Connect call failed ('127.0.0.1', 8001)
[2024-09-13 11:18:04] [INFO] ! connect failed; reconnecting in 3.1 seconds
Traceback (most recent call last):
  File "/▉▉▉/.venv/lib/python3.12/site-packages/websockets/asyncio/client.py", line 411, in __aiter__
    async with self as protocol:
  File "/▉▉▉/.venv/lib/python3.12/site-packages/websockets/asyncio/client.py", line 395, in __aenter__
    return await self
           ^^^^^^^^^^
  File "/▉▉▉/.venv/lib/python3.12/site-packages/websockets/asyncio/client.py", line 376, in __await_impl__
    _transport, self.connection = await self.create_connection()
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/base_events.py", line 1129, in create_connection
    raise OSError('Multiple exceptions: {}'.format(
OSError: Multiple exceptions: [Errno 111] Connect call failed ('::1', 8001, 0, 0), [Errno 111] Connect call failed ('127.0.0.1', 8001)
[2024-09-13 11:18:07] [INFO] ! connect failed; reconnecting in 5.0 seconds
Traceback (most recent call last):
  File "/▉▉▉/.venv/lib/python3.12/site-packages/websockets/asyncio/client.py", line 411, in __aiter__
    async with self as protocol:
  File "/▉▉▉/.venv/lib/python3.12/site-packages/websockets/asyncio/client.py", line 395, in __aenter__
    return await self
           ^^^^^^^^^^
  File "/▉▉▉/.venv/lib/python3.12/site-packages/websockets/asyncio/client.py", line 376, in __await_impl__
    _transport, self.connection = await self.create_connection()
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/base_events.py", line 1129, in create_connection
    raise OSError('Multiple exceptions: {}'.format(
OSError: Multiple exceptions: [Errno 111] Connect call failed ('::1', 8001, 0, 0), [Errno 111] Connect call failed ('127.0.0.1', 8001)
[2024-09-13 11:18:12] [INFO] ! connect failed; reconnecting in 8.1 seconds
Traceback (most recent call last):
  File "/▉▉▉/.venv/lib/python3.12/site-packages/websockets/asyncio/client.py", line 411, in __aiter__
    async with self as protocol:
  File "/▉▉▉/.venv/lib/python3.12/site-packages/websockets/asyncio/client.py", line 395, in __aenter__
    return await self
           ^^^^^^^^^^
  File "/▉▉▉/.venv/lib/python3.12/site-packages/websockets/asyncio/client.py", line 376, in __await_impl__
    _transport, self.connection = await self.create_connection()
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/base_events.py", line 1129, in create_connection
    raise OSError('Multiple exceptions: {}'.format(
OSError: Multiple exceptions: [Errno 111] Connect call failed ('::1', 8001, 0, 0), [Errno 111] Connect call failed ('127.0.0.1', 8001)

Since this exception is indeed useful to get to know about failed connections or received responses, how do feel about just deleting the traceback and keeping the exception message with one of these options?

Removing Traceback

Example
[2024-09-13 11:18:03] [INFO] ! connect failed; reconnecting in 0.8 seconds
OSError: Multiple exceptions: [Errno 111] Connect call failed ('::1', 8001, 0, 0), [Errno 111] Connect call failed ('127.0.0.1', 8001)
[2024-09-13 11:18:04] [INFO] ! connect failed; reconnecting in 3.1 seconds
OSError: Multiple exceptions: [Errno 111] Connect call failed ('::1', 8001, 0, 0), [Errno 111] Connect call failed ('127.0.0.1', 8001)
[2024-09-13 11:18:07] [INFO] ! connect failed; reconnecting in 5.0 seconds
OSError: Multiple exceptions: [Errno 111] Connect call failed ('::1', 8001, 0, 0), [Errno 111] Connect call failed ('127.0.0.1', 8001)
[2024-09-13 11:18:12] [INFO] ! connect failed; reconnecting in 8.1 seconds
OSError: Multiple exceptions: [Errno 111] Connect call failed ('::1', 8001, 0, 0), [Errno 111] Connect call failed ('127.0.0.1', 8001)

Put exception into logging message

self.logger.info(
    "! connect failed; reconnecting in %.1f seconds; reason: %s",
    delay,
    exc,
)
Example
[2024-09-13 11:18:03] [INFO] ! connect failed; reconnecting in 0.8 seconds; reason: Multiple exceptions: [Errno 111] Connect call failed ('::1', 8001, 0, 0), [Errno 111] Connect call failed ('127.0.0.1', 8001)
[2024-09-13 11:18:04] [INFO] ! connect failed; reconnecting in 3.1 seconds; reason: Multiple exceptions: [Errno 111] Connect call failed ('::1', 8001, 0, 0), [Errno 111] Connect call failed ('127.0.0.1', 8001)
[2024-09-13 11:18:07] [INFO] ! connect failed; reconnecting in 5.0 seconds; reason: Multiple exceptions: [Errno 111] Connect call failed ('::1', 8001, 0, 0), [Errno 111] Connect call failed ('127.0.0.1', 8001)
[2024-09-13 11:18:12] [INFO] ! connect failed; reconnecting in 8.1 seconds; reason: Multiple exceptions: [Errno 111] Connect call failed ('::1', 8001, 0, 0), [Errno 111] Connect call failed ('127.0.0.1', 8001)

Personally, I prefer the last one.

jbdyn avatar Sep 13 '24 09:09 jbdyn

My opinion is mainly "let's not get too fancy, or else someone is going to complain that it doesn't work in their APM / ..."

aaugustin avatar Sep 21 '24 16:09 aaugustin

I've never seen the pattern of removing the traceback in 18 years of building software in Python so I'm worried that it may have consequences that we would regret.

aaugustin avatar Sep 21 '24 16:09 aaugustin

Okay, so how about turning

https://github.com/python-websockets/websockets/blob/76f6f573e2ecb279230c2bf56c07bf4d4f717147/src/websockets/asyncio/client.py#L523-L527

into

self.logger.info(
    "! connect failed; reconnecting in %.1f seconds",
    delay,
)
self.logger.debug(exc, exc_info=True)  # or similar

With that, we would have a middle way: The traceback is still there if desired, but does not irritate the average user right away. As the traceback is also only useful for developers, the DEBUG level would be a good fit in my opinion.

jbdyn avatar Nov 04 '24 09:11 jbdyn

I looked into this over the weekend and I've grown sympathetic to your position.

I started standardizing on "all DEBUG and ERROR logs have tracebacks; none of the INFO and WARNING logs do".

Rationale:

  • you want everything in DEBUG
  • if it's an ERROR, it's bad enough that you want the traceback
  • if it's INFO or WARNING, it's just a timestamp and the info that something normal or abnormal (respectively) happened, but you can ignore it and you'll be OK

aaugustin avatar Nov 04 '24 12:11 aaugustin

The double log is an interesting idea. To be honest I was planning to keep it simple; if it crashes, just connect (instead of using the magic reconnect) and see the error you get.

aaugustin avatar Nov 04 '24 12:11 aaugustin

I started standardizing on "all DEBUG and ERROR logs have tracebacks; none of the INFO and WARNING logs do".

Ah, okay, so my enhancement request would be covered by this. I like :+1:

To be honest I was planning to keep it simple; if it crashes, just connect (instead of using the magic reconnect) and see the error you get.

Sorry, I am unsure what you mean by that.

jbdyn avatar Nov 04 '24 12:11 jbdyn

I meant "if I do simple INFO logging without traceback and a user wants the traceback, they can get the traceback by calling await connect(...) instead of async for ... in connect (...) — then the error will be raised.

aaugustin avatar Nov 04 '24 16:11 aaugustin

I took a quick stab at fixing it. I'm leaving it there for tonight; I'll review my own PR later :-)

Meanwhile, happy to get your feedback.

aaugustin avatar Nov 04 '24 22:11 aaugustin