pulsar-client-python icon indicating copy to clipboard operation
pulsar-client-python copied to clipboard

Pulsar client with custom logger may not terminate gracefully when a producer is created

Open tzvi-shir-vaknin opened this issue 8 months ago • 4 comments

When providing a custom logger to the Pulsar client and creating a producer, intermittent terminations occur with the following error:

terminate called without an active exception
Aborted (core dumped)

This doesn't happen consistently; I'd estimate it occurs in about 1 out of 20 runs. The other executions complete gracefully as expected. As the setup here is almost exactly as in https://github.com/apache/pulsar-client-python/issues/194, I've tried adding the del statements suggested there, but that hasn't resolved the issue for me.

Here is the Python script I'm using:

import logging
import pulsar


def open_and_close():
    pulsar_logger = logging.getLogger("pulsarDummyLogger")
    client = pulsar.Client("pulsar://localhost:6650", logger=pulsar_logger)
    producer = client.create_producer("A")
    producer.close()
    client.close()
    del producer
    del client


if __name__ == "__main__":
    open_and_close()

I'm running it using the following bash command (as it doesn't happen all the time, I must run it a few times to encounter it):

for i in {1..20}; do env/bin/python python_script.py || break; done

I tried it on python 3.11.3 and on 3.13.3, as well as with pulsar-client 3.5.0, 3.6.0, 3.6.1, 3.7.0.

Any insights or suggestions would be greatly appreciated

tzvi-shir-vaknin avatar May 19 '25 12:05 tzvi-shir-vaknin

I cannot reproduce it on macOS or Ubuntu 22.04 docker container (localhost is changed to host.docker.internal).

(venv) root@bdbb571df90b:~# for i in {1..100}; do python3 python_script.py || break; done
(venv) root@bdbb571df90b:~# python3 --version
Python 3.12.3
(venv) root@bdbb571df90b:~# python3 -c 'import pulsar; print(pulsar.__version__)'
3.7.0

BewareMyPower avatar May 22 '25 11:05 BewareMyPower

Oh I can reproduced it with another round of tests with the INFO log enabled:

    logging.basicConfig(level=logging.INFO)
INFO:pulsarDummyLogger:[<none> -> pulsar://host.docker.internal:6650] Create ClientConnection, timeout=10000
INFO:pulsarDummyLogger:Created connection for pulsar://host.docker.internal:6650-pulsar://host.docker.internal:6650-0
INFO:pulsarDummyLogger:[172.17.0.2:45444 -> 192.168.65.254:6650] Connected to broker
INFO:pulsarDummyLogger:[persistent://public/default/A, ] Getting connection from pool
INFO:pulsarDummyLogger:Lookup response for persistent://public/default/A, lookup-broker-url pulsar://localhost:6650, from [172.17.0.2:45444 -> 192.168.65.254:6650] 
INFO:pulsarDummyLogger:[<none> -> pulsar://host.docker.internal:6650] Create ClientConnection, timeout=10000
INFO:pulsarDummyLogger:Created connection for pulsar://localhost:6650-pulsar://host.docker.internal:6650-0
INFO:pulsarDummyLogger:[172.17.0.2:45458 -> 192.168.65.254:6650] Connected to broker through proxy. Logical broker: pulsar://localhost:6650, proxy: , physical address:pulsar://host.docker.internal:6650
INFO:pulsarDummyLogger:Creating producer for topic:persistent://public/default/A, producerName: on [172.17.0.2:45458 -> 192.168.65.254:6650] 
INFO:pulsarDummyLogger:[persistent://public/default/A, ] Created producer on broker [172.17.0.2:45458 -> 192.168.65.254:6650] 
INFO:pulsarDummyLogger:Finished connecting to broker after 3 ms
INFO:pulsarDummyLogger:[persistent://public/default/A, standalone-0-479] Closing producer for topic persistent://public/default/A
INFO:pulsarDummyLogger:[persistent://public/default/A, standalone-0-479] Closed producer 0
INFO:pulsarDummyLogger:Closing Pulsar client with 0 producers and 0 consumers
INFO:pulsarDummyLogger:[172.17.0.2:45444 -> 192.168.65.254:6650] Connection disconnected (refCnt: 1)
INFO:pulsarDummyLogger:[172.17.0.2:45458 -> 192.168.65.254:6650] Connection disconnected (refCnt: 1)
INFO:pulsarDummyLogger:[172.17.0.2:45444 -> 192.168.65.254:6650] Destroyed connection to pulsar://host.docker.internal:6650-0
INFO:pulsarDummyLogger:[172.17.0.2:45458 -> 192.168.65.254:6650] Destroyed connection to pulsar://localhost:6650-0
INFO:pulsarDummyLogger:Producer - [persistent://public/default/A, standalone-0-479] , [batching  = off]
terminate called without an active exception
1.sh: line 1:  6356 Aborted                 python3 python_script.py

Let me assign it to myself

BewareMyPower avatar May 22 '25 11:05 BewareMyPower

It seems that there is a rare chance that the shutdown thread failed to exit.

[New Thread 0xfffff4c0f180 (LWP 6877)]
INFO:pulsarDummyLogger:[172.17.0.2:39146 -> 192.168.65.254:6650] Connection disconnected (refCnt: 1)
INFO:pulsarDummyLogger:[172.17.0.2:39152 -> 192.168.65.254:6650] Connection disconnected (refCnt: 2)
INFO:pulsarDummyLogger:[172.17.0.2:39146 -> 192.168.65.254:6650] Destroyed connection to pulsar://host.docker.internal:6650-0
INFO:pulsarDummyLogger:Producer - [persistent://public/default/A, standalone-0-522] , [batching  = off]
[Thread 0xfffff600f180 (LWP 6875) exited]
INFO:pulsarDummyLogger:[172.17.0.2:39152 -> 192.168.65.254:6650] Destroyed connection to pulsar://localhost:6650-0
[Thread 0xfffff560f180 (LWP 6876) exited]
terminate called without an active exception

From the logs above from GDB, LWP 6877, which is the shutdown thread introduced in https://github.com/apache/pulsar/pull/15316, didn't exit when the exception was thrown.

        std::thread shutdownTask{[this, self, callback] {
            shutdown();
            // ...

I added a log after client.close() and enabled debug level logs:

    print('client is closed')
    del producer
    del client
DEBUG:pulsarDummyLogger:ioExecutorProvider_ is closed
DEBUG:pulsarDummyLogger:listenerExecutorProvider_ is closed
DEBUG:pulsarDummyLogger:partitionListenerExecutorProvider_ is closed
client is closed
DEBUG:pulsarDummyLogger:[persistent://public/default/A, standalone-0-631] ~ProducerImpl
INFO:pulsarDummyLogger:Producer - [persistent://public/default/A, standalone-0-631] , [batching  = off]
terminate called without an active exception

The most possible reason is that the shutdown thread didn't exit when the Python interpreter exited. It might need a fix from the C++ client side.


While I might need more time to figure out a fix and verify if it works, I thought of a more tricky workaround and found it worked.

    client.close()
    time.sleep(0.01) # The trick is here that wait for some time until the shutdown thread is finished
    del producer # This line is actually not necessary
    del client

Could you verify it?

BewareMyPower avatar May 22 '25 12:05 BewareMyPower

Thanks for looking into this and suggesting a temporary solution! Your time and effort are greatly appreciated.

Yes, adding the timeout does resolve the immediate issue. Thank you!

For my specific use case, I'd prefer to stick with the default Pulsar logger at the ERROR level. This would keep logs minimal, appearing only in rare error scenarios, rather than implementing a timeout each time I close the client.

I hope you'll be able to address the root cause of this issue soon.

Thanks again!

tzvi-shir-vaknin avatar May 22 '25 12:05 tzvi-shir-vaknin