Memory leak when logstash is not available
Hello. My team recently faced a problem with python-logstash-async. When the logstash server went into maintenance, the memory usage of one of our services started to grow rapidly.
It turns out this is caused by python-logstash-async's feature of storing failed events in an SQLite database. Even though everything is written to the filesystem, the memory usage is not even comparable to the database size (the memory usage is much bigger).
To demonstrate this I created a small Django project: https://github.com/xunto-examples/async_logstash_memory_leak
I run start_logstash.sh or start_no_logstash.sh to start the service, and tests.sh to start sending requests.
Test results when logstash is not available:

The database size was 71.7 MB at the end of testing.
Test results for the same project but without python-logstash-async enabled:

I can also confirm that the leak no longer happens if the event_ttl parameter is set.
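For reference, a minimal sketch of the kind of configuration I mean, assuming the usual Django LOGGING dictConfig with AsynchronousLogstashHandler; host, port, path and the TTL value here are placeholders, not our actual settings:

```python
# Minimal sketch, not the repo's actual settings: event_ttl (in seconds)
# limits how long unsent events are kept before being dropped.
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'logstash': {
            'level': 'INFO',
            'class': 'logstash_async.handler.AsynchronousLogstashHandler',
            'host': 'localhost',                         # placeholder
            'port': 5959,                                # placeholder
            'database_path': '/tmp/logstash_events.db',  # placeholder
            'event_ttl': 600,  # drop events older than 10 minutes
        },
    },
    'root': {
        'handlers': ['logstash'],
        'level': 'INFO',
    },
}
```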
Thanks for the detailed report! I'm currently quite limited on time, so it might take some time to look into it.
It's ok, we are going to stick with setting event_ttl for now anyway.
I had a look at it but unfortunately, I'm not able to reproduce it, not even with your great test setup.
As you can see, I had the test setup running for hours, continuously firing requests at it, but the memory consumption stayed at the same level.
What versions of python-logstash-async, Django and Python are you using?
I used the following versions:
(venv) enrico@localhost: ~% python --version
Python 3.7.3
(venv) enrico@localhost: ~% pip list
Package Version
--------------------- ---------
Django 3.2
memory-profiler 0.58.0
python-logstash-async 2.2.0
...
Hi, sorry for the slow response. We used:
Django==3.1.7
python-logstash-async==2.2.0
Python 3.9.3
I also found out that the REAL reason for the RAM leak was that the queue at https://github.com/eht16/python-logstash-async/blob/018d238e3f76b97def3b279b0226d8e326bd3dee/logstash_async/worker.py#L49 was getting big too fast. Not sure we could do anything about it.
It also turned out that event_ttl didn't really help... so we moved to using stdout and fluentd instead of your library. :(
> I also found out that the REAL reason for the RAM leak was that the queue at https://github.com/eht16/python-logstash-async/blob/018d238e3f76b97def3b279b0226d8e326bd3dee/logstash_async/worker.py#L49 was getting big too fast. Not sure we could do anything about it.
Good catch!
I was able to reproduce the behaviour by slowing down writing events to the SQLite database. The events then pile up in the queue and so consume memory.
Is it possible that the storage you used could be too slow to keep up with the high event rate? If so, it would explain the high memory usage.
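Just to illustrate what I mean (generic Python, not the library's actual worker code):

```python
# Generic illustration, not python-logstash-async code: if the consumer
# (standing in for the slow SQLite writer) cannot keep up with the producer
# (the application's log calls), an unbounded queue.Queue holds every
# pending event in memory.
import queue
import threading
import time

events = queue.Queue()  # unbounded, like the worker's internal queue

def slow_consumer():
    while True:
        events.get()
        time.sleep(0.01)  # simulate slow storage writes

threading.Thread(target=slow_consumer, daemon=True).start()

# Produce events much faster than they are drained.
for i in range(50_000):
    events.put({'message': f'event {i}', 'payload': 'x' * 1024})

print('events still held in memory:', events.qsize())
```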
> It also turned out that event_ttl didn't really help... so we moved to using stdout and fluentd instead of your library. :(
Sad to hear. Sorry for not being able to help you in time.
If you still want to try it, you could set the database_path setting to None to use an in-memory event cache instead of the file-based SQLite database. If the above assumption is true, this could help.
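Roughly like this (a sketch only; host and port are placeholders, and it assumes a plain logging setup rather than your Django config):

```python
# Sketch of the suggestion above: database_path=None makes the handler cache
# unsent events in memory instead of writing them to an SQLite file on disk.
import logging
from logstash_async.handler import AsynchronousLogstashHandler

handler = AsynchronousLogstashHandler(
    host='logstash.example.org',  # placeholder
    port=5959,                    # placeholder
    database_path=None,           # in-memory cache instead of SQLite on disk
)
logging.getLogger().addHandler(handler)
```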
Closing due to missing response. Feel free to re-open if the problem persists.