
[BUG] BullMQ removeOnComplete/removeOnFail count setting not working properly for prediction queue

Open Nek-11 opened this issue 9 months ago • 2 comments

BullMQ removeOnComplete/removeOnFail count setting not working properly for prediction queue

Describe the bug

When running Flowise in queue mode with Redis, the REMOVE_ON_COUNT environment variable is not correctly limiting the number of completed jobs in the prediction queue. Despite setting REMOVE_ON_COUNT=300, the number of completed jobs (ZCARD bull:flowise-queue-prediction:completed) continues to grow well beyond 300, eventually causing excessive Redis memory usage that requires manual intervention.

To Reproduce

  1. Configure Flowise with queue mode enabled
  2. Set up Redis as the message broker
  3. Set REMOVE_ON_COUNT=300 in the environment
  4. Run a high volume of prediction jobs through a chatflow
  5. Check Redis with ZCARD bull:flowise-queue-prediction:completed
  6. Observe that the completed job count grows significantly beyond 300

Expected behavior

The bull:flowise-queue-prediction:completed sorted set should be automatically trimmed to maintain approximately 300 entries as specified by the REMOVE_ON_COUNT environment variable. Job data associated with removed entries should also be cleaned up.
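
A minimal way to verify the retained count from TypeScript rather than redis-cli, using ioredis (the Redis URL is an assumption; the key is the default BullMQ prefix plus the queue name reported above):

```ts
// Sketch: check how many completed jobs the prediction queue currently retains.
// The Redis URL is an assumption; adjust to your deployment.
import Redis from 'ioredis';

async function main(): Promise<void> {
  const redis = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379');

  // ZCARD returns the size of the sorted set of completed jobs.
  const completed = await redis.zcard('bull:flowise-queue-prediction:completed');
  console.log(`completed jobs retained: ${completed}`); // should hover around 300

  await redis.quit();
}

main().catch(console.error);
```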

Flow

Any standard chatflow that generates prediction jobs will demonstrate the issue when run at volume.

Setup

  • Installation: Docker containers (both app and workers)
  • Flowise Version: 2.2.7-patch.1
  • OS: Linux (Azure App Service)
  • Redis: Azure Cache for Redis (v6.0.14)

Additional context

  1. Confirmed the REMOVE_ON_COUNT environment variable is correctly set to 300 inside the container.
  2. The issue specifically affects the flowise-queue-prediction queue.
  3. Code examination shows that in BaseQueue.ts, the addJob method should correctly read the REMOVE_ON_COUNT environment variable and apply it to both removeOnComplete and removeOnFail options as { count: 300 }.
  4. The problem persists even after restarting all Flowise containers.
  5. Currently, the only workaround is to periodically run FLUSHDB on the Redis instance, which is not sustainable in production (a less destructive interim cleanup is sketched below, after this list).
  6. Memory usage in Redis grows continuously without intervention.
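
As referenced in item 5, a less destructive interim cleanup than FLUSHDB is BullMQ's clean() API; the sketch below assumes the default queue name and a local connection, adjust to your setup:

```ts
// Sketch: trim old completed/failed jobs for the prediction queue without
// flushing the whole Redis database. Connection settings are assumptions;
// the queue name matches the key prefix reported above.
import { Queue } from 'bullmq';

async function cleanup(): Promise<void> {
  const queue = new Queue('flowise-queue-prediction', {
    connection: { host: 'localhost', port: 6379 },
  });

  // Remove up to 10,000 completed and failed jobs older than 0 ms (i.e. any age).
  await queue.clean(0, 10_000, 'completed');
  await queue.clean(0, 10_000, 'failed');

  await queue.close();
}

cleanup().catch(console.error);
```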

Question: Could this be related to the BullMQ version's compatibility with Redis 6.0.14, since BullMQ recommends 6.2.0+? Or is there a bug in how the options are processed specifically for the prediction queue?
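
For reference, this is roughly the shape the retention options take when a job is added with BullMQ; the env parsing, queue name, and addJob helper here are illustrative assumptions, not Flowise's actual BaseQueue.ts code:

```ts
// Sketch: applying REMOVE_ON_COUNT as BullMQ retention options at enqueue time.
// Env handling, queue name, and the addJob helper are assumptions; only the
// { count } option shape is the documented BullMQ API.
import { Queue } from 'bullmq';

const removeOnCount = Number(process.env.REMOVE_ON_COUNT ?? 10000);

const queue = new Queue('flowise-queue-prediction', {
  connection: { host: 'localhost', port: 6379 },
});

async function addJob(data: Record<string, unknown>) {
  // Keep at most `removeOnCount` completed and failed jobs; BullMQ trims
  // older entries as new jobs finish.
  return queue.add('prediction', data, {
    removeOnComplete: { count: removeOnCount },
    removeOnFail: { count: removeOnCount },
  });
}
```

Note that these options are evaluated when each job finishes, so they only affect jobs processed after the option is in place.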

Nek-11 avatar Apr 24 '25 16:04 Nek-11

UPDATE:

I just realized I was looking at the newest code, not the code of the Flowise version I was using! We can see that removeOnComplete was not part of the last release, but will be part of the next one. The same is true for the environment variables REMOVE_ON_COUNT and REMOVE_ON_AGE.

Here's the git diff (screenshot attached).

So, this bug seems to have already been addressed for the next release, which is great news! @HenryHengZJ, would you have an ETA for the next release?

Nek-11 avatar Apr 25 '25 08:04 Nek-11

yep, ETA next week

HenryHengZJ avatar Apr 27 '25 07:04 HenryHengZJ

@HenryHengZJ amazing, thanks! Regarding issue #2186 is there a plan to integrate it as well? I think it would make sense to ship both fixes together as they both relate to queue mode

Nek-11 avatar Apr 28 '25 10:04 Nek-11

that requires some refactoring on the Redis side, will continue in that thread, closing this for now

HenryHengZJ avatar May 04 '25 13:05 HenryHengZJ