plugin-QueuedTracking icon indicating copy to clipboard operation
plugin-QueuedTracking copied to clipboard

Second queue is growing forever

Open marcinzp opened this issue 4 years ago • 47 comments

Hello, since few Matomo versions we have returning problem with the second queue which is growing forever (until I remove this plugin and reinstall it back, as I haven't found better method yet).

Matomo version: 4.4.1 MySQL version: 5.7.34-37-log PHP version: 7.3.29-1+0~20210701.86+debian10~1.gbp7ad6eb Backend: Redis

sudo -u www-data /usr/share/piwik/console queuedtracking:monitor Queue is enabled The command ./console queuedtracking:process has to be executed to process request sets within queue Up to 8 workers will be used Processor will start once there are at least 5 request sets in the queue 218717 (53+218338+68+60+54+60+40+44) request sets left in queue. 464.62M used memory (588.55M peak). 1 workers act...

How can I debug this? Thanks :-)

marcinzp avatar Aug 04 '21 13:08 marcinzp

@marcinzp thanks for reaching out. You could use the command ./console queuedtracking:print-queued-requests --queue-id=2 to look into this queue to see what kind of requests are put in there. Generally, Matomo will put all requests for the same User, Visitor and IP address into the same queue. So if for example many requests are for the same "user" (if userId feature is used), then it will put them all into the same queue.

To give you some hint we would need to understand a bit better how you track data. Are you using the JavaScript tracker? If you are using log importer then please note that we usually don't recommend using log importer in combination with queued tracking (since they are both doing some kind of queued tracking but in a different way). And if there's no other way around this would maybe need to try applying some settings.

tsteur avatar Aug 04 '21 22:08 tsteur

@tsteur thank you for your answer and explanations. In the night, the growing queue was "magically" processed, but I think I still need to prepare for the return of the problem, it returns every few days.

If I understand properly, queues as numbered from 0, so the second one has id == 1, right?

We're tracking data for multiple sites using JS tracker and two of our sites are our mobile application, which use SDK for tracking (we have two sites, separately for Android and iOS app).

What I also noticed, CPU was more busy, but it didn't process the queue even when heaving more "headroom" in processing power:

Screenshot 2021-08-05 at 11 11 57

marcinzp avatar Aug 05 '21 08:08 marcinzp

If I understand properly, queues as numbered from 0, so the second one has id == 1, right?

That's the case as far as I know.

If it happens every few days you might want to look at the content of these requests using the command I mention to see what is different there. In the past we've seen eg people using the userId feature I mentioned and there was a bug where the same userId was tracked for every visitor (eg null or undefined or nil). Such things could cause many requests to be put into the same queue.

tsteur avatar Aug 05 '21 22:08 tsteur

Thank you @tsteur In advance - is there a way to remove permanently first request from the specific queue, just to see if this one was blocking the processing?

By userId feature you mean this one? https://matomo.org/docs/user-id/

marcinzp avatar Aug 06 '21 09:08 marcinzp

By userId feature you mean this one?

Yes 👍

In advance - is there a way to remove permanently first request from the specific queue, just to see if this one was blocking the processing?

Not that I know but I don't think that's the issue (at least whenever I've seen this it's never been that problem)

tsteur avatar Aug 08 '21 01:08 tsteur

Thank you. I've checked, that we are using userId both in JS and SDK clients.

So if my problem returns, I need to check what is different in requests in this not/slowly processed queue, and especially look at userId if it's not the same for all requests? (what would be an incarnation of some old error)

marcinzp avatar Aug 11 '21 13:08 marcinzp

I didn't need to wait long @tsteur :-) I got WARNING of "Invalid argument"

/usr/share/piwik/console queuedtracking:monitor Queue is enabled The command ./console queuedtracking:process has to be executed to process request sets within queue Up to 8 workers will be used Processor will start once there are at least 5 request sets in the queue 22949 (44+22696+22+31+66+23+31+36) request sets left in queue. 206.05M used memory (588.55M peak). 1 workers active.

/usr/share/piwik/console queuedtracking:print-queued-requests --queue-id=1 | head Showing requests of queue 1. Use --queue-id=1 to print only information for this queue. WARNING [2021-08-11 14:03:51] 19792 /usr/share/piwik/core/Tracker/RequestSet.php(36): Warning - Invalid argument supplied for foreach() - Matomo 4.4.1 - Please report this message in the Matomo forums: https://forum.matomo.org (please do a search first as it might have been reported already) array ( 0 => Piwik\Tracker\RequestSet::__set_state(array( 'requests' => array ( ), 'tokenAuth' => NULL, 'env' => NULL,

and request contain particularly many times one user id:

/usr/share/piwik/console queuedtracking:print-queued-requests --queue-id=1 | grep "'uid'" | sort | uniq -c 4 'uid' => '3bcf33a76d83aa3525de759972a8653c', 120 'uid' => 'qn5flmc4uvua9bs4mso81ku46ejd1htr3pt6iakog7',

/usr/share/piwik/console queuedtracking:print-queued-requests --queue-id=1 | grep "'uid'" | sort | uniq -c 2 'uid' => '22b75868af85a18bac2b2ebab3c2edcf', 2 'uid' => '72fff74ca87546153a2357acaee53a19', 2 'uid' => '8sod5dt55468obo6jksmgim78ecbmltm4108i9snh7', 80 'uid' => 'qn5flmc4uvua9bs4mso81ku46ejd1htr3pt6iakog7',

marcinzp avatar Aug 11 '21 14:08 marcinzp

After two hours:

/usr/share/piwik/console queuedtracking:monitor Queue is enabled The command ./console queuedtracking:process has to be executed to process request sets within queue Up to 8 workers will be used Processor will start once there are at least 5 request sets in the queue 53144 (20+52985+28+20+37+13+17+24) request sets left in queue. 245.78M used memory (588.55M peak). 1 workers active.

/usr/share/piwik/console queuedtracking:print-queued-requests --queue-id=1 | grep "'uid'" | sort | uniq -c 4 'uid' => 'e51e2c12faa22bda35e06c4af65e215e', 120 'uid' => 'qn5flmc4uvua9bs4mso81ku46ejd1htr3pt6iakog7',

marcinzp avatar Aug 11 '21 15:08 marcinzp

@marcinzp could you try to apply this patch if any possible to see if the error goes away? https://github.com/matomo-org/matomo/pull/17877/files

tsteur avatar Aug 11 '21 22:08 tsteur

Thank you @tsteur -

EDITED

~2 hours after patching, I see that queue is indeed decreasing its size AND I don't see requests from that user id:

179829 (50+179622+13+53+26+13+47+5) request sets left in queue. 413.77M used memory (588.55M peak). 1 workers active.

So either the problem is solved or it disappeared again. Please, stay tuned :-)

MY PREVIOUS COMMENT:

unfortunately it didn't help:

/usr/share/piwik/console queuedtracking:monitor Queue is enabled The command ./console queuedtracking:process has to be executed to process request sets within queue Up to 8 workers will be used Processor will start once there are at least 5 request sets in the queue 204467 (96+203854+120+106+70+30+81+110) request sets left in queue. 445.32M used memory (588.55M peak). 1 workers active.

I didn't catch previous time, but the problem reappeared now and what I see, still requests of the same user keep dominating:

/usr/share/piwik/console queuedtracking:print-queued-requests --queue-id=1 | grep "'uid'" | sort | uniq -c 2 'uid' => '614821357dc16c4c009667b7942d98b8', 160 'uid' => 'qn5flmc4uvua9bs4mso81ku46ejd1htr3pt6iakog7',

I don't see it so often in access.log:

grep -c qn5flmc4uvua9bs4mso81ku46ejd1htr3pt6iakog7 /var/log/nginx/access.log 15

marcinzp avatar Aug 13 '21 13:08 marcinzp

Great, thanks for the update. Keep us informed 👍

tsteur avatar Aug 15 '21 20:08 tsteur

So far so good :-) Thank you for helping me, I'll keep you informed.

marcinzp avatar Aug 19 '21 18:08 marcinzp

Great, thanks for letting us know and fingers crossed. We'll merge the PR very likely soon and then it would be included in the next release.

tsteur avatar Aug 19 '21 20:08 tsteur

@tsteur The problem is back :-(

/usr/share/piwik/console queuedtracking:monitor Queue is enabled The command ./console queuedtracking:process has to be executed to process request sets within queue Up to 16 workers will be used Processor will start once there are at least 10 request sets in the queue 915616 (915550+8+1+9+3+3+8+7+2+9+2+0+7+4+0+3) request sets left in queue. 1.39G used memory (5.90G peak). 1 workers active.

sudo -u www-data /usr/share/piwik/console queuedtracking:print-queued-requests --queue-id=0 | grep "'uid'" | sort | uniq -c 2 'uid' => '02136e85b1e38a8a65d910491da6a67c', 2 'uid' => 'c188b6b6edce7e15387177a16a90d250', 2 'uid' => 'o73v5ss4in6689g79s12hv68i68r6gllppv625orm7', 280 'uid' => 'qn5flmc4uvua9bs4mso81ku46ejd1htr3pt6iakog7',

marcinzp avatar Sep 10 '21 18:09 marcinzp

@tsteur could you possible reopen this issue?

Thank you, Marcin

marcinzp avatar Sep 10 '21 18:09 marcinzp

@marcinzp can you also check if there are maybe a lot of requests with the same visitorId (this is the _id parameter) or the same IP address (not sure the IP can be filtered out easily)?

Have you checked if there is any kind of information in any error log?

tsteur avatar Sep 12 '21 21:09 tsteur

@tsteur

  1. I didn't see any error messages on "/console queuedtracking:print-queued-requests --queue-id=0"
  2. Now I've checked historical logs of nginx and stderr from "console queuedtracking:process" but there's nothing interesting.
  3. For now, the problem already disappeared (but it will come back for sure :-) )
  4. As for _id, I've been focused on uid (as above) and next time I'll check for _id

marcinzp avatar Sep 13 '21 09:09 marcinzp

So @tsteur ... I'm trying to catch the problem again:

/usr/share/piwik/console queuedtracking:monitor Queue is enabled The command ./console queuedtracking:process has to be executed to process request sets within queue Up to 16 workers will be used Processor will start once there are at least 20 request sets in the queue 643332 (599912+42854+33+48+30+27+51+42+47+34+39+56+41+42+37+39) request sets left in queue. 1.02G used memory (5.90G peak). 2 workers active.

/usr/share/piwik/console queuedtracking:print-queued-requests --queue-id=0 | grep "'uid'" | sort | uniq -c 2 'uid' => '14e927b86329e6acc09212e882b2ce67', 2 'uid' => '4eb30a904900dfec5201fc22f480324f', 2 'uid' => '6d89500cbbca2a16e51c56d0c162d98c', 6 'uid' => '7a0ab288759fbde45a121775ba17c68a', 2 'uid' => '86ab4a8705e4adb218947f62f54882e1', 2 'uid' => 'bfde41bfe7c35aaf8b49112dc7daec5d', 14 'uid' => 'nnrf2j06gvuee9s4gsuq6vu6lmf9mn1lfpvc4qtpg7', 440 'uid' => 'qn5flmc4uvua9bs4mso81ku46ejd1htr3pt6iakog7',

/usr/share/piwik/console queuedtracking:print-queued-requests --queue-id=0 | grep "'_id'" | sort | uniq -c 440 '_id' => '2691ea3d93de4912', 2 '_id' => '5a49e17a2b3e6e2f', 2 '_id' => '65e7bd0e2402bd00', 14 '_id' => '7e177923d5424324', 2 '_id' => '818da8095bf1d354', 2 '_id' => '9a5a95eff13ad9c3', 2 '_id' => 'dcf4c9766071ca8a', 6 '_id' => 'e1a3d99ec09a84bd',

I still have most request from the same user. Those requests come from our application (using Matomo SDK). There are in fact, some occurrences of this user id in access.log, but...:

grep -c "qn5flmc4uvua9bs4mso81ku46ejd1htr3pt6iakog7" /var/log/nginx/access.log 29

zgrep -c "qn5flmc4uvua9bs4mso81ku46ejd1htr3pt6iakog7" /var/log/nginx/access.log-20211001.gz 35

As you can see, it's not that often as in the queue. And:

grep "qn5flmc4uvua9bs4mso81ku46ejd1htr3pt6iakog7" /var/log/nginx/access.log | cut -f5 -d\ | tail -n5 [01/Oct/2021:15:10:32 [01/Oct/2021:15:10:33 [01/Oct/2021:15:10:33 [01/Oct/2021:15:10:34 [01/Oct/2021:15:10:34

Last occurrences today were 4 hours ago...

So I took a look at the contents of those records and I see them to be almost the same:

/usr/share/piwik/console queuedtracking:print-queued-requests | grep -C4 qn5flmc4uvua9bs4mso81ku46ejd1htr3pt6iakog7 | csplit --suppress-matched - "/^--$/" '{*}'

md5sum xx* | cut -f1 -d\ | sort | uniq -c | sort -nr 504 71b0cbd43d9a8dc8099f4021ccf5157c 334 39ebb0e1023d7383edd7f27a136f5430 200 863c4987d4d4399f541fdb3c4867e5da 92 068c4ac2443b18638a1673ea94196c80 26 6ddff64a6f5ef622b7f475ace9cf745e 4 795f399bb013928aedf1fa9c92d6299f

marcinzp avatar Oct 01 '21 17:10 marcinzp

@marcinzp thanks for the update. You could exclude specific user ID from being tracked like this see https://matomo.org/faq/how-to/how-do-i-exclude-specific-tracking-requests-in-matomo/

Edit your config/config.ini.php

[Tracker] # only add this line if it doesn't exist yet
exclude_requests = "uid==ENTER THE USER ID THAT SHOULD BE BLOCKED"

tsteur avatar Oct 03 '21 19:10 tsteur

@tsteur Thank you, I've added recommended entry:

[Tracker] ignore_visits_cookie_name = "piwik_ignore" exclude_requests = "uid==qn5flmc4uvua9bs4mso81ku46ejd1htr3pt6iakog7"

but the queue still seems to be growing: /usr/share/piwik/console queuedtracking:monitor Queue is enabled The command ./console queuedtracking:process has to be executed to process request sets within queue Up to 16 workers will be used Processor will start once there are at least 20 request sets in the queue 844619 (843880+45+53+37+79+56+57+49+56+35+42+30+54+42+43+61) request sets left in queue. 1.28G used memory (5.90G peak). 1 workers active.
845134 (844619+28+25+24+37+23+36+40+44+42+36+30+42+39+35+34) request sets left in queue. 1.28G used memory (5.90G peak). 1 workers active.
853774 (853346+43+43+30+10+32+17+22+31+36+19+26+23+32+34+30) request sets left in queue. 1.29G used memory (5.90G peak). 1 workers active.

/usr/share/piwik/console queuedtracking:print-queued-requests --queue-id=0 | grep -c qn5flmc4uvua9bs4mso81ku46ejd1htr3pt6iakog7 480 /usr/share/piwik/console queuedtracking:print-queued-requests --queue-id=0 | grep -c qn5flmc4uvua9bs4mso81ku46ejd1htr3pt6iakog7 520

marcinzp avatar Oct 04 '21 10:10 marcinzp

... but after a longer while, the queue is melting indeed :-)

/usr/share/piwik/console queuedtracking:monitor Queue is enabled The command ./console queuedtracking:process has to be executed to process request sets within queue Up to 16 workers will be used Processor will start once there are at least 20 request sets in the queue 560961 (560331+48+43+41+51+24+26+49+25+44+53+44+47+37+55+43) request sets left in queue. 921.88M used memory (5.90G peak). 1 workers active.

marcinzp avatar Oct 04 '21 12:10 marcinzp

@tsteur we are facing a similar issue after updating Matomo from 3.14 to 4.6.2. we are using matomo:4.6.2-fpm-alpine docker image for Matomo, redis:6.0-alpine for Redis and mysql:8.0.21 for the MySQL DB.

for tracking, we are using

  • JS Tracking Client - Requests coming from the JS seem to be processed fine. the requests are coming with different user ids depending on the user logged in.
  • matomo-java-tracker v1.7 - Requests from the java application are being processed slowly. we are able to see some requests of 3 days old on Matomo, the requests count of 3 days ago seems to be increasing in Matomo UI, it seems to be processing 3 days old requests. the requests from the java application have the same user and the requests seem to go to the same queue.

before the update, the requests were getting processed correctly.

Output of ./console queuedtracking:monitor

Queue is enabled
Request sets in the queue will be processed automatically after a tracking request
Up to 4 workers will be used
Processor will start once there are at least 25 request sets in the queue
374199 (3122+7067+404+363606) request sets left in queue. 699.36M used memory (699.53M peak). 4 workers active.

Output of ./console queuedtracking:print-queued-requests --queue-id=3 | grep "'uid'" | sort | uniq -c

    404           'uid' => 'Java_application_user',
     24           'uid' => 'random_user_1',
     22           'uid' => 'random_user_2',
      4           'uid' => 'random_user_3',
      6           'uid' => 'random_user_4',
      2           'uid' => 'random_user_5'

I also checked ./console queuedtracking:print-queued-requests --queue-id=3 command, even after 1 hour the output was the same.

rishabhkailey avatar Feb 10 '22 09:02 rishabhkailey

@rishabhkailey the requests from the java tracker, are they maybe mostly sent from the same IP address or maybe you are using the Matomo User ID feature and there could be a bug that many users have the same user ID?

tsteur avatar Feb 10 '22 19:02 tsteur

The requests from the java tracker are from the same IP and have the same user id (piwikRequest.setUserId(<CONSTANT_USER_ID>)).

rishabhkailey avatar Feb 11 '22 05:02 rishabhkailey

The requests from the java tracker are from the same IP and have the same user id (piwikRequest.setUserId(<CONSTANT_USER_ID>)).

This would be expected behaviour in that case. Queued tracking would put these requests all in the same queue to make sure the data for these are processed correctly. Otherwise there could be issues that data would be processed out of order and then the data could become inaccurate.

tsteur avatar Feb 14 '22 01:02 tsteur

I will check with the team. I think we are only using user id for creating segments in Matomo. we can try to use custom variables instead.

Queued tracking would put these requests all in the same queue to make sure the data for these are processed correctly

We didn't face this issue in Matomo 3.14, is the above behavior have been added after Matomo 3.14?

rishabhkailey avatar Feb 14 '22 05:02 rishabhkailey

and If we do not set the user id then will it work? does the same IP for all the requests will cause problems without user id? is there anything else we can do here?

rishabhkailey avatar Feb 14 '22 09:02 rishabhkailey

We didn't face this issue in Matomo 3.14, is the above behavior have been added after Matomo 3.14?

That should have been always there. So it's not clear why this changed now. Maybe it didn't happen exactly since the update to a newer Matomo version and was caused by something else?

and If we do not set the user id then will it work? does the same IP for all the requests will cause problems without user id? is there anything else we can do here?

The same IP for all the requests will cause the same issue. This is the case because requests of the same IP might belong to the same visitor which means we have to put these requests into the same queue. A workaround would be, if possible, to set a custom IP in the tracker (cip tracking parameter, the Java tracker might have a method for it). You could attach some kind of randomised IPs to each request. For Matomo to recognise this custom IP in the tracker the tracking request would need to be authenticated though using a token_auth parameter. see https://developer.matomo.org/api-reference/tracking-api#other-parameters-require-authentication-via-token_auth

tsteur avatar Feb 14 '22 21:02 tsteur

@tsteur Thanks for the help, setting the Custom IP for the Matomo requests, seems to solve the problem.

rishabhkailey avatar Feb 28 '22 01:02 rishabhkailey

Hi, we have the same issue.

php console queuedtracking:monitor
Queue is enabled
Request sets in the queue will be processed automatically after a tracking request
Up to 6 workers will be used
Processor will start once there are at least 50 request sets in the queue
5327211 (22+5327135+5+29+6+14) request sets left in queue. disabled used memory (disabled peak). 1 workers active.

Since days, I have tested a lot of tricks I've found.

Like :

  • https://github.com/matomo-org/plugin-QueuedTracking/issues/22
  • Changing number of workers 8 -> 16 -> 6 -> 4 -> 1
  • https://fr.matomo.org/faq/on-premise/how-to-configure-matomo-to-handle-unexpected-peak-in-traffic/

By deactivating ProcessDuringTrackingRequest there is no way to treat all off the 5M process.

I re-create empty table matomo_queuedtracking_list_trackingQueueV1_1 -> its growing too:

Processor will start once there are at least 50 request sets in the queue
2542 (47+2333+32+44+42+44) request sets left in queue. disabled used memory (disabled peak). 1 workers active.        
php -d memory_limit=-1 console queuedtracking:process 
Starting to process request sets, this can take a while

This worker finished queue processing with 0req/s (0 requests in 0.01 seconds)
php console queuedtracking:test
Settings that will be used:
Backend: mysql
NumQueueWorkers: 6
NumRequestsToProcess: 50
ProcessDuringTrackingRequest: 1
QueueEnabled: 1

Redis backend only settings (does not apply when using MySQL backend):
Host: 127.0.0.1
Port: 6379
Timeout: 0
Password: 
Database: 0
UseSentinelBackend: 0
SentinelMasterName: xxxx

Version / stats:
PHP version: 8.1.17RC1
Uname: Linux 3.10.0-1160.102.1.el7.x86_64 #1 SMP Tue Oct 17 15:42:21 UTC 2023 x86_64
Backend version: 5.7.34-log
Memory: array (
  'used_memory_human' => 'disabled',
  'used_memory_peak_human' => 'disabled',
)

Performing some tests:
Connection works in general
Initial expire seems to be set correctly
setIfNotExists works fine
expireIfKeyHasValue seems to work fine
Extending expire seems to be set correctly
expireIfKeyHasValue correctly expires only when the value is correct
Expire is still set which is correct
deleteIfKeyHasValue seems to work fine
List feature seems to work fine

Done

loris-av avatar Dec 14 '23 10:12 loris-av