nchan icon indicating copy to clipboard operation
nchan copied to clipboard

multimsg timeout!!

Open seddonm1 opened this issue 8 years ago • 12 comments

Hi Leo Can you please describe what circumstances produce the multimsg timeout!! messages? I have configured publisher like this and am sure the volume produced is less than 10000/2hrs:

      nchan_publisher;
      nchan_message_buffer_length 10000;
      nchan_message_timeout 2h;
      nchan_redis_idle_channel_cache_timeout 0;

but getting these messages:

2017/03/02 05:00:21 [error] 7#7: MEMSTORE:00: multimsg 00007FCE81ACB140 timeout!!

thanks

seddonm1 avatar Mar 02 '17 20:03 seddonm1

This was a bug in an older version of Nchan that I had been monitoring with that error message. It's unrelated to any of the above settings. It has been fixed in version 1.1.0.

1.1.0 (Jan. 4 2017)
 feature: websocket subscribers now receive "application/octet-stream" messages
      in binary frame rather than text
 fix: publisher request variables not always passed to nchan_publisher_upstream_request
 feature: Redis storage 'backup' mode strictly for data persistence
 fix: possible lingering subscriber if connection is closed just before subscribing
-->  fix: possible memory leak when using multiplexed channels  <--- this is the fix
 security fix: subscribing with If-Modified-Since and without If-None-Match headers
      crashed the Nginx worker (thanks @supertong)
 security fix: sending an empty message to multipart/mixed subscriber crashed Nginx worker
 fix: publisher & subscriber response codes were logged incorrectly
 fix: websocket subscriber memory leak on upstream authentication failure
 fix: possible crash after reloading when using several instances of Nchan with Redis
 feature: nchan_access_control_allow_origin can accept nginx variables
 feature: Redis key namespaces
 feature: all size configurations can now parse decimal values
 fix: off-by-one subscriber count when using multiplexed channels
feature: accounting and dynamic limits for channel groups

slact avatar Mar 02 '17 21:03 slact

Hi. I am running the latest release v1.1.1. (I can try 1.1.2)

seddonm1 avatar Mar 02 '17 21:03 seddonm1

I see, then this is something different. Can you show me your Nginx config? (edit out whatever you need to keep private)

slact avatar Mar 02 '17 22:03 slact

Hi leo. Here is a slightly simplified version. I call an openresty subscription directive which either includes the last_event_id or not depending on client side business logic. Openresty then routes the request to the correct subscription (which you can see changes which starting point based on the first_message parameter).

location ~ /internal/publish/(?<tenant>[a-z0-9-]*)/(?<collection>\w+)$ {
  internal;

  nchan_publisher;
  nchan_message_buffer_length 10000;
  nchan_message_timeout 2h;
  nchan_redis_idle_channel_cache_timeout 0;
  nchan_channel_id "$tenant/$collection";
  nchan_use_redis on;
}

location ~ /internal/subscribe/new/(?<tenant>[a-z0-9-]*)/user$ {
  internal;

  nchan_subscriber eventsource;
  nchan_channel_id "$tenant/collection0" "$tenant/collection1" "$tenant/collection2";
  nchan_use_redis on;
  nchan_subscriber_first_message 0;
  nchan_subscriber_timeout 0;
}

location ~ /internal/subscribe/resume/(?<tenant>[a-z0-9-]*)/user$ {
  internal;

  nchan_subscriber eventsource;
  nchan_channel_id "$tenant/collection0" "$tenant/collection1" "$tenant/collection2";
  nchan_use_redis on;
  nchan_subscriber_first_message "oldest";
  nchan_subscriber_last_message_id $arg_last_event_id;
  nchan_subscriber_timeout 0;
}

I am getting memory issues with this config.

seddonm1 avatar Mar 06 '17 10:03 seddonm1

Thanks for the config. I'll be looking into this. What do you mean by "memory issues"? Also, how often are you getting the "multimsg timeout!" error?

slact avatar Mar 07 '17 18:03 slact

Sorry, the memory issue being the MEMSTORE issues.

Seems to arise after about 24hrs. ill keep an eye on timings for you.

seddonm1 avatar Mar 09 '17 06:03 seddonm1

Hi @slact ,

I'm getting that error as well. I'm running nchan on a sandbox environment, only 3 person accessing it. NCHAN returns 504 for every /sub sometimes, but not always. NCHAN is not delivering the messages and in the log file i just see 2017/03/10 14:35:55 [error] 12#0: MEMSTORE:03: multimsg 00007FEC8ED31A70 timeout!!.

The link to the nchan's error log: https://gist.github.com/michaeltcoelho/4cec53164b7d2118bb2d22a6df1bf4c2

My nginx-nchan configurations files: https://gist.github.com/michaeltcoelho/327257df15413ec41508986ad76a1724

The link to the nginx's error log: https://gist.github.com/michaeltcoelho/b8541d9392ba12ce3fa956ebf1d16839

That's another error i got: 2017/03/10 16:55:23 [error] 6#0: SUB:LONGPOLL:abort! unknown error

I'm running the latest version 1.1.1

Here is my setup: https://github.com/michaeltcoelho/docker-nchan/blob/master/Dockerfile

michaeltcoelho avatar Mar 10 '17 14:03 michaeltcoelho

Hi, anything about that?

michaeltcoelho avatar Mar 11 '17 17:03 michaeltcoelho

@michaeltcoelho : The issue you're experiencing is different, you have a worker crash which triggers those "multimsg timeout!!" errors. The real problem is

nginx: worker process: /nchan-1.1.1/src/util/nchan_output.c:375: msgtag_to_strptr: Assertion `t[i] >= -2' failed.

I'm looking into both, but I've opened a separate ticket for yours: #322.

slact avatar Mar 11 '17 18:03 slact

I have the same problem, seeing:

2018/04/16 19:19:23 [error] 15#15: MEMSTORE:00: multimsg callback #3 for 0000557563B946C0 received after expiring at 1523906363 status 1 msg 0000000000000000
2018/04/16 19:19:23 [error] 15#15: MEMSTORE:00: multimsg 0000557563B946C0 timeout!!
2018/04/16 19:40:39 [error] 13#13: MEMSTORE:00: multimsg 000055BF26783D60 timeout!!
2018/04/16 19:40:39 [error] 13#13: MEMSTORE:00: multimsg callback #3 for 000055BF26783D60 received after expiring at 1523907639 status 1 msg 0000000000000000
....

from time time to time.

We don't have a lot of traffic. Here's current status just after multimsg errors:

total published messages: 3029
stored messages: 136
shared memory used: 108K
shared memory limit: 131072K
channels: 107
subscribers: 12
redis pending commands: 0
redis connected servers: 3
total interprocess alerts received: 628053
interprocess alerts in transit: 0
interprocess queued alerts: 0
total interprocess send delay: 0
total interprocess receive delay: 0
nchan version: 1.1.14

indrekj avatar Apr 16 '18 20:04 indrekj

@indrekj these have been fixed in master with d3a65572046562cd0d92af455ff2ec7eaf466cea

slact avatar Jun 28 '18 06:06 slact

I'm still having this or a similar issue. After a few hours, with max a few messages per second and 4-5 subscribers over EventSource. Is there anything I can help with to track this down? Theese MEMSTORE timout messages and then a lot of those " open socket #6 left in connection 35" type ones.

ker process: /build/nginx-KTLRnK/nginx-1.18.0/debian/modules/nchan/src/store/memory/memstore.c:2644: msg_release: Assertion `msg->refcount > 0' failed.
2021/10/07 14:36:26 [alert] 1200#1200: worker process 1201 exited on signal 6 (core dumped)
2021/10/07 14:36:49 [error] 1202#1202: MEMSTORE:01: multimsg 0000556B3179F6C0 timeout!!
2021/10/07 14:36:49 [error] 1202#1202: MEMSTORE:01: multimsg 0000556B3179D6E0 timeout!!
2021/10/07 14:36:50 [error] 1202#1202: MEMSTORE:01: multimsg 0000556B317A4EC0 timeout!!
2021/10/07 14:36:50 [error] 1202#1202: MEMSTORE:01: multimsg 0000556B317B7B90 timeout!!
2021/10/07 14:36:51 [error] 1202#1202: MEMSTORE:01: multimsg 0000556B317B6C80 timeout!!
2021/10/07 14:37:55 [error] 1202#1202: MEMSTORE:01: multimsg 0000556B317679B0 timeout!!
2021/10/07 14:49:03 [error] 1202#1202: MEMSTORE:01: multimsg 0000556B31875260 timeout!!
2021/10/07 15:05:27 [error] 1202#1202: MEMSTORE:01: multimsg 0000556B318F8830 timeout!!
2021/10/07 15:16:04 [error] 1202#1202: MEMSTORE:01: multimsg 0000556B319EB1F0 timeout!!
2021/10/07 15:18:54 [error] 1202#1202: MEMSTORE:01: multimsg 0000556B31A5A290 timeout!!
2021/10/07 15:22:53 [error] 1202#1202: MEMSTORE:01: multimsg 0000556B31A7C080 timeout!!
2021/10/07 15:40:42 [error] 1202#1202: MEMSTORE:01: tried adding WAITING chanhead 0000556B31874B10 /1qHBjWwe7 to chanhead_gc. why?
2021/10/07 15:47:17 [error] 1202#1202: MEMSTORE:01: tried adding WAITING chanhead 0000556B317BAB30 /TIJIjVicf to chanhead_gc. why?
2021/10/07 15:52:24 [alert] 1202#1202: *18705 open socket #6 left in connection 35
2021/10/07 15:52:24 [alert] 1202#1202: *18706 open socket #107 left in connection 36
2021/10/07 15:52:24 [alert] 1202#1202: *18959 open socket #222 left in connection 37

andrisi avatar Oct 07 '21 16:10 andrisi