
Federated rooms from other homeservers regularly stop syncing, probably caused by enabled retention

Open FrankNagel opened this issue 1 year ago • 0 comments

Description

We run a federated homeserver with a retention policy enabled. Federated rooms from other homeservers frequently stop syncing. When that happens, the log file shows KeyErrors like the one in the section Relevant log output.

The problem can be temporarily resolved by deleting the rows for the event IDs that cause the KeyErrors from the event_forward_extremities table:

delete from event_forward_extremities where event_id = '...';
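
Before deleting anything, it may help to list the candidate rows first. The following is only a sketch under assumptions, not Synapse tooling: it assumes the problematic extremities are those without a row in a table named event_to_state_groups with columns (event_id, state_group), which is my guess based on the KeyError in the traceback and is not confirmed in this report. The helper name find_dangling_extremities and the sample event IDs are made up, and an in-memory SQLite database with a minimal stand-in schema is used so the example runs on its own.

```python
import sqlite3

# Assumption: event_to_state_groups(event_id, state_group) exists under this name.
# Only event_forward_extremities(event_id, room_id) is confirmed by this report.
FIND_DANGLING = """
SELECT e.event_id, e.room_id
FROM event_forward_extremities AS e
LEFT JOIN event_to_state_groups AS s ON s.event_id = e.event_id
WHERE s.event_id IS NULL
ORDER BY e.room_id, e.event_id
"""


def find_dangling_extremities(conn):
    """Return (event_id, room_id) rows that have no state group entry."""
    return conn.execute(FIND_DANGLING).fetchall()


def demo():
    # Stand-in schema and made-up data, just enough to exercise the query.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE event_forward_extremities (event_id TEXT, room_id TEXT);
        CREATE TABLE event_to_state_groups (event_id TEXT, state_group INTEGER);
        INSERT INTO event_forward_extremities VALUES ('$purged', '!room:example.org');
        INSERT INTO event_forward_extremities VALUES ('$healthy', '!room:example.org');
        INSERT INTO event_to_state_groups VALUES ('$healthy', 1);
        """
    )
    return find_dangling_extremities(conn)


dangling = demo()
print(dangling)
```

On a real deployment the same SELECT would be run against the PostgreSQL database, and only after a backup; whether this query finds exactly the rows that break persistence is untested.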

Steps to reproduce

Assuming my guess about the root cause is correct:

  • Set up a homeserver with retention enabled
  • Users join a room from another Matrix server without retention times
  • Wait until the max lifetime of an event in event_forward_extremities is reached and the retention policy is applied
  • The room stops syncing

Homeserver

matrix-homeserver.uni-marburg.de

Synapse Version

1.113.0

Installation Method

Debian packages from packages.matrix.org

Database

PostgreSQL 13.16 (Debian 13.16-0+deb11u1); single server; not ported from sqlite, not restored from backup

Workers

Multiple workers

Platform

Debian 11, VM

Configuration

  • federation enabled
  • retention enabled
retention:
  enabled: true
  default_policy:
    min_lifetime: 1d
    max_lifetime: 240d
  allowed_lifetime_min: 1d
  allowed_lifetime_max: 3654d
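
To make the timing concrete, here is a rough sketch of when an event would fall outside the default max_lifetime of 240d from the config above. It is a simplification under assumptions: parse_days and is_expired are hypothetical helpers (not Synapse's own duration parser or purge job), only the "<n>d" form is handled, and I assume the age is measured from the event's origin_server_ts.

```python
from datetime import datetime, timedelta, timezone


def parse_days(value: str) -> timedelta:
    """Hypothetical parser: accepts only durations of the form '<n>d'."""
    if not value.endswith("d"):
        raise ValueError(f"unsupported duration: {value!r}")
    return timedelta(days=int(value[:-1]))


def is_expired(origin_server_ts_ms: int, max_lifetime: str, now: datetime) -> bool:
    """True if the event is older than max_lifetime at the given time."""
    sent = datetime.fromtimestamp(origin_server_ts_ms / 1000, tz=timezone.utc)
    return now - sent > parse_days(max_lifetime)


def to_ms(dt: datetime) -> int:
    """Convert a datetime to a millisecond timestamp."""
    return int(dt.timestamp() * 1000)


# Made-up timestamps around the date of the log output in this report.
now = datetime(2024, 8, 20, tzinfo=timezone.utc)
old_event_ts = to_ms(datetime(2023, 12, 1, tzinfo=timezone.utc))
recent_event_ts = to_ms(datetime(2024, 8, 1, tzinfo=timezone.utc))

print(is_expired(old_event_ts, "240d", now))
print(is_expired(recent_event_ts, "240d", now))
```

If my guess about the root cause is right, a forward extremity whose event is in the first category would be a candidate for the failure described here.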

Relevant log output

2024-08-20 01:03:32,798 - synapse.http.server - 147 - ERROR - POST-2775242 - Failed handle request via 'ReplicationFederationSendEventsRestServlet': <SynapseRequest at 0x7fa10cfed070 method='POST' uri='/_synapse/replication/fed_send_events/jOVEPEWaEE' clientproto='HTTP/1.1' site='9008'>
Traceback (most recent call last):
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/internet/defer.py", line 2010, in _inlineCallbacks
    result = context.run(
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/python/failure.py", line 545, in throwExceptionIntoGenerator
    return g.throw(self.value.with_traceback(self.tb))
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/util/caches/response_cache.py", line 265, in cb
    return await callback(*args, **kwargs)
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/replication/http/federation.py", line 153, in _handle_request
    max_stream_id = await self.federation_event_handler.persist_events_and_notify(
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/handlers/federation_event.py", line 2271, in persist_events_and_notify
    ) = await self._storage_controllers.persistence.persist_events(
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/logging/opentracing.py", line 921, in _wrapper
    return await func(*args, **kwargs)
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/storage/controllers/persist_events.py", line 427, in persist_events
    ret_vals = await yieldable_gather_results(enqueue, partitioned.items())
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/util/async_helpers.py", line 305, in yieldable_gather_results
    raise dfe.subFailure.value from None
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/internet/defer.py", line 2010, in _inlineCallbacks
    result = context.run(
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/python/failure.py", line 545, in throwExceptionIntoGenerator
    return g.throw(self.value.with_traceback(self.tb))
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/storage/controllers/persist_events.py", line 422, in enqueue
    return await self._event_persist_queue.add_to_queue(
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/storage/controllers/persist_events.py", line 245, in add_to_queue
    res = await make_deferred_yieldable(end_item.deferred.observe())
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/storage/controllers/persist_events.py", line 288, in handle_queue_loop
    ret = await self._per_item_callback(room_id, item.task)
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/storage/controllers/persist_events.py", line 368, in _process_event_persist_queue_task
    return await self._persist_event_batch(room_id, task)
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/storage/controllers/persist_events.py", line 616, in _persist_event_batch
    ) = await self._calculate_new_forward_extremities_and_state_delta(
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/storage/controllers/persist_events.py", line 708, in _calculate_new_forward_extremities_and_state_delta
    res = await self._get_new_state_after_events(
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/storage/controllers/persist_events.py", line 894, in _get_new_state_after_events
    old_state_groups = {
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/storage/controllers/persist_events.py", line 895, in <setcomp>
    event_id_to_state_group[evid] for evid in old_latest_event_ids
KeyError: '$i01NHjjt69O3x6hw409W3zVOn9xUNlXlINiCmH1q5XA'
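
The last frames of the traceback can be modelled in isolation. This is a toy reproduction with made-up data, not Synapse code: the function old_state_groups is a hypothetical stand-in that only copies the shape of the set comprehension shown above, and the assumption is that the lookup table has no entry for an event ID that is still listed as a forward extremity.

```python
def old_state_groups(old_latest_event_ids, event_id_to_state_group):
    # Same shape as the set comprehension in the traceback: a plain dict
    # lookup, which raises KeyError for any event ID missing from the map.
    return {event_id_to_state_group[evid] for evid in old_latest_event_ids}


# Made-up data: "$purged" is still an extremity but has no state group entry.
event_id_to_state_group = {"$kept": 7}

try:
    old_state_groups({"$kept", "$purged"}, event_id_to_state_group)
    error = None
except KeyError as exc:
    error = exc.args[0]

print(error)
```

This would also explain why removing the row from event_forward_extremities helps temporarily: the missing ID is then no longer part of old_latest_event_ids.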

Anything else that would be useful to know?

The event_id in the traceback above belongs to #python:matrix.org

select * from event_forward_extremities where event_id = '$i01NHjjt69O3x6hw409W3zVOn9xUNlXlINiCmH1q5XA';
                   event_id                   |            room_id
----------------------------------------------+--------------------------------
 $i01NHjjt69O3x6hw409W3zVOn9xUNlXlINiCmH1q5XA | !iuyQXswfjgxQMZGrfQ:matrix.org
(1 row)

FrankNagel, Aug 20 '24 08:08