
[Bug] Delayed Message cursor lost after topic unload

Open SennoYuki opened this issue 1 year ago • 15 comments

Search before asking

  • [X] I searched in the issues and found nothing similar.

Read release policy

  • [X] I understand that unsupported versions don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.

Version

OS: Linux 3.10.0
Java: JDK 17
Pulsar: 3.2.0

Minimal reproduce step

  1. Produce delayed messages to a topic (see the producer sketch below).
  2. Restart the broker that owns the topic.
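For reference, a minimal producer sketch for step 1, assuming the Java client and placeholder broker URL, topic name, and delay:

    import java.util.concurrent.TimeUnit;
    import org.apache.pulsar.client.api.Producer;
    import org.apache.pulsar.client.api.PulsarClient;

    public class DelayedProducerExample {
        public static void main(String[] args) throws Exception {
            // NOTE: service URL, topic name and delay are placeholders, not the exact values from my setup
            PulsarClient client = PulsarClient.builder()
                    .serviceUrl("pulsar://localhost:6650")
                    .build();
            Producer<byte[]> producer = client.newProducer()
                    .topic("persistent://public/default/push_al_queue")
                    .create();
            // Ask the broker to hold this message and deliver it 30 minutes from now
            producer.newMessage()
                    .value("delayed payload".getBytes())
                    .deliverAfter(30, TimeUnit.MINUTES)
                    .send();
            producer.close();
            client.close();
        }
    }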

What did you expect to see?

Message consumption continues normally: the delayed messages are still delivered at their scheduled time.

What did you see instead?

The delayed messages were delivered immediately after the restart.

Anything else?

No response

Are you willing to submit a PR?

  • [ ] I'm willing to submit a PR!

SennoYuki avatar May 15 '24 09:05 SennoYuki

I had already set delayedDeliveryTrackerFactoryClassName=org.apache.pulsar.broker.delayed.BucketDelayedDeliveryTrackerFactory in the broker configuration.
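For reference, the relevant delayed-delivery lines in broker.conf look like this (an excerpt; delayedDeliveryEnabled defaults to true and is shown only for completeness):

    # broker.conf (excerpt)
    delayedDeliveryEnabled=true
    delayedDeliveryTrackerFactoryClassName=org.apache.pulsar.broker.delayed.BucketDelayedDeliveryTrackerFactory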

SennoYuki avatar May 15 '24 09:05 SennoYuki

I need more context:

  1. What's your subscription type? Exclusive, Failover, Shared, or Key_Shared?
  2. Did you enable delayedDeliveryEnabled?

dao-jun avatar May 16 '24 15:05 dao-jun

I need more context:

  1. What's your subscription type? Exclusive, Failover, Shared, or Key_Shared?
  2. Did you enable delayedDeliveryEnabled?

1. Shared
2. Yes, delayedDeliveryEnabled is enabled.

SennoYuki avatar May 17 '24 03:05 SennoYuki

I also wondered whether my subscription cursor might not be durable, but I confirmed from the consumer config and from pulsar-admin topics stats that the subscription is durable (screenshot 20240517-110607).

SennoYuki avatar May 17 '24 03:05 SennoYuki

If possible, maybe we can debug this problem over an IM tool like WeChat for efficiency.

SennoYuki avatar May 17 '24 03:05 SennoYuki

I can't reproduce the issue so far; I'll keep trying.

To determine whether a subscription is durable, run ./pulsar-admin topics stats persistent://public/default/my-topic1 and check whether the isDurable field is true or false.
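If it's easier, the same check can be done with the Java admin client; a rough sketch, assuming the admin endpoint is http://localhost:8080 and using the example topic name above:

    import org.apache.pulsar.client.admin.PulsarAdmin;
    import org.apache.pulsar.common.policies.data.TopicStats;

    public class CheckDurable {
        public static void main(String[] args) throws Exception {
            // NOTE: admin URL and topic name are placeholders
            PulsarAdmin admin = PulsarAdmin.builder()
                    .serviceHttpUrl("http://localhost:8080")
                    .build();
            TopicStats stats = admin.topics().getStats("persistent://public/default/my-topic1");
            // Print the durability flag of every subscription on the topic
            stats.getSubscriptions().forEach((name, sub) ->
                    System.out.println(name + " isDurable=" + sub.isDurable()));
            admin.close();
        }
    }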

dao-jun avatar May 17 '24 05:05 dao-jun

To determine whether a subscription is durable, run ./pulsar-admin topics stats persistent://public/default/my-topic1 and check whether the isDurable field is true or false.

It's true:

    "isDurable" : true,
    "isReplicated" : false,
    "allowOutOfOrderDelivery" : false,

SennoYuki avatar May 17 '24 05:05 SennoYuki

If isDurable is true, it means your cursor is durable

dao-jun avatar May 17 '24 07:05 dao-jun

If isDurable is true, it means your cursor is durable

So I can't work around the situation, and I can't restart the broker to tune the config either, because duplicated consumption happens after every restart.

SennoYuki avatar May 17 '24 07:05 SennoYuki

@coderzc I cannot reproduce the issue on the master branch, could you please take a look?

dao-jun avatar May 17 '24 07:05 dao-jun

@dao-jun @coderzc Any update? Today I reproduced the problem with a topic unload operation. Can you check the unload code path?

SennoYuki avatar May 20 '24 08:05 SennoYuki

@SennoYuki Can you please provide the broker log and a heap dump?

dao-jun avatar May 20 '24 14:05 dao-jun

@dao-jun Here is the unload log:

5月 21 02:13:04 sg-prod-server-pulsar-3 pulsar[14996]: 2024-05-21T02:13:04,493+0000 [pulsar-web-42-3] INFO  org.apache.pulsar.broker.admin.impl.PersistentTopicsBase - [null] Unloading topic persistent://public/default/push_al_queue  
5月 21 02:13:04 sg-prod-server-pulsar-3 pulsar[14996]: 2024-05-21T02:13:04,511+0000 [metadata-store-9-1] INFO  org.eclipse.jetty.server.RequestLog - 127.0.0.1 - - [21/May/2024:02:13:04 +0000] "PUT /admin/v2/persistent/public/default/push_al_queue/unload HTTP/1.1" 307 0 "-" "Pulsar-Java-v3.2.3" 21
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,538+0000 [pulsar-web-42-3] INFO  org.apache.pulsar.broker.admin.impl.PersistentTopicsBase - [null] Unloading topic persistent://public/default/push_al_queue
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,540+0000 [metadata-store-9-1] INFO  org.apache.pulsar.broker.service.Producer - Disconnecting producer: Producer{topic=PersistentTopic{topic=persistent://public/default/push_al_queue}, client=/172.17.32.36:56858, producerName=server-pulsar-10-5, producerId=1}, assignedBrokerLookupData: Optional.empty
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,540+0000 [metadata-store-9-1] INFO  org.apache.pulsar.broker.service.Producer - Disconnecting producer: Producer{topic=PersistentTopic{topic=persistent://public/default/push_al_queue}, client=/172.17.32.22:59572, producerName=server-pulsar-10-4, producerId=1}, assignedBrokerLookupData: Optional.empty
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,541+0000 [metadata-store-9-1] INFO  org.apache.pulsar.broker.service.Consumer - Disconnecting consumer: Consumer{subscription=PersistentSubscription{topic=persistent://public/default/push_al_queue, name=kafka_to_bk}, consumerId=1, consumerName=eyqqe, address=/172.17.32.22:53160}
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,541+0000 [metadata-store-9-1] INFO  org.apache.pulsar.broker.service.persistent.PersistentDispatcherMultipleConsumers - Removed consumer Consumer{subscription=PersistentSubscription{topic=persistent://public/default/push_al_queue, name=kafka_to_bk}, consumerId=1, consumerName=eyqqe, address=/172.17.32.22:53160} with pending 0 acks
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,541+0000 [metadata-store-9-1] INFO  org.apache.pulsar.broker.service.Consumer - Disconnecting consumer: Consumer{subscription=PersistentSubscription{topic=persistent://public/default/push_al_queue, name=kafka_to_bk}, consumerId=1, consumerName=oypam, address=/172.17.32.36:56860}
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,541+0000 [metadata-store-9-1] INFO  org.apache.pulsar.broker.service.persistent.PersistentDispatcherMultipleConsumers - Removed consumer Consumer{subscription=PersistentSubscription{topic=persistent://public/default/push_al_queue, name=kafka_to_bk}, consumerId=1, consumerName=oypam, address=/172.17.32.36:56860} with pending 0 acks
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,542+0000 [metadata-store-9-1] INFO  org.apache.pulsar.broker.service.persistent.PersistentDispatcherMultipleConsumers - [persistent://public/default/push_al_queue / kafka_to_bk] All consumers removed. Subscription is disconnected
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,542+0000 [metadata-store-9-1] INFO  org.apache.pulsar.broker.service.persistent.PersistentSubscription - [persistent://public/default/push_al_queue][kafka_to_bk] Successfully closed subscription [ManagedCursorImpl{ledger=public/default/persistent/push_al_queue, name=kafka_to_bk, ackPos=2384:81800, readPos=3188:98332}]
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,542+0000 [metadata-store-9-1] INFO  org.apache.pulsar.broker.service.persistent.PersistentSubscription - [persistent://public/default/push_al_queue][kafka_to_bk] Successfully closed the subscription
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,542+0000 [metadata-store-9-1] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/default/persistent/push_al_queue] Closing managed ledger
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,601+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/default/persistent/push_al_queue][kafka_to_bk] Updated md-position=2384:81800 into cursor-ledger 3149
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,601+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/default/persistent/push_al_queue] [kafka_to_bk] Closing metadata ledger 3149
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,604+0000 [main-EventThread] INFO  org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/default/persistent/push_al_queue][kafka_to_bk] Closed cursor-ledger 3149
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,604+0000 [main-EventThread] INFO  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://public/default/push_al_queue] Topic closed
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,604+0000 [main-EventThread] INFO  org.apache.pulsar.broker.admin.impl.PersistentTopicsBase - [null] Successfully unloaded topic persistent://public/default/push_al_queue
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,604+0000 [main-EventThread] INFO  org.eclipse.jetty.server.RequestLog - 172.17.4.143 - - [21/May/2024:02:13:04 +0000] "PUT /admin/v2/persistent/public/default/push_al_queue/unload?authoritative=true HTTP/1.1" 204 0 "-" "Pulsar-Java-v3.2.3" 69
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,648+0000 [pulsar-io-3-6] INFO  org.apache.pulsar.broker.service.ServerCnx - [[id: 0x213c9851, L:/172.17.4.22:6650 - R:/172.17.32.22:53160]] Subscribing on topic persistent://public/default/push_al_queue / kafka_to_bk. consumerId: 1
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,655+0000 [pulsar-2-3] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - Opening managed ledger public/default/persistent/push_al_queue
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,659+0000 [pulsar-io-3-7] INFO  org.apache.pulsar.broker.service.ServerCnx - [[id: 0xc1449f7e, L:/172.17.4.22:6650 - R:/172.17.32.36:56860]] Subscribing on topic persistent://public/default/push_al_queue / kafka_to_bk. consumerId: 1
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,660+0000 [BookKeeperClientWorker-OrderedExecutor-3-0] WARN  org.apache.bookkeeper.client.BookieWatcherImpl - New ensemble: [172.17.4.143:3181, 172.17.0.48:3181] is not adhering to Placement Policy. quarantinedBookies: []
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,666+0000 [main-EventThread] INFO  org.apache.bookkeeper.client.LedgerCreateOp - Ensemble: [172.17.4.143:3181, 172.17.0.48:3181] for ledger: 3192
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,666+0000 [BookKeeperClientWorker-OrderedExecutor-3-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/default/persistent/push_al_queue] Created ledger 3192 after closed null
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,674+0000 [bookkeeper-ml-scheduler-OrderedScheduler-3-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/default/persistent/push_al_queue] Loading cursor kafka_to_bk
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,674+0000 [bookkeeper-ml-scheduler-OrderedScheduler-3-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/default/persistent/push_al_queue] Recovering from bookkeeper ledger cursor: kafka_to_bk
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,679+0000 [bookkeeper-ml-scheduler-OrderedScheduler-3-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/default/persistent/push_al_queue] Cursor kafka_to_bk meta-data recover from ledger 3149
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,680+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/default/persistent/push_al_queue] Opened ledger 3149 for cursor kafka_to_bk. rc=0
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,699+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/default/persistent/push_al_queue] Cursor kafka_to_bk recovered to position 2384:81800
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,699+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/default/persistent/push_al_queue] Recovery for cursor kafka_to_bk completed. pos=2384:81800 -- todo=0
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,699+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl - [public/default/persistent/push_al_queue] Successfully initialize managed ledger
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,700+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] INFO  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://public/default/push_al_queue] Disabled replicated subscriptions controller
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,700+0000 [broker-topic-workers-OrderedExecutor-1-0] INFO  org.apache.pulsar.broker.service.BrokerService - Created topic persistent://public/default/push_al_queue - dedup is disabled
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,701+0000 [broker-topic-workers-OrderedExecutor-1-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/default/persistent/push_al_queue-kafka_to_bk] Rewind from 2384:81801 to 2384:81801
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,701+0000 [broker-topic-workers-OrderedExecutor-1-0] INFO  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://public/default/push_al_queue] Disabled replicated subscriptions controller
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,701+0000 [broker-topic-workers-OrderedExecutor-1-0] INFO  org.apache.pulsar.broker.service.persistent.PersistentSubscription - backlog for persistent://public/default/push_al_queue - 41793949
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,701+0000 [broker-topic-workers-OrderedExecutor-1-0] INFO  org.apache.pulsar.broker.service.ServerCnx - [/172.17.32.36:56860] Created subscription on topic persistent://public/default/push_al_queue / kafka_to_bk
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,701+0000 [broker-topic-workers-OrderedExecutor-1-0] INFO  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://public/default/push_al_queue] Disabled replicated subscriptions controller
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,701+0000 [broker-topic-workers-OrderedExecutor-1-0] INFO  org.apache.pulsar.broker.service.persistent.PersistentSubscription - backlog for persistent://public/default/push_al_queue - 41793949
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,701+0000 [broker-topic-workers-OrderedExecutor-1-0] INFO  org.apache.pulsar.broker.service.ServerCnx - [/172.17.32.22:53160] Created subscription on topic persistent://public/default/push_al_queue / kafka_to_bk
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,705+0000 [broker-topic-workers-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.service.ServerCnx - [/172.17.32.22:59572] Created new producer: Producer{topic=PersistentTopic{topic=persistent://public/default/push_al_queue}, client=/172.17.32.22:59572, producerName=server-pulsar-10-4, producerId=1}
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,705+0000 [broker-topic-workers-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.service.ServerCnx - [/172.17.32.36:56858] Created new producer: Producer{topic=PersistentTopic{topic=persistent://public/default/push_al_queue}, client=/172.17.32.36:56858, producerName=server-pulsar-10-5, producerId=1}
5月 21 02:13:05 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:05,231+0000 [broker-topic-workers-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Recover delayed message index bucket snapshot finish, buckets: 170, numberDelayedMessages: 7520012
5月 21 02:13:06 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:06,656+0000 [pulsar-io-3-7] WARN  org.apache.bookkeeper.client.BookieWatcherImpl - New ensemble: [172.17.0.48:3181, 172.17.4.143:3181] is not adhering to Placement Policy. quarantinedBookies: []
5月 21 02:13:06 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:06,662+0000 [main-EventThread] INFO  org.apache.bookkeeper.client.LedgerCreateOp - Ensemble: [172.17.0.48:3181, 172.17.4.143:3181] for ledger: 3193
5月 21 02:13:06 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:06,694+0000 [bookkeeper-ml-scheduler-OrderedScheduler-3-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/default/persistent/push_al_queue] Updated cursor kafka_to_bk with ledger id 3193 md-position=2384:81800 rd-position=2384:81802
5月 21 02:13:06 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:06,699+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/default/persistent/push_al_queue][kafka_to_bk] Successfully closed & deleted ledger 3149 in cursor
5月 21 02:13:07 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:07,476+0000 [pulsar-io-3-6] INFO  org.apache.pulsar.broker.service.ServerCnx - [/172.17.32.135:55076] connected with clientVersion=Pulsar Go v0.12.1, clientProtocolVersion=18, proxyVersion=null
5月 21 02:13:17 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:17,946+0000 [pulsar-io-3-1] INFO  org.apache.pulsar.broker.service.ServerCnx - Closed connection from /172.17.32.75:51856
5月 21 02:13:23 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:23,873+0000 [prometheus-stats-43-4] INFO  org.eclipse.jetty.server.RequestLog - 172.17.225.89 - - [21/May/2024:02:13:23 +0000] "GET /metrics/ HTTP/1.1" 200 7807 "-" "Prometheus/2.41.0" 8
5月 21 02:13:34 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:34,126+0000 [pulsar-io-3-2] INFO  org.apache.pulsar.broker.service.ServerCnx - Closed connection from /172.17.33.84:39704
5月 21 02:13:39 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:39,074+0000 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_3053_3053, segmentEntryId: 58
5月 21 02:13:39 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:39,902+0000 [pulsar-io-3-5] INFO  org.apache.pulsar.broker.service.ServerCnx - Closed connection from /172.17.32.116:44098
5月 21 02:13:44 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:44,236+0000 [pulsar-io-3-7] INFO  org.apache.pulsar.broker.service.ServerCnx - Closed connection from /172.17.32.89:44170
5月 21 02:13:49 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:49,279+0000 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_3044_3044, segmentEntryId: 58
5月 21 02:13:53 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:53,873+0000 [prometheus-stats-43-3] INFO  org.eclipse.jetty.server.RequestLog - 172.17.225.89 - - [21/May/2024:02:13:53 +0000] "GET /metrics/ HTTP/1.1" 200 7942 "-" "Prometheus/2.41.0" 9
5月 21 02:13:58 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:58,444+0000 [pulsar-load-manager-1-1] INFO  org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl - Writing local data to metadata store because maximum change 3796.7018423721024% exceeded threshold 10%; time since last report written is 115.0 seconds
5月 21 02:13:58 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:58,452+0000 [metadata-store-9-1] INFO  org.apache.pulsar.metadata.coordination.impl.ResourceLockImpl - Acquired resource lock on /loadbalance/brokers/sg-prod-server-pulsar-1.stos.org:8080
5月 21 02:14:01 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:01,921+0000 [prometheus-stats-43-4] INFO  org.eclipse.jetty.server.RequestLog - 127.0.0.1 - - [21/May/2024:02:14:01 +0000] "GET /metrics/ HTTP/1.1" 200 102780 "-" "curl/7.29.0" 4
5月 21 02:14:01 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:01,992+0000 [prometheus-stats-43-1] INFO  org.eclipse.jetty.server.RequestLog - 127.0.0.1 - - [21/May/2024:02:14:01 +0000] "GET /metrics/ HTTP/1.1" 200 102779 "-" "curl/7.29.0" 4
5月 21 02:14:03 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:03,444+0000 [pulsar-load-manager-1-1] INFO  org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl - Writing local data to metadata store because maximum change 25.37202090024948% exceeded threshold 10%; time since last report written is 5.0 seconds
5月 21 02:14:03 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:03,452+0000 [metadata-store-9-1] INFO  org.apache.pulsar.metadata.coordination.impl.ResourceLockImpl - Acquired resource lock on /loadbalance/brokers/sg-prod-server-pulsar-1.stos.org:8080
5月 21 02:14:08 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:08,496+0000 [pulsar-io-3-6] INFO  org.apache.pulsar.broker.service.ServerCnx - Closed connection from /172.17.32.135:55076
5月 21 02:14:08 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:08,511+0000 [BookKeeperClientWorker-OrderedExecutor-1-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_3062_3062, segmentEntryId: 32
5月 21 02:14:17 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:17,201+0000 [pulsar-io-3-1] INFO  org.apache.pulsar.broker.service.ServerCnx - [/172.17.32.75:57072] connected with clientVersion=Pulsar Go v0.12.1, clientProtocolVersion=18, proxyVersion=null
5月 21 02:14:23 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:23,873+0000 [prometheus-stats-43-2] INFO  org.eclipse.jetty.server.RequestLog - 172.17.225.89 - - [21/May/2024:02:14:23 +0000] "GET /metrics/ HTTP/1.1" 200 8165 "-" "Prometheus/2.41.0" 8
5月 21 02:14:37 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:37,064+0000 [pulsar-io-3-4] INFO  org.apache.pulsar.broker.service.ServerCnx - Closed connection from /172.17.32.112:54732
5月 21 02:14:45 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:45,345+0000 [BookKeeperClientWorker-OrderedExecutor-3-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_3065_3065, segmentEntryId: 34
5月 21 02:14:49 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:49,336+0000 [pulsar-io-3-4] INFO  org.apache.pulsar.broker.service.ServerCnx - [/172.17.32.122:60590] connected with clientVersion=Pulsar Go v0.12.1, clientProtocolVersion=18, proxyVersion=null
5月 21 02:14:53 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:53,871+0000 [prometheus-stats-43-1] INFO  org.eclipse.jetty.server.RequestLog - 172.17.225.89 - - [21/May/2024:02:14:53 +0000] "GET /metrics/ HTTP/1.1" 200 8160 "-" "Prometheus/2.41.0" 6
5月 21 02:14:54 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:54,044+0000 [BookKeeperClientWorker-OrderedExecutor-1-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_3078_3078, segmentEntryId: 45
5月 21 02:14:55 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:55,541+0000 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_3084_3084, segmentEntryId: 49
5月 21 02:14:58 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:58,444+0000 [pulsar-load-manager-1-1] INFO  org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl - Writing local data to metadata store because maximum change 11.665924417968807% exceeded threshold 10%; time since last report written is 55.0 seconds
5月 21 02:14:58 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:58,452+0000 [metadata-store-9-1] INFO  org.apache.pulsar.metadata.coordination.impl.ResourceLockImpl - Acquired resource lock on /loadbalance/brokers/sg-prod-server-pulsar-1.stos.org:8080
5月 21 02:14:59 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:59,049+0000 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_2855_2855, segmentEntryId: 137
5月 21 02:14:59 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:59,207+0000 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_2891_2891, segmentEntryId: 142
5月 21 02:14:59 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:59,222+0000 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_2853_2853, segmentEntryId: 128
5月 21 02:14:59 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:59,257+0000 [BookKeeperClientWorker-OrderedExecutor-1-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_2858_2858, segmentEntryId: 148
5月 21 02:14:59 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:59,273+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_2888_2888, segmentEntryId: 141
5月 21 02:15:00 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:00,680+0000 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_3165_3165, segmentEntryId: 10
5月 21 02:15:00 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:00,830+0000 [BookKeeperClientWorker-OrderedExecutor-3-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_3162_3162, segmentEntryId: 11
5月 21 02:15:01 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:01,047+0000 [pulsar-io-3-2] INFO  org.apache.pulsar.broker.service.ServerCnx - Closed connection from /172.17.32.62:35650
5月 21 02:15:01 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:01,052+0000 [prometheus-stats-43-2] INFO  org.eclipse.jetty.server.RequestLog - 127.0.0.1 - - [21/May/2024:02:15:01 +0000] "GET /metrics/ HTTP/1.1" 200 103142 "-" "curl/7.29.0" 11
5月 21 02:15:01 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:01,091+0000 [prometheus-stats-43-3] INFO  org.eclipse.jetty.server.RequestLog - 127.0.0.1 - - [21/May/2024:02:15:01 +0000] "GET /metrics/ HTTP/1.1" 200 102687 "-" "curl/7.29.0" 4
5月 21 02:15:06 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:06,134+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_2888_2888, segmentEntryId: 142
5月 21 02:15:06 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:06,283+0000 [BookKeeperClientWorker-OrderedExecutor-1-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_2895_2895, segmentEntryId: 136
5月 21 02:15:06 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:06,296+0000 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_2885_2885, segmentEntryId: 126
5月 21 02:15:06 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:06,312+0000 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_2853_2853, segmentEntryId: 129
5月 21 02:15:06 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:06,335+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_2860_2860, segmentEntryId: 149
5月 21 02:15:06 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:06,361+0000 [BookKeeperClientWorker-OrderedExecutor-3-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_2898_2898, segmentEntryId: 136
5月 21 02:15:06 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:06,384+0000 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_2855_2855, segmentEntryId: 138
5月 21 02:15:06 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:06,400+0000 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_2891_2891, segmentEntryId: 143
5月 21 02:15:06 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:06,415+0000 [BookKeeperClientWorker-OrderedExecutor-3-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_2862_2862, segmentEntryId: 150
5月 21 02:15:06 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:06,428+0000 [BookKeeperClientWorker-OrderedExecutor-1-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_2857_2857, segmentEntryId: 141
5月 21 02:15:06 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:06,440+0000 [BookKeeperClientWorker-OrderedExecutor-1-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_2858_2858, segmentEntryId: 149
5月 21 02:15:06 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:06,635+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_3081_3081, segmentEntryId: 57
5月 21 02:15:17 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:17,539+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_3183_3183, segmentEntryId: 2
5月 21 02:15:22 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:22,204+0000 [BookKeeperClientWorker-OrderedExecutor-1-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_2975_2975, segmentEntryId: 72
5月 21 02:15:23 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:23,873+0000 [prometheus-stats-43-4] INFO  org.eclipse.jetty.server.RequestLog - 172.17.225.89 - - [21/May/2024:02:15:23 +0000] "GET /metrics/ HTTP/1.1" 200 8144 "-" "Prometheus/2.41.0" 9
5月 21 02:15:23 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:23,977+0000 [BookKeeperClientWorker-OrderedExecutor-3-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_3186_3186, segmentEntryId: 2
5月 21 02:15:24 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:24,556+0000 [pulsar-io-3-2] INFO  org.apache.pulsar.broker.service.ServerCnx - [/172.17.32.50:56158] connected with clientVersion=Pulsar Go v0.12.1, clientProtocolVersion=18, proxyVersion=null

The metrics also changed (see the attached screenshots).

Sorry, I may not be able to provide a heap dump because of data-security concerns.

SennoYuki avatar May 21 '24 02:05 SennoYuki

https://github.com/apache/pulsar/issues/20638 https://github.com/apache/pulsar/issues/20099

Other people have encountered similar problems. After cursor recovery, it seems the cursor is simply reset to the last persisted acknowledgment (mark-delete) position, which causes a large number of delayed-queue messages to be consumed repeatedly.

SennoYuki avatar May 21 '24 03:05 SennoYuki

see also https://github.com/apache/pulsar/issues/20099#issuecomment-2123050102

dao-jun avatar May 21 '24 17:05 dao-jun

I think first of all we need to confirm whether the markDelete position / acknowledgment holes were successfully persisted.

In fact, Pulsar does not guarantee that already-acked messages will never be redelivered, unless you use transactions.
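For completeness, a rough sketch of acknowledging inside a transaction, assuming transactions are enabled on the broker (transactionCoordinatorEnabled=true) and on the client, with placeholder URLs/names:

    import java.util.concurrent.TimeUnit;
    import org.apache.pulsar.client.api.Consumer;
    import org.apache.pulsar.client.api.Message;
    import org.apache.pulsar.client.api.PulsarClient;
    import org.apache.pulsar.client.api.SubscriptionType;
    import org.apache.pulsar.client.api.transaction.Transaction;

    public class TransactionalAckExample {
        public static void main(String[] args) throws Exception {
            // NOTE: service URL, topic and subscription names are placeholders
            PulsarClient client = PulsarClient.builder()
                    .serviceUrl("pulsar://localhost:6650")
                    .enableTransaction(true) // requires transactionCoordinatorEnabled=true on the broker
                    .build();
            Consumer<byte[]> consumer = client.newConsumer()
                    .topic("persistent://public/default/push_al_queue")
                    .subscriptionName("kafka_to_bk")
                    .subscriptionType(SubscriptionType.Shared)
                    .subscribe();
            Message<byte[]> msg = consumer.receive();
            Transaction txn = client.newTransaction()
                    .withTransactionTimeout(1, TimeUnit.MINUTES)
                    .build()
                    .get();
            // Acknowledge inside the transaction, then commit
            consumer.acknowledgeAsync(msg.getMessageId(), txn).get();
            txn.commit().get();
            consumer.close();
            client.close();
        }
    }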

coderzc avatar Feb 26 '25 05:02 coderzc

    @FieldContext(
            category = CATEGORY_STORAGE_ML,
            doc = "Max number of `acknowledgment holes` that can be stored in MetadataStore.\n\n"
                    + "If number of unack message range is higher than this limit then broker will persist"
                    + " unacked ranges into bookkeeper to avoid additional data overhead into MetadataStore.")
    private int managedLedgerMaxUnackedRangesToPersistInMetadataStore = 1000;

@SennoYuki If acknowledgments are discontinuous, Pulsar records acknowledgment holes. However, by default only 1000 acknowledgment holes are persisted in the metadata store. Maybe you can try increasing the configuration above; see the example excerpt below.
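For example, a broker.conf change along these lines (the numbers are illustrative only; tune them for your backlog):

    # broker.conf (excerpt) -- example values, not a recommendation
    # how many acknowledgment holes may be stored directly in the metadata store (default 1000)
    managedLedgerMaxUnackedRangesToPersistInMetadataStore=10000
    # overall cap on persisted acknowledgment holes (default 10000); ranges beyond the
    # metadata-store limit are persisted to BookKeeper instead
    managedLedgerMaxUnackedRangesToPersist=50000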

But please note that when the broker process crashes, the markDelete position / acknowledgment holes may not have been persisted yet, so this is not a strong guarantee.

coderzc avatar Feb 26 '25 06:02 coderzc