[fix][Offload] fix indexEntries NullPointerException error
Motivation
Offloader exception occurs
21:54:32.360 [offloader-OrderedScheduler-14-0] ERROR org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - Unknown exception for ManagedLedgerException.
org.apache.bookkeeper.mledger.ManagedLedgerException$OffloadReadHandleClosedException: Offload read handle already closed
21:54:32.424 [offloader-OrderedScheduler-14-0] WARN org.apache.bookkeeper.common.util.SingleThreadSafeScheduledExecutorService - Unexpected throwable from task class org.apache.bookkeeper.mledger.offload.jcloud.impl.BlobStoreBackedReadHandleImpl$$Lambda$2013/0x00007fdd98b45d08: Cannot invoke "java.util.TreeMap.clear()" because "this.indexEntries" is null
java.lang.NullPointerException: Cannot invoke "java.util.TreeMap.clear()" because "this.indexEntries" is null
at org.apache.bookkeeper.mledger.offload.jcloud.impl.OffloadIndexBlockImpl.recycle(OffloadIndexBlockImpl.java:97) ~[?:?]
at org.apache.bookkeeper.mledger.offload.jcloud.impl.OffloadIndexBlockImpl.close(OffloadIndexBlockImpl.java:358) ~[?:?]
at org.apache.bookkeeper.mledger.offload.jcloud.impl.BlobStoreBackedReadHandleImpl.lambda$closeAsync$0(BlobStoreBackedReadHandleImpl.java:102) ~[?:?]
at org.apache.bookkeeper.common.util.SingleThreadSafeScheduledExecutorService$SafeRunnable.run(SingleThreadSafeScheduledExecutorService.java:46) ~[org.apache.bookkeeper-bookkeeper-common-4.16.3.jar:4.16.3]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.104.Final.jar:4.1.104.Final]
at java.lang.Thread.run(Thread.java:833) ~[?:?]
Modifications
check indexEntries is not null
Verifying this change
- [x] Make sure that the change passes the CI checks.
(Please pick either of the following options)
This change is a trivial rework / code cleanup without any test coverage.
(or)
This change is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
- Added integration tests for end-to-end deployment with large payloads (10MB)
- Extended integration test for recovery after broker failure
Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
- [ ] Dependencies (add or upgrade a dependency)
- [ ] The public API
- [ ] The schema
- [ ] The default values of configurations
- [ ] The threading model
- [ ] The binary protocol
- [ ] The REST endpoints
- [ ] The admin CLI options
- [ ] The metrics
- [ ] Anything that affects deployment
Documentation
- [ ]
doc - [ ]
doc-required - [x]
doc-not-needed - [ ]
doc-complete
Matching PR in forked repository
PR in forked repository:
The
closemethod maybe be called more than once, so cause the issue. I think we can just letindexEntries=null, no need toclearit
@Technoboy- thanks. done
The
closemethod maybe be called more than once, so cause the issue. I think we can just letindexEntries=null, no need toclearit
Wouldn't that be a problem if the object instance gets recycled multiple times?
The
closemethod maybe be called more than once, so cause the issue. I think we can just letindexEntries=null, no need toclearitWouldn't that be a problem if the object instance gets recycled multiple times?
maybe
The
closemethod maybe be called more than once, so cause the issue. I think we can just letindexEntries=null, no need toclearitWouldn't that be a problem if the object instance gets recycled multiple times?
maybe
There have been bugs in the past with recycled objects that are caused by releasing the object multiple times.
The
closemethod maybe be called more than once, so cause the issue. I think we can just letindexEntries=null, no need toclearitWouldn't that be a problem if the object instance gets recycled multiple times?
maybe
There have been bugs in the past with recycled objects that are caused by releasing the object multiple times.
yes, but for this patch, it's ok to fix it like this, right?
The
closemethod maybe be called more than once, so cause the issue. I think we can just letindexEntries=null, no need toclearitWouldn't that be a problem if the object instance gets recycled multiple times?
maybe
There have been bugs in the past with recycled objects that are caused by releasing the object multiple times.
yes, but for this patch, it's ok to fix it like this, right?
I doubt that it's correct. The problem will become worse if it is "fixed" like this. I think that it is necessary to address the root cause.
The
closemethod maybe be called more than once, so cause the issue. I think we can just letindexEntries=null, no need toclearitWouldn't that be a problem if the object instance gets recycled multiple times?
maybe
There have been bugs in the past with recycled objects that are caused by releasing the object multiple times.
yes, but for this patch, it's ok to fix it like this, right?
@Technoboy- @graysonzeng I have shared more context in #22110 about the "double release" bug pattern.
it looks like ML read entries from an already closed ledger, read entries from a closed ledger will lead to exception, and then, ML will try to close the ledger again.
I believe the key point is ML trying to read entries from a closed ledger.
@graysonzeng could please provide the steps that I can reproduce the issue?
@graysonzeng could please provide the steps that I can reproduce the issue?
This is an occasional error that occurs only once. I can't try to reproduce it @dao-jun
@graysonzeng what's your pulsar version? could you please provide more log?
Closing Handle multiple times does indeed occur frequently, because ledger#closeAsync is called by async thread when read entries failed, if there are some ongoing read operations, they will also try to read the ledger which is already closed.
and LedgerHandle has also taken measures to address it(https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java#L537).
But offloader didn't, I'll create another PR to fix it.
related PR: https://github.com/apache/pulsar/pull/22162
Thanks for the fix, I'll close it once the related PR is merged @dao-jun
the pr closed automatically since https://github.com/apache/pulsar/pull/22162 merged.