herddb icon indicating copy to clipboard operation
herddb copied to clipboard

Inefficient recovery of BRIN indexes

Open eolivelli opened this issue 6 years ago • 2 comments

Another flavour of #127

during recovery of a large table with BRIN indexes we can see BRIN index checkpoints, with an fsync (force0)

hdb-mucca-activator" #29 daemon prio=5 os_prio=0 cpu=731142,82ms elapsed=1803,92s tid=0x00007f5a5d51c800 nid=0x168ba runnable  [0x00007f59d58d4000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.FileDispatcherImpl.force0([email protected]/Native Method)
        at sun.nio.ch.FileDispatcherImpl.force([email protected]/FileDispatcherImpl.java:82)
        at sun.nio.ch.FileChannelImpl.force([email protected]/FileChannelImpl.java:461)
        at herddb.utils.ManagedFile.sync(ManagedFile.java:115)
        at herddb.file.FileDataStorageManager.writeIndexPage(FileDataStorageManager.java:910)
        at herddb.file.FileDataStorageManager.writeIndexPage(FileDataStorageManager.java:938)
        at herddb.index.brin.BRINIndexManager$IndexDataStorageImpl.createDataPage(BRINIndexManager.java:470)
        at herddb.index.brin.BlockRangeIndex$Block.checkpointNoLock(BlockRangeIndex.java:665)
        at herddb.index.brin.BlockRangeIndex$Block.checkpoint(BlockRangeIndex.java:679)
        at herddb.index.brin.BlockRangeIndex$Block.unloadNoLock(BlockRangeIndex.java:691)
        at herddb.index.brin.BlockRangeIndex$Block.unload(BlockRangeIndex.java:702)
        at herddb.index.brin.BlockRangeIndex$Block.unload(BlockRangeIndex.java:730)
        at herddb.index.brin.BlockRangeIndex$Block.ensureBlockLoaded(BlockRangeIndex.java:499)
        at herddb.index.brin.BlockRangeIndex$Block.delete(BlockRangeIndex.java:380)
        at herddb.index.brin.BlockRangeIndex.delete(BlockRangeIndex.java:1055)
        at herddb.index.brin.BRINIndexManager.recordDeleted(BRINIndexManager.java:421)
        at herddb.core.TableManager.applyDelete(TableManager.java:1677)
        at herddb.core.TableManager.onTransactionCommit(TableManager.java:1467)
        at herddb.core.TableSpaceManager.apply(TableSpaceManager.java:368)
        at herddb.core.TableSpaceManager$ApplyEntryOnRecovery.accept(TableSpaceManager.java:1858)
        at herddb.core.TableSpaceManager$ApplyEntryOnRecovery.accept(TableSpaceManager.java:1847)
        at herddb.cluster.BookkeeperCommitLog.recovery(BookkeeperCommitLog.java:477)
        at herddb.core.TableSpaceManager.recover(TableSpaceManager.java:299)

eolivelli avatar Jan 15 '20 12:01 eolivelli

In the log you can see pauses of about 21 seconds like:

20-01-15-13-27-09	herddb.cluster.BookkeeperCommitLog Jan 15, 2020 1:27:09 PM herddb.cluster.BookkeeperCommitLog recovery
INFO: q2 (36f8b443df7e459682d26c97246d47dd) From entry 72,077,207, to entry 72,087,207 (95.85 %) read time 21085 ms

20-01-15-13-27-09	herddb.cluster.BookkeeperCommitLog Jan 15, 2020 1:27:09 PM herddb.cluster.BookkeeperCommitLog recovery
INFO: q2 (36f8b443df7e459682d26c97246d47dd) From entry 72,087,208, to entry 72,097,208 (95.863 %) read time 93 ms

20-01-15-13-27-24	herddb.cluster.BookkeeperCommitLog Jan 15, 2020 1:27:24 PM herddb.cluster.BookkeeperCommitLog recovery
INFO: q2 (36f8b443df7e459682d26c97246d47dd) From entry 72,097,209, to entry 72,107,209 (95.877 %) read time 14745 ms

20-01-15-13-27-24	herddb.cluster.BookkeeperCommitLog Jan 15, 2020 1:27:24 PM herddb.cluster.BookkeeperCommitLog recovery
INFO: q2 (36f8b443df7e459682d26c97246d47dd) From entry 72,107,210, to entry 72,117,210 (95.89 %) read time 72 ms

20-01-15-13-27-24	herddb.cluster.BookkeeperCommitLog Jan 15, 2020 1:27:24 PM herddb.cluster.BookkeeperCommitLog recovery
INFO: q2 (36f8b443df7e459682d26c97246d47dd) From entry 72,117,211, to entry 72,127,211 (95.903 %) read time 90 ms



20-01-15-13-27-45	herddb.cluster.BookkeeperCommitLog Jan 15, 2020 1:27:45 PM herddb.cluster.BookkeeperCommitLog recovery
INFO: q2 (36f8b443df7e459682d26c97246d47dd) From entry 72,127,212, to entry 72,137,212 (95.917 %) read time 21274 ms

20-01-15-13-27-45	herddb.cluster.BookkeeperCommitLog Jan 15, 2020 1:27:45 PM herddb.cluster.BookkeeperCommitLog recovery
INFO: q2 (36f8b443df7e459682d26c97246d47dd) From entry 72,137,213, to entry 72,147,213 (95.93 %) read time 90 ms

20-01-15-13-27-45	herddb.cluster.BookkeeperCommitLog Jan 15, 2020 1:27:45 PM herddb.cluster.BookkeeperCommitLog recovery
INFO: q2 (36f8b443df7e459682d26c97246d47dd) From entry 72,147,214, to entry 72,157,214 (95.943 %) read time 86 ms

eolivelli avatar Jan 15 '20 12:01 eolivelli

In such case the brin index need to unload a page with dirty record so it has to flush changed date to disk. The same procedure is invoked on unload and on index checkpoint. Force has to be true to be sure to have really written data.

Practically speaking it wouldn't need a force if not during a real index checkpoint but once flushed with no force a page I don't know hot to ensure that has been really written during a later checkpoint

diegosalvi avatar Jan 15 '20 12:01 diegosalvi