Apache Ignite stops working after running for a week
I am using Apache Ignite 2.15.0 in a .NET 6 project to run an Ignite cache on a single partition node. It ran without any issues, but after 7 days of runtime it stopped on its own, and when I try to start the Ignite cache again I see the errors below in the log file.
The only workaround I have found is to delete the "Cache-Storage" folder (mentioned in the error details below) and start the Ignite cache again; after that it works fine. Any help in resolving this issue would be greatly appreciated. Thank you!
[10:20:11,026][INFO][main][CheckpointMarkersStorage] Read checkpoint status [startMarker=C:\Cache-Storage\node00-34f88250-59aa-4aee-9f9f-aaf8f11cc8aa\cp\1726231973291-a70928e6-62e4-4152-96e6-e03ceb7493d4-START.bin, endMarker=C:\Cache-Storage\node00-34f88250-59aa-4aee-9f9f-aaf8f11cc8aa\cp\1726231973291-a70928e6-62e4-4152-96e6-e03ceb7493d4-END.bin]
[10:20:11,026][INFO][main][PageMemoryImpl] Started page memory [memoryAllocated=100.0 MiB, pages=24804, tableSize=1.9 MiB, replacementSize=3.1 KiB, checkpointBuffer=100.0 MiB]
[10:20:11,026][INFO][main][GridCacheDatabaseSharedManager] Checking memory state [lastValidPos=WALPointer [idx=103, fileOff=46331083, len=50617], lastMarked=WALPointer [idx=103, fileOff=46331083, len=50617], lastCheckpointId=a70928e6-62e4-4152-96e6-e03ceb7493d4]
[10:20:11,026][INFO][main][FilePageStoreManager] Cleanup cache stores [total=0, left=0, cleanFiles=false]
[10:20:11,026][SEVERE][main][IgniteKernal] Exception during start processors, node will be stopped and close connections
class org.apache.ignite.IgniteCheckedException: WAL history is too short [descs=[FileDescriptor [file=C:\Cache-Storage\db\wal\archive\node00-34f88250-59aa-4aee-9f9f-aaf8f11cc8aa\0000000000000118.wal, idx=118], FileDescriptor [file=C:\Cache-Storage\db\wal\archive\node00-34f88250-59aa-4aee-9f9f-aaf8f11cc8aa\0000000000000119.wal, idx=119], FileDescriptor [file=C:\Cache-Storage\db\wal\archive\node00-34f88250-59aa-4aee-9f9f-aaf8f11cc8aa\0000000000000120.wal, idx=120], FileDescriptor [file=C:\Cache-Storage\db\wal\archive\node00-34f88250-59aa-4aee-9f9f-aaf8f11cc8aa\0000000000000121.wal, idx=121], FileDescriptor [file=C:\Cache-Storage\db\wal\archive\node00-34f88250-59aa-4aee-9f9f-aaf8f11cc8aa\0000000000000122.wal, idx=122], FileDescriptor [file=C:\Cache-Storage\db\wal\archive\node00-34f88250-59aa-4aee-9f9f-aaf8f11cc8aa\0000000000000123.wal, idx=123], FileDescriptor [file=C:\Cache-Storage\db\wal\archive\node00-34f88250-59aa-4aee-9f9f-aaf8f11cc8aa\0000000000000124.wal, idx=124], FileDescriptor [file=C:\Cache-Storage\db\wal\archive\node00-34f88250-59aa-4aee-9f9f-aaf8f11cc8aa\0000000000000125.wal, idx=125]], start=WALPointer [idx=103, fileOff=46331083, len=50617]]
at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.init(FileWriteAheadLogManager.java:3039)
at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.access$1000(FileWriteAheadLogManager.java:2911)
at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.replay(FileWriteAheadLogManager.java:1082)
at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.replay(FileWriteAheadLogManager.java:1049)
at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.read(FileWriteAheadLogManager.java:1037)
at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.performBinaryMemoryRestore(GridCacheDatabaseSharedManager.java:2119)
at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:865)
at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetaStorageSubscribersOnReadyForRead(GridCacheDatabaseSharedManager.java:3094)
at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1120)
at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1725)
at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1647)
at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1089)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:599)
at org.apache.ignite.internal.processors.platform.PlatformAbstractBootstrap.start(PlatformAbstractBootstrap.java:43)
at org.apache.ignite.internal.processors.platform.PlatformIgnition.start(PlatformIgnition.java:74)
[10:20:11,026][SEVERE][main][IgniteKernal] Got exception while starting (will rollback startup routine).
class org.apache.ignite.IgniteCheckedException: WAL history is too short [descs=[FileDescriptor [file=C:\Cache-Storage\db\wal\archive\node00-34f88250-59aa-4aee-9f9f-aaf8f11cc8aa\0000000000000118.wal, idx=118], FileDescriptor [file=C:\Cache-Storage\db\wal\archive\node00-34f88250-59aa-4aee-9f9f-aaf8f11cc8aa\0000000000000119.wal, idx=119], FileDescriptor [file=C:\Cache-Storage\db\wal\archive\node00-34f88250-59aa-4aee-9f9f-aaf8f11cc8aa\0000000000000120.wal, idx=120], FileDescriptor [file=C:\Cache-Storage\db\wal\archive\node00-34f88250-59aa-4aee-9f9f-aaf8f11cc8aa\0000000000000121.wal, idx=121], FileDescriptor [file=C:\Cache-Storage\db\wal\archive\node00-34f88250-59aa-4aee-9f9f-aaf8f11cc8aa\0000000000000122.wal, idx=122], FileDescriptor [file=C:\Cache-Storage\db\wal\archive\node00-34f88250-59aa-4aee-9f9f-aaf8f11cc8aa\0000000000000123.wal, idx=123], FileDescriptor [file=C:\Cache-Storage\db\wal\archive\node00-34f88250-59aa-4aee-9f9f-aaf8f11cc8aa\0000000000000124.wal, idx=124], FileDescriptor [file=C:\Cache-Storage\db\wal\archive\node00-34f88250-59aa-4aee-9f9f-aaf8f11cc8aa\0000000000000125.wal, idx=125]], start=WALPointer [idx=103, fileOff=46331083, len=50617]]
at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.init(FileWriteAheadLogManager.java:3039)
at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.access$1000(FileWriteAheadLogManager.java:2911)
at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.replay(FileWriteAheadLogManager.java:1082)
at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.replay(FileWriteAheadLogManager.java:1049)
at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.read(FileWriteAheadLogManager.java:1037)
at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.performBinaryMemoryRestore(GridCacheDatabaseSharedManager.java:2119)
at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:865)
at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetaStorageSubscribersOnReadyForRead(GridCacheDatabaseSharedManager.java:3094)
at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1120)
at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1725)
at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1647)
at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1089)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:599)
at org.apache.ignite.internal.processors.platform.PlatformAbstractBootstrap.start(PlatformAbstractBootstrap.java:43)
at org.apache.ignite.internal.processors.platform.PlatformIgnition.start(PlatformIgnition.java:74)
[10:20:11,041][WARNING][main][IgniteKernal] Attempt to stop starting grid. This operation cannot be guaranteed to be successful.
[10:20:11,045][INFO][main][GridTcpRestProtocol] Command protocol successfully stopped: TCP binary
[10:20:11,045][INFO][main][FilePageStoreManager] Cleanup cache stores [total=0, left=0, cleanFiles=false]
[10:20:11,059][INFO][main][IgniteKernal]
>>> +----------------------------------------------------------------------------------+
>>> Ignite ver. 2.15.0#20230425-sha1:f98f7f35de6dc76a9b69299154afaa2139a5ec6d stopped OK
>>> +----------------------------------------------------------------------------------+
>>> Grid uptime: 00:00:03.654
@vinaygangaraj, it seems that some files were deleted. Can you attach the log of the initial node failure?
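For reference, the gap can be read directly from the exception text: the checkpoint start pointer is at WAL segment idx=103, while the archive files listed in `descs` begin at idx=118. Below is a minimal Python sketch that extracts both from a "WAL history is too short" message and reports which segment indexes are not among the listed files. It assumes the message format shown in the log above; `wal_gap` and `SAMPLE` are hypothetical names, and `SAMPLE` is abbreviated to the first and last descriptors from that log.

```python
import re

# Abbreviated copy of the exception text from the log above
# (only the first and last FileDescriptor entries are kept).
SAMPLE = (
    r"WAL history is too short [descs=["
    r"FileDescriptor [file=C:\Cache-Storage\db\wal\archive\node00-34f88250-59aa-4aee-9f9f-aaf8f11cc8aa\0000000000000118.wal, idx=118], "
    r"FileDescriptor [file=C:\Cache-Storage\db\wal\archive\node00-34f88250-59aa-4aee-9f9f-aaf8f11cc8aa\0000000000000125.wal, idx=125]], "
    r"start=WALPointer [idx=103, fileOff=46331083, len=50617]]"
)


def wal_gap(message):
    """Return (start_idx, listed_idxs, missing_idxs) parsed from the message.

    missing_idxs holds the segment indexes from the start pointer up to,
    but not including, the first listed archive segment.
    """
    # Segment indexes of the archive files named in the message.
    listed = sorted(int(i) for i in re.findall(r"\.wal, idx=(\d+)\]", message))
    # Segment index the node wants to start replaying from.
    start_match = re.search(r"start=WALPointer \[idx=(\d+)", message)
    if start_match is None:
        raise ValueError("no start WALPointer found in message")
    start = int(start_match.group(1))
    if listed and listed[0] > start:
        missing = list(range(start, listed[0]))
    else:
        missing = []
    return start, listed, missing


if __name__ == "__main__":
    start, listed, missing = wal_gap(SAMPLE)
    print("start segment:", start)
    print("first listed archive segment:", listed[0] if listed else None)
    print("segments not listed:", missing)
```

This only interprets the message; it does not inspect the disk, so whether those segments were deleted, moved, or never archived still has to be confirmed from the storage folder and the earlier logs.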
@shishkovilja, I couldn't locate previous logs, but a similar issue occurred recently after some Windows updates. Here are the details:
The Ignite cache was functioning normally without any issues. However, following recent Windows updates on one of the machines, the Ignite cache was affected during the system restart. When the machine rebooted, the Ignite cache also shut down and encountered some errors in the process. I’ve saved the error details in the attached file ignite-4b92db2a.0-File1.log, which contains the errors that occurred during the Ignite cache shutdown.
After the machine rebooted, the Ignite cache attempted to start but failed to do so. The error details are in the attached file ignite-8c67398e.0-File2.log. The only solution I found was to delete the "Cache-Storage" folder (referenced in the error details) and restart the Ignite cache, after which it functioned correctly.
Windows updates:
https://support.microsoft.com/en-us/topic/august-13-2024-kb5041580-os-builds-19044-4780-and-19045-4780-2ef55b0d-bb01-41c8-8629-4146929792ad
https://support.microsoft.com/en-us/topic/august-13-2024-kb5042352-cumulative-update-for-net-framework-3-5-4-8-and-4-8-1-for-windows-10-version-22h2-4716cff5-6c43-4d1b-a4c3-4b517fa8898a