DNS stops responding
What is Happening
DNS stops responding at random times; it simply freezes. It can work for quite a long time before freezing.
Specs
- Docker version 26.1.3, build b72abbb
- DPS Version: defreitas/dns-proxy-server:3.24.0-snapshot
- OS: [e.g. Ubuntu 24.04]
Today it froze again
Hey @rayout, I will need the full log to debug what is causing this behavior. Can you share it? Run docker logs ${CONTAINER_ID} &> logs.log, and please enable the TRACE log level so the log has more detail.
Another question: From which version did you notice this behavior?
I will keep using DPS to see if I also get the issue
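For anyone collecting the logs requested above, here is a minimal sketch. `&>` is a bash shortcut; the `> file 2>&1` form used below is the portable equivalent and writes both stdout and stderr to the file. The function name and the default file name are placeholders for illustration, not part of DPS or Docker.

```shell
# Save everything a container has logged (stdout and stderr) to a file.
# save_container_logs is a hypothetical helper name, not a DPS or Docker command.
save_container_logs() {
  container_id=$1
  out_file=${2:-logs.log}
  # '> file 2>&1' is the portable spelling of bash's '&> file'.
  docker logs "$container_id" > "$out_file" 2>&1
}

# Usage (the container ID is a placeholder):
#   save_container_logs "$CONTAINER_ID" logs.log
```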
Sometimes (rarely) I also face the same, or maybe a different, issue: the DNS service stops answering.
I have that in my logs (at 3.24.0 version):
Exception in thread "dnsjava NIO selector" java.lang.OutOfMemoryError: Garbage-collected heap size exceeded. Consider increasing the maximum Java heap size, for example with '-Xmx'.
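To check whether a saved log contains this error, something like the sketch below may help. It assumes the log was already written to a file; the helper name is hypothetical, and the search string is taken from the log line above.

```shell
# Count heap-exhaustion errors in a saved log file.
# count_heap_errors is a hypothetical helper, not part of DPS.
count_heap_errors() {
  log_file=$1
  # grep -c prints the number of matching lines but exits non-zero when
  # there are none, so '|| true' keeps the function usable under 'set -e'.
  grep -cF 'java.lang.OutOfMemoryError' "$log_file" || true
}

# Usage:
#   count_heap_errors logs.log
```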
@dmekhov @rayout it could be related: the DPS default heap size is set to 10m. You can test whether increasing the value fixes the issue for you by running:
$ docker run defreitas/dns-proxy-server:3.24.0-snapshot -XX:MaxHeapSize=50m -XX:MaxNewSize=10m
Can I use env variables to configure it (JAVA_OPTS, JVM_OPTS, etc.)?
I'm using docker compose setup.
(for now I set it via the command property and will check if it helps)
I'm afraid you can't use the JVM env variables to configure native image binaries, but you can use the command option in the docker-compose file:
services:
  dps:
    image: defreitas/dns-proxy-server:3.24.0-snapshot
    command: -XX:MaxHeapSize=50m -XX:MaxNewSize=10m
Yes, thanks, I use it now.
Ok, I'll be watching the result (but this error didn't happen often for me, so it can take a while)
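Since the freeze can take days to show up, a small probe can record when the server stops answering. The sketch below assumes `dig` is installed and that DPS is reachable at the address you pass in; `probe_dns`, the address and the hostname are placeholders, not anything DPS provides.

```shell
# Probe a DNS server once and report whether it answered in time.
# probe_dns is a hypothetical helper name used only for illustration.
probe_dns() {
  server=$1
  name=$2
  # +time=2 +tries=1: wait at most two seconds and do not retry, so a
  # frozen server shows up as a quick failure instead of a long hang.
  if dig +time=2 +tries=1 +short "@$server" "$name" > /dev/null 2>&1; then
    echo "ok"
  else
    echo "no-answer"
  fi
}

# Usage: log one timestamped line per minute until interrupted.
#   while true; do
#     echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) $(probe_dns 127.0.0.1 example.com)"
#     sleep 60
#   done
```

Comparing the first "no-answer" timestamp with the container log makes it easier to find the lines written just before the freeze.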
FYI: I just released DPS 3.25.0, which optimizes resource utilization. It may fix the issue without the need to increase the heap size.
I reproduced the freezing scenario reported by @rayout. It is different from the one reported by @dmekhov: two different root causes produce the same behavior:
OutOfMemoryError
Scenario
When DPS receives a high number of requests relative to the configured memory limits, the heap can exceed its maximum size, causing DPS to freeze.
Solution
Optimizations were made in #436, version 3.25.1
Increase Heap Size
services:
  dps:
    image: defreitas/dns-proxy-server:3.24.0-snapshot
    command: -XX:MaxHeapSize=50m -XX:MaxNewSize=10m
docker run defreitas/dns-proxy-server:3.24.0-snapshot -XX:MaxHeapSize=50m -XX:MaxNewSize=10m
Random Freezing due to deadlock
When receiving a high number of concurrent requests, the DPS cache can deadlock, eventually blocking all its threads and freezing DPS.
Solution
A fix was made in #522, version 3.25.2
A could-have optimization will also be made in #524
I cannot check the new version. I get this error:
dns-1 | Exception in thread "main" java.lang.IllegalStateException: SSMSA
dns-1 | at com.github.benmanes.caffeine.cache.LocalCacheFactory.newFactory(LocalCacheFactory.java:114)
dns-1 | at [email protected]/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1708)
dns-1 | at com.github.benmanes.caffeine.cache.LocalCacheFactory.loadFactory(LocalCacheFactory.java:97)
dns-1 | at com.github.benmanes.caffeine.cache.LocalCacheFactory.newBoundedLocalCache(LocalCacheFactory.java:46)
dns-1 | at com.github.benmanes.caffeine.cache.BoundedLocalCache$BoundedLocalManualCache.<init>(BoundedLocalCache.java:3953)
dns-1 | at com.github.benmanes.caffeine.cache.BoundedLocalCache$BoundedLocalManualCache.<init>(BoundedLocalCache.java:3949)
dns-1 | at com.github.benmanes.caffeine.cache.Caffeine.build(Caffeine.java:1048)
dns-1 | at com.mageddo.dnsproxyserver.solver.SolverCache.<init>(SolverCache.java:36)
dns-1 | at com.mageddo.dnsproxyserver.di.module.ModuleSolver.remoteCache(ModuleSolver.java:41)
dns-1 | at com.mageddo.dnsproxyserver.di.module.ModuleSolver_RemoteCacheFactory.remoteCache(ModuleSolver_RemoteCacheFactory.java:35)
dns-1 | at com.mageddo.dnsproxyserver.di.module.ModuleSolver_RemoteCacheFactory.get(ModuleSolver_RemoteCacheFactory.java:27)
dns-1 | at com.mageddo.dnsproxyserver.di.module.ModuleSolver_RemoteCacheFactory.get(ModuleSolver_RemoteCacheFactory.java:11)
dns-1 | at dagger.internal.DoubleCheck.get(DoubleCheck.java:47)
dns-1 | at com.mageddo.dnsproxyserver.solver.SolverCacheFactory_Factory.get(SolverCacheFactory_Factory.java:36)
dns-1 | at com.mageddo.dnsproxyserver.solver.SolverCacheFactory_Factory.get(SolverCacheFactory_Factory.java:10)
dns-1 | at dagger.internal.DoubleCheck.get(DoubleCheck.java:47)
dns-1 | at com.mageddo.dnsproxyserver.solver.remote.dataprovider.SolverConsistencyGuaranteeDAOImpl_Factory.get(SolverConsistencyGuaranteeDAOImpl_Factory.java:34)
dns-1 | at com.mageddo.dnsproxyserver.solver.remote.dataprovider.SolverConsistencyGuaranteeDAOImpl_Factory.get(SolverConsistencyGuaranteeDAOImpl_Factory.java:11)
dns-1 | at dagger.internal.DoubleCheck.get(DoubleCheck.java:47)
dns-1 | at com.mageddo.dnsproxyserver.solver.remote.application.CircuitBreakerFactory_Factory.get(CircuitBreakerFactory_Factory.java:42)
dns-1 | at com.mageddo.dnsproxyserver.solver.remote.application.CircuitBreakerFactory_Factory.get(CircuitBreakerFactory_Factory.java:12)
dns-1 | at dagger.internal.DoubleCheck.get(DoubleCheck.java:47)
dns-1 | at com.mageddo.dnsproxyserver.solver.remote.application.CircuitBreakerFailSafeService_Factory.get(CircuitBreakerFailSafeService_Factory.java:33)
dns-1 | at com.mageddo.dnsproxyserver.solver.remote.application.CircuitBreakerFailSafeService_Factory.get(CircuitBreakerFailSafeService_Factory.java:10)
dns-1 | at dagger.internal.DoubleCheck.get(DoubleCheck.java:47)
dns-1 | at com.mageddo.dnsproxyserver.solver.SolverRemote_Factory.get(SolverRemote_Factory.java:37)
dns-1 | at com.mageddo.dnsproxyserver.solver.SolverRemote_Factory.get(SolverRemote_Factory.java:11)
dns-1 | at dagger.internal.DoubleCheck.get(DoubleCheck.java:47)
dns-1 | at com.mageddo.dnsproxyserver.solver.SolverCachedRemote_Factory.get(SolverCachedRemote_Factory.java:36)
dns-1 | at com.mageddo.dnsproxyserver.solver.SolverCachedRemote_Factory.get(SolverCachedRemote_Factory.java:10)
dns-1 | at dagger.internal.DoubleCheck.get(DoubleCheck.java:47)
dns-1 | at com.mageddo.dnsproxyserver.di.module.ModuleSolver_SolversFactory.get(ModuleSolver_SolversFactory.java:50)
dns-1 | at com.mageddo.dnsproxyserver.di.module.ModuleSolver_SolversFactory.get(ModuleSolver_SolversFactory.java:17)
dns-1 | at dagger.internal.DoubleCheck.get(DoubleCheck.java:47)
dns-1 | at dagger.internal.SetFactory.get(SetFactory.java:119)
dns-1 | at dagger.internal.SetFactory.get(SetFactory.java:37)
dns-1 | at com.mageddo.dnsproxyserver.di.module.ModuleSolver_SolversInstanceFactory.get(ModuleSolver_SolversInstanceFactory.java:36)
dns-1 | at com.mageddo.dnsproxyserver.di.module.ModuleSolver_SolversInstanceFactory.get(ModuleSolver_SolversInstanceFactory.java:14)
dns-1 | at com.mageddo.dnsproxyserver.solver.SolverProvider_Factory.get(SolverProvider_Factory.java:33)
dns-1 | at com.mageddo.dnsproxyserver.solver.SolverProvider_Factory.get(SolverProvider_Factory.java:11)
dns-1 | at dagger.internal.DoubleCheck.get(DoubleCheck.java:47)
dns-1 | at dagger.internal.DelegateFactory.get(DelegateFactory.java:36)
dns-1 | at com.mageddo.dnsproxyserver.server.dns.RequestHandlerDefault_Factory.get(RequestHandlerDefault_Factory.java:38)
dns-1 | at com.mageddo.dnsproxyserver.server.dns.RequestHandlerDefault_Factory.get(RequestHandlerDefault_Factory.java:12)
dns-1 | at dagger.internal.DoubleCheck.get(DoubleCheck.java:47)
dns-1 | at com.mageddo.dnsserver.UDPServerPool_Factory.get(UDPServerPool_Factory.java:32)
dns-1 | at com.mageddo.dnsserver.UDPServerPool_Factory.get(UDPServerPool_Factory.java:10)
dns-1 | at dagger.internal.DoubleCheck.get(DoubleCheck.java:47)
dns-1 | at com.mageddo.dnsserver.SimpleServer_Factory.get(SimpleServer_Factory.java:41)
dns-1 | at com.mageddo.dnsserver.SimpleServer_Factory.get(SimpleServer_Factory.java:11)
dns-1 | at dagger.internal.DoubleCheck.get(DoubleCheck.java:47)
dns-1 | at com.mageddo.dnsproxyserver.server.dns.ServerStarter_Factory.get(ServerStarter_Factory.java:33)
dns-1 | at com.mageddo.dnsproxyserver.server.dns.ServerStarter_Factory.get(ServerStarter_Factory.java:11)
dns-1 | at dagger.internal.DoubleCheck.get(DoubleCheck.java:47)
dns-1 | at com.mageddo.dnsproxyserver.server.Starter_Factory.get(Starter_Factory.java:43)
dns-1 | at com.mageddo.dnsproxyserver.server.Starter_Factory.get(Starter_Factory.java:14)
dns-1 | at dagger.internal.DoubleCheck.get(DoubleCheck.java:47)
dns-1 | at com.mageddo.dnsproxyserver.di.DaggerContext$ContextImpl.starter(DaggerContext.java:384)
dns-1 | at com.mageddo.dnsproxyserver.di.Context.start(Context.java:56)
dns-1 | at com.mageddo.dnsproxyserver.App.startContext(App.java:65)
dns-1 | at com.mageddo.dnsproxyserver.App.start(App.java:40)
dns-1 | at com.mageddo.dnsproxyserver.App.main(App.java:25)
dns-1 | at [email protected]/java.lang.invoke.LambdaForm$DMH/sa346b79c.invokeStaticInit(LambdaForm$DMH)
dns-1 | Caused by: java.lang.ClassNotFoundException: com.github.benmanes.caffeine.cache.SSMSA
dns-1 | at org.graalvm.nativeimage.builder/com.oracle.svm.core.hub.ClassForNameSupport.forName(ClassForNameSupport.java:122)
dns-1 | at org.graalvm.nativeimage.builder/com.oracle.svm.core.hub.ClassForNameSupport.forName(ClassForNameSupport.java:86)
dns-1 | at [email protected]/java.lang.Class.forName(DynamicHub.java:1356)
dns-1 | at [email protected]/java.lang.Class.forName(DynamicHub.java:1345)
dns-1 | at [email protected]/java.lang.invoke.MethodHandles$Lookup.findClass(MethodHandles.java:2869)
dns-1 | at com.github.benmanes.caffeine.cache.LocalCacheFactory.newFactory(LocalCacheFactory.java:104)
dns-1 | ... 62 more
Sorry about that. It is fixed in 3.25.10, can you check it, @rayout?
Thank you! I tested it for 2 weeks. Everything works great. After 14 days, it froze with the error: "Garbage-collected heap size exceeded. Consider increasing the maximum Java heap size." I am using version 3.25.10-snapshot. The startup settings are: "command: -XX:MaxHeapSize=50m -XX:MaxNewSize=10m."
Thanks for your feedback, it seems the freezing scenario is fixed then.
As for the heap size, please keep calibrating to find an optimal setting. I may consider changing the default value in the future.
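When trying several heap sizes, a small helper can keep the two flags consistent. This is only a sketch: `dps_heap_args` is a hypothetical name, the sizes in the usage line are illustrative rather than recommended values, and the only flags it emits are the two already quoted in this thread.

```shell
# Build the heap flags used in this thread from sizes given in megabytes.
# dps_heap_args is a hypothetical helper, not part of DPS.
dps_heap_args() {
  max_heap_mb=$1
  max_new_mb=$2
  # The young generation cannot usefully be larger than the whole heap.
  if [ "$max_new_mb" -gt "$max_heap_mb" ]; then
    echo "MaxNewSize must not exceed MaxHeapSize" >&2
    return 1
  fi
  echo "-XX:MaxHeapSize=${max_heap_mb}m -XX:MaxNewSize=${max_new_mb}m"
}

# Usage (sizes are examples only; the image tag is the one reported above):
#   docker run defreitas/dns-proxy-server:3.25.10-snapshot $(dps_heap_args 100 20)
```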
I think we can close the task. Thank you for your help!