dns-proxy-server

DNS stops responding

Open • rayout opened this issue 1 year ago • 14 comments

What is Happening

DNS stops responding at random times; it simply freezes. It can run for quite a long time before it freezes.

Specs

  • Docker version 26.1.3, build b72abbb
  • DPS Version:
    • defreitas/dns-proxy-server:3.24.0-snapshot
  • OS: [e.g. Ubuntu 24.04]

[Screenshot: image_2024-07-12_11-47-15]

[Screenshot]

rayout, Jul 12 '24

[Screenshot]

Today it froze again

rayout, Jul 15 '24

Hey @rayout, I will need the full log to debug what is causing this behavior. Can you share it? Please enable the TRACE log level to get more details, then capture the log with:

$ docker logs ${CONTAINER_ID} &> logs.log

Another question: From which version did you notice this behavior?

mageddo, Jul 15 '24

I will keep using DPS to see whether I also hit the issue.

mageddo, Jul 22 '24

Sometimes (rarely) I also face the same (or maybe a different) issue: the DNS service stops answering.

I see this in my logs (version 3.24.0):

Exception in thread "dnsjava NIO selector" java.lang.OutOfMemoryError: Garbage-collected heap size exceeded. Consider increasing the maximum Java heap size, for example with '-Xmx'.

dmekhov, Jul 30 '24

@dmekhov @rayout it may be related: the DPS default heap size is set to 10m. You can test whether increasing the value fixes the issue for you by running:

$ docker run defreitas/dns-proxy-server:3.24.0-snapshot -XX:MaxHeapSize=50m -XX:MaxNewSize=10m

mageddo, Jul 30 '24

> You can test if increase the value fixes the issue for you by running
>
> $ docker run defreitas/dns-proxy-server:3.24.0-snapshot -XX:MaxHeapSize=50m -XX:MaxNewSize=10m

Can I use environment variables to configure it (JAVA_OPTS, JVM_OPTS, etc.)?

I'm using a Docker Compose setup.

(For now, I set it via the command property and will check whether it helps.)

dmekhov, Jul 30 '24

I'm afraid you can't use JVM environment variables to configure native image binaries, but you can use the command option in the docker-compose file:

services:
  dps:
    image: defreitas/dns-proxy-server:3.24.0-snapshot
    command: -XX:MaxHeapSize=50m -XX:MaxNewSize=10m

mageddo, Jul 30 '24

> but you can use the command option at the docker-compose file

Yes, thanks, I'm using it now.

OK, I'll watch the results (this error didn't happen often for me, so it may take a while).

dmekhov, Jul 30 '24

DPS got stuck for me today too; I also got a java.lang.OutOfMemoryError in the logs.

[Attachment: logs.txt]

mageddo, Jul 30 '24

FYI: I just released DPS 3.25.0, which improves resource utilization; it may fix the issue without the need to increase the heap size.

mageddo, Jul 30 '24

I reproduced the freezing scenario reported by @rayout. It is different from the one reported by @dmekhov: two different root causes produce the same behavior.

OutOfMemoryError

Scenario

When DPS receives a high number of requests relative to the configured memory limits, the heap sometimes exceeds its maximum size, causing DPS to freeze.
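A general way to keep heap usage bounded during a burst of requests is to cap how many requests are processed at once and answer the excess immediately, instead of letting pending work pile up in memory. The Java sketch below illustrates that load-shedding idea with java.util.concurrent.Semaphore. It is a hypothetical example, not the optimization made in #436: the InFlightLimiter class, its methods, and the fallback answer are invented for illustration.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical sketch (not DPS code): caps concurrent in-flight requests so a
// traffic burst cannot allocate an unbounded amount of per-request state.
final class InFlightLimiter {
  private final Semaphore permits;

  InFlightLimiter(int maxInFlight) {
    this.permits = new Semaphore(maxInFlight);
  }

  /**
   * Runs the handler if a permit is free; otherwise returns the fallback
   * immediately (e.g. an error answer) instead of queueing the request.
   */
  <T> T handle(Supplier<T> handler, Supplier<T> fallback) {
    if (!permits.tryAcquire()) {
      return fallback.get(); // shed load rather than grow the heap
    }
    try {
      return handler.get();
    } finally {
      permits.release(); // always return the permit, even if the handler throws
    }
  }

  int availablePermits() {
    return permits.availablePermits();
  }

  public static void main(String[] args) {
    InFlightLimiter limiter = new InFlightLimiter(1);
    // While one request holds the only permit, a nested second request is shed.
    String result = limiter.handle(
        () -> "outer:" + limiter.handle(() -> "answered", () -> "shed"),
        () -> "shed");
    System.out.println(result); // prints "outer:shed"
  }
}
```

tryAcquire() is used instead of acquire() so that a saturated server answers quickly rather than parking more callers; the permit count would need the same kind of calibration against the heap size as the -XX:MaxHeapSize setting discussed in this thread.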

Solution

Optimizations were made in #436, version 3.25.1.

Increase Heap Size

Using docker-compose:

services:
  dps:
    image: defreitas/dns-proxy-server:3.24.0-snapshot
    command: -XX:MaxHeapSize=50m -XX:MaxNewSize=10m

Or using docker run:

$ docker run defreitas/dns-proxy-server:3.24.0-snapshot -XX:MaxHeapSize=50m -XX:MaxNewSize=10m

Random Freezing Due to Deadlock

When DPS receives a high number of concurrent requests, its cache can deadlock, eventually locking all of its threads and freezing DPS.
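The thread does not describe the exact cause fixed in #522. As a generic illustration, a cache can freeze every thread when a lock is held while a slow or re-entrant lookup runs; the JDK documentation for ConcurrentHashMap.computeIfAbsent, for instance, warns that other threads' updates may block while the mapping function is in progress. The hypothetical Java sketch below (NonBlockingCache is an invented name, not DPS code) avoids holding any lock during resolution: it publishes a CompletableFuture per key with putIfAbsent, only the first caller resolves the name, and concurrent callers wait on that future with a timeout instead of blocking indefinitely.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

// Hypothetical sketch, not DPS code: a cache that never holds a lock while
// resolving a value, so one slow resolution cannot block every thread.
final class NonBlockingCache<K, V> {
  private final ConcurrentMap<K, CompletableFuture<V>> entries = new ConcurrentHashMap<>();
  private final Function<K, V> resolver;
  private final long timeoutMillis;

  NonBlockingCache(Function<K, V> resolver, long timeoutMillis) {
    this.resolver = resolver;
    this.timeoutMillis = timeoutMillis;
  }

  V get(K key) throws InterruptedException, ExecutionException, TimeoutException {
    CompletableFuture<V> fresh = new CompletableFuture<>();
    CompletableFuture<V> existing = entries.putIfAbsent(key, fresh);
    if (existing != null) {
      // Another caller is (or was) resolving this key: wait with a bound
      // instead of blocking forever.
      return existing.get(timeoutMillis, TimeUnit.MILLISECONDS);
    }
    try {
      // Resolution runs outside any map lock (putIfAbsent has already returned).
      fresh.complete(resolver.apply(key));
    } catch (RuntimeException e) {
      entries.remove(key, fresh); // do not cache failures
      fresh.completeExceptionally(e);
    }
    return fresh.get(timeoutMillis, TimeUnit.MILLISECONDS);
  }

  public static void main(String[] args) throws Exception {
    int[] calls = {0};
    NonBlockingCache<String, String> cache =
        new NonBlockingCache<>(name -> { calls[0]++; return "10.0.0.1"; }, 1000);
    // The second lookup is served from the cached future; the resolver runs once.
    System.out.println(cache.get("app.docker") + " " + cache.get("app.docker") + " calls=" + calls[0]);
    // prints "10.0.0.1 10.0.0.1 calls=1"
  }
}
```

The 1000 ms timeout is an arbitrary value for the example; a DNS server would more likely pick something close to its upstream query timeout, so a stuck lookup surfaces as a failed answer rather than a frozen thread pool.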

Solution

Fixes were made in #522, version 3.25.2.

mageddo, Jul 31 '24

A could-have optimization will also be made in #524.

mageddo, Jul 31 '24

I can't check the new version; I get this error:

dns-1  | Exception in thread "main" java.lang.IllegalStateException: SSMSA
dns-1  |        at com.github.benmanes.caffeine.cache.LocalCacheFactory.newFactory(LocalCacheFactory.java:114)
dns-1  |        at java.base@…/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1708)
dns-1  |        at com.github.benmanes.caffeine.cache.LocalCacheFactory.loadFactory(LocalCacheFactory.java:97)
dns-1  |        at com.github.benmanes.caffeine.cache.LocalCacheFactory.newBoundedLocalCache(LocalCacheFactory.java:46)
dns-1  |        at com.github.benmanes.caffeine.cache.BoundedLocalCache$BoundedLocalManualCache.<init>(BoundedLocalCache.java:3953)
dns-1  |        at com.github.benmanes.caffeine.cache.BoundedLocalCache$BoundedLocalManualCache.<init>(BoundedLocalCache.java:3949)
dns-1  |        at com.github.benmanes.caffeine.cache.Caffeine.build(Caffeine.java:1048)
dns-1  |        at com.mageddo.dnsproxyserver.solver.SolverCache.<init>(SolverCache.java:36)
dns-1  |        at com.mageddo.dnsproxyserver.di.module.ModuleSolver.remoteCache(ModuleSolver.java:41)
dns-1  |        at com.mageddo.dnsproxyserver.di.module.ModuleSolver_RemoteCacheFactory.remoteCache(ModuleSolver_RemoteCacheFactory.java:35)
dns-1  |        at com.mageddo.dnsproxyserver.di.module.ModuleSolver_RemoteCacheFactory.get(ModuleSolver_RemoteCacheFactory.java:27)
dns-1  |        at com.mageddo.dnsproxyserver.di.module.ModuleSolver_RemoteCacheFactory.get(ModuleSolver_RemoteCacheFactory.java:11)
dns-1  |        at dagger.internal.DoubleCheck.get(DoubleCheck.java:47)
dns-1  |        at com.mageddo.dnsproxyserver.solver.SolverCacheFactory_Factory.get(SolverCacheFactory_Factory.java:36)
dns-1  |        at com.mageddo.dnsproxyserver.solver.SolverCacheFactory_Factory.get(SolverCacheFactory_Factory.java:10)
dns-1  |        at dagger.internal.DoubleCheck.get(DoubleCheck.java:47)
dns-1  |        at com.mageddo.dnsproxyserver.solver.remote.dataprovider.SolverConsistencyGuaranteeDAOImpl_Factory.get(SolverConsistencyGuaranteeDAOImpl_Factory.java:34)
dns-1  |        at com.mageddo.dnsproxyserver.solver.remote.dataprovider.SolverConsistencyGuaranteeDAOImpl_Factory.get(SolverConsistencyGuaranteeDAOImpl_Factory.java:11)
dns-1  |        at dagger.internal.DoubleCheck.get(DoubleCheck.java:47)
dns-1  |        at com.mageddo.dnsproxyserver.solver.remote.application.CircuitBreakerFactory_Factory.get(CircuitBreakerFactory_Factory.java:42)
dns-1  |        at com.mageddo.dnsproxyserver.solver.remote.application.CircuitBreakerFactory_Factory.get(CircuitBreakerFactory_Factory.java:12)
dns-1  |        at dagger.internal.DoubleCheck.get(DoubleCheck.java:47)
dns-1  |        at com.mageddo.dnsproxyserver.solver.remote.application.CircuitBreakerFailSafeService_Factory.get(CircuitBreakerFailSafeService_Factory.java:33)
dns-1  |        at com.mageddo.dnsproxyserver.solver.remote.application.CircuitBreakerFailSafeService_Factory.get(CircuitBreakerFailSafeService_Factory.java:10)
dns-1  |        at dagger.internal.DoubleCheck.get(DoubleCheck.java:47)
dns-1  |        at com.mageddo.dnsproxyserver.solver.SolverRemote_Factory.get(SolverRemote_Factory.java:37)
dns-1  |        at com.mageddo.dnsproxyserver.solver.SolverRemote_Factory.get(SolverRemote_Factory.java:11)
dns-1  |        at dagger.internal.DoubleCheck.get(DoubleCheck.java:47)
dns-1  |        at com.mageddo.dnsproxyserver.solver.SolverCachedRemote_Factory.get(SolverCachedRemote_Factory.java:36)
dns-1  |        at com.mageddo.dnsproxyserver.solver.SolverCachedRemote_Factory.get(SolverCachedRemote_Factory.java:10)
dns-1  |        at dagger.internal.DoubleCheck.get(DoubleCheck.java:47)
dns-1  |        at com.mageddo.dnsproxyserver.di.module.ModuleSolver_SolversFactory.get(ModuleSolver_SolversFactory.java:50)
dns-1  |        at com.mageddo.dnsproxyserver.di.module.ModuleSolver_SolversFactory.get(ModuleSolver_SolversFactory.java:17)
dns-1  |        at dagger.internal.DoubleCheck.get(DoubleCheck.java:47)
dns-1  |        at dagger.internal.SetFactory.get(SetFactory.java:119)
dns-1  |        at dagger.internal.SetFactory.get(SetFactory.java:37)
dns-1  |        at com.mageddo.dnsproxyserver.di.module.ModuleSolver_SolversInstanceFactory.get(ModuleSolver_SolversInstanceFactory.java:36)
dns-1  |        at com.mageddo.dnsproxyserver.di.module.ModuleSolver_SolversInstanceFactory.get(ModuleSolver_SolversInstanceFactory.java:14)
dns-1  |        at com.mageddo.dnsproxyserver.solver.SolverProvider_Factory.get(SolverProvider_Factory.java:33)
dns-1  |        at com.mageddo.dnsproxyserver.solver.SolverProvider_Factory.get(SolverProvider_Factory.java:11)
dns-1  |        at dagger.internal.DoubleCheck.get(DoubleCheck.java:47)
dns-1  |        at dagger.internal.DelegateFactory.get(DelegateFactory.java:36)
dns-1  |        at com.mageddo.dnsproxyserver.server.dns.RequestHandlerDefault_Factory.get(RequestHandlerDefault_Factory.java:38)
dns-1  |        at com.mageddo.dnsproxyserver.server.dns.RequestHandlerDefault_Factory.get(RequestHandlerDefault_Factory.java:12)
dns-1  |        at dagger.internal.DoubleCheck.get(DoubleCheck.java:47)
dns-1  |        at com.mageddo.dnsserver.UDPServerPool_Factory.get(UDPServerPool_Factory.java:32)
dns-1  |        at com.mageddo.dnsserver.UDPServerPool_Factory.get(UDPServerPool_Factory.java:10)
dns-1  |        at dagger.internal.DoubleCheck.get(DoubleCheck.java:47)
dns-1  |        at com.mageddo.dnsserver.SimpleServer_Factory.get(SimpleServer_Factory.java:41)
dns-1  |        at com.mageddo.dnsserver.SimpleServer_Factory.get(SimpleServer_Factory.java:11)
dns-1  |        at dagger.internal.DoubleCheck.get(DoubleCheck.java:47)
dns-1  |        at com.mageddo.dnsproxyserver.server.dns.ServerStarter_Factory.get(ServerStarter_Factory.java:33)
dns-1  |        at com.mageddo.dnsproxyserver.server.dns.ServerStarter_Factory.get(ServerStarter_Factory.java:11)
dns-1  |        at dagger.internal.DoubleCheck.get(DoubleCheck.java:47)
dns-1  |        at com.mageddo.dnsproxyserver.server.Starter_Factory.get(Starter_Factory.java:43)
dns-1  |        at com.mageddo.dnsproxyserver.server.Starter_Factory.get(Starter_Factory.java:14)
dns-1  |        at dagger.internal.DoubleCheck.get(DoubleCheck.java:47)
dns-1  |        at com.mageddo.dnsproxyserver.di.DaggerContext$ContextImpl.starter(DaggerContext.java:384)
dns-1  |        at com.mageddo.dnsproxyserver.di.Context.start(Context.java:56)
dns-1  |        at com.mageddo.dnsproxyserver.App.startContext(App.java:65)
dns-1  |        at com.mageddo.dnsproxyserver.App.start(App.java:40)
dns-1  |        at com.mageddo.dnsproxyserver.App.main(App.java:25)
dns-1  |        at java.base@…/java.lang.invoke.LambdaForm$DMH/sa346b79c.invokeStaticInit(LambdaForm$DMH)
dns-1  | Caused by: java.lang.ClassNotFoundException: com.github.benmanes.caffeine.cache.SSMSA
dns-1  |        at org.graalvm.nativeimage.builder/com.oracle.svm.core.hub.ClassForNameSupport.forName(ClassForNameSupport.java:122)
dns-1  |        at org.graalvm.nativeimage.builder/com.oracle.svm.core.hub.ClassForNameSupport.forName(ClassForNameSupport.java:86)
dns-1  |        at java.base@…/java.lang.Class.forName(DynamicHub.java:1356)
dns-1  |        at java.base@…/java.lang.Class.forName(DynamicHub.java:1345)
dns-1  |        at java.base@…/java.lang.invoke.MethodHandles$Lookup.findClass(MethodHandles.java:2869)
dns-1  |        at com.github.benmanes.caffeine.cache.LocalCacheFactory.newFactory(LocalCacheFactory.java:104)
dns-1  |        ... 62 more

rayout, Aug 02 '24

Sorry about that; it is fixed in 3.25.10. Can you check it, @rayout?

mageddo, Aug 07 '24

Thank you! I tested it for 2 weeks and everything worked great, but after 14 days it froze with the error: "Garbage-collected heap size exceeded. Consider increasing the maximum Java heap size." I am using version 3.25.10-snapshot with the startup settings "command: -XX:MaxHeapSize=50m -XX:MaxNewSize=10m".

rayout, Sep 11 '24

Thanks for your feedback; it seems the deadlock freezing scenario is fixed, then.

Regarding the heap size, please keep calibrating to find an optimal setting; I may consider changing the default value in the future.

mageddo, Sep 11 '24

I think we can close the task. Thank you for your help!

rayout, Sep 19 '24