
Add DNS caching time parameter for multiple servers

Open rayout opened this issue 1 year ago • 2 comments

I would like to suggest adding the ability to configure the caching time parameter when multiple DNS servers are available. For example, in my configuration, I have two DNS servers: 10.0.0.10 and 8.8.8.8. The first DNS (10.0.0.10) is used via VPN, and if it does not respond, the request will be sent to the second DNS (8.8.8.8). Currently, all subsequent requests go to the second DNS. However, I would like the system to switch back to resolving requests from the first DNS after a specified time.

Perhaps these 20 seconds should be made a configurable parameter: https://github.com/mageddo/dns-proxy-server/blob/17b0c0043d883cf9f902075739183c7938c808b2/src/main/java/com/mageddo/dnsproxyserver/server/dns/solver/SolverRemote.java#L174

This change would allow for more flexible management of DNS request caching and ensure more efficient operation with multiple DNS servers. I would appreciate your consideration of this proposal. Thank you!
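The requested behavior (skip a failing server for a configurable time, then try it again) can be illustrated with a minimal circuit breaker sketch. This is not the project's implementation: the `CircuitBreaker` class, the `open_seconds` parameter, and the `resolve` helper are hypothetical names used only for illustration, and the `query` callable stands in for a real DNS lookup.

```python
import time
from typing import Callable, Dict, Optional, Sequence, Tuple


class CircuitBreaker:
    """Skips a server for `open_seconds` after a failure, then allows a retry."""

    def __init__(self, open_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.open_seconds = open_seconds  # hypothetical configurable parameter
        self.clock = clock
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def allows_request(self) -> bool:
        if self.opened_at is None:
            return True
        # Once the configured wait has elapsed, let a trial request through.
        return self.clock() - self.opened_at >= self.open_seconds

    def record_success(self) -> None:
        self.opened_at = None

    def record_failure(self) -> None:
        self.opened_at = self.clock()


def resolve(hostname: str,
            servers: Sequence[str],
            breakers: Dict[str, CircuitBreaker],
            query: Callable[[str, str], str]) -> Tuple[str, str]:
    """Try servers in order, skipping any whose circuit is still open."""
    for server in servers:
        breaker = breakers[server]
        if not breaker.allows_request():
            continue
        try:
            answer = query(server, hostname)
        except OSError:  # covers timeouts and unreachable hosts
            breaker.record_failure()
            continue
        breaker.record_success()
        return server, answer
    raise LookupError(f"no server could resolve {hostname}")
```

With `open_seconds=20`, a failure of 10.0.0.10 sends requests to 8.8.8.8 for 20 seconds, after which the first server is tried again.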

rayout avatar Apr 25 '24 12:04 rayout

Alright, so for some reason 10.0.0.10 is failing, the circuit is opening for that server, and you want to customize the circuit breaker parameters, right?

Just out of curiosity, do you know why 10.0.0.10 is failing?

mageddo avatar May 04 '24 22:05 mageddo

Yes, you are right. 10.0.0.10 is the DNS server for the VPN network, used for corporate network resources. When I turn on the computer, the VPN is not yet connected, so that DNS server gets excluded. That's why I have to restart the Docker container after connecting to the VPN.

rayout avatar May 05 '24 12:05 rayout

Hey, I'm releasing 3.18.1 right now. Can you check whether it solves your use case once the release finishes? Check the JSON config docs for how to use it.

mageddo avatar May 23 '24 18:05 mageddo

@rayout

mageddo avatar May 23 '24 18:05 mageddo

Hello! Thank you for your help! I tried to check, but an error occurs during the launch (I tried both with my own config and by replacing the config with the one from the documentation).

Exception in thread "main" com.fasterxml.jackson.databind.exc.InvalidDefinitionException: Cannot construct instance of com.mageddo.dnsproxyserver.config.dataprovider.vo.ConfigJsonV2$SolverRemote$CircuitBreaker: cannot deserialize from Object value (no delegate- or property-based Creator): this appears to be a native image, in which case you may need to configure reflection for the class that is to be deserialized at [Source: UNKNOWN; byte offset: #UNKNOWN] (through reference chain: com.mageddo.dnsproxyserver.config.dataprovider.vo.ConfigJsonV2["solverRemote"]->com.mageddo.dnsproxyserver.config.dataprovider.vo.ConfigJsonV2$SolverRemote["circuitBreaker"])

rayout avatar May 24 '24 09:05 rayout

Hello! Thank you for your help! I tried to check, but an error occurs during the launch (I tried both with my own config and by replacing the config with the one from the documentation).

Thanks for the feedback, I'm fixing that in #454 and releasing 3.18.2-snapshot.

mageddo avatar May 24 '24 15:05 mageddo

I tested it. I specified two addresses as DNS - 10.0.0.10 and 8.8.8.8. The 10.0.0.10 DNS is only accessible via VPN.

I connected to the VPN and used the dig command to query an address that has an IP within the 10.0.0.10 range. The command was: dig @172.17.0.1 (this is the IP of the Docker where the DNS is listening). There were no issues, and the response was the address 10.0.0.169.

Next, I disconnected from the VPN and tried the dig command again multiple times. I still received the internal IP 10.0.0.169, even though the site has an external IP address on 8.8.8.8. I waited 10 minutes to check the cache, but I still received the internal address.

What could I have done wrong? If I run dig @10.0.0.10, it results in an error because I am disconnected from the VPN. Therefore, it should have switched to 8.8.8.8 after some time, but it didn't work.

rayout avatar May 28 '24 12:05 rayout

I still received the internal IP 10.0.0.169, even though the site has an external IP address on 8.8.8.8. I waited 10 minutes to check the cache

I suppose this scenario is related to the response entries cache, so it's a second scenario, not related to the circuit breaker. Once a query gets a successful response, DPS caches it for the time the remote server specifies (10.0.0.10 in your case).

In the example below, 107 is the TTL in seconds.

$ dig +noall +nocmd +answer google.com
google.com.		107	IN	A	142.251.129.238
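To make the TTL behavior concrete, here is a minimal sketch that reads the TTL from an answer line like the one above and caches the address until it expires. It is an illustration only, not how DPS stores entries: `parse_answer` and `TtlCache` are hypothetical names, and the parser assumes the five-column answer format shown.

```python
import time
from typing import Callable, Dict, Optional, Tuple


def parse_answer(line: str) -> Tuple[str, int, str]:
    """Split a dig answer line into (name, ttl_seconds, address)."""
    name, ttl, _record_class, _record_type, address = line.split()
    return name, int(ttl), address


class TtlCache:
    """Keeps each answer until the TTL given by the remote server runs out."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        self.entries: Dict[str, Tuple[str, float]] = {}

    def put(self, name: str, address: str, ttl_seconds: int) -> None:
        self.entries[name] = (address, self.clock() + ttl_seconds)

    def get(self, name: str) -> Optional[str]:
        entry = self.entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self.clock() >= expires_at:
            # Expired: drop the entry so the next lookup goes upstream again.
            del self.entries[name]
            return None
        return address
```

Until the entry expires, lookups are answered from the cache and never reach the remote servers, which is why disconnecting the VPN does not change the answer right away.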

You can run the same test and manage the cache with the APIs below:

Inspect the cache

$ curl localhost:5385/v1/caches/

Clear a cache

$ curl -X DELETE 'localhost:5385/v1/caches?name=REMOTE'
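For scripting, the two curl calls above can be expressed with Python's standard library. This is a sketch: the base URL and the paths are taken from the commands above, while the helper names (`list_caches_request`, `clear_cache_request`, `send`) are made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:5385"  # address used in the curl examples


def list_caches_request(base_url: str = BASE_URL) -> Request:
    """Build the GET request that lists the caches."""
    return Request(f"{base_url}/v1/caches/", method="GET")


def clear_cache_request(name: str, base_url: str = BASE_URL) -> Request:
    """Build the DELETE request that clears the cache called `name`."""
    query = urlencode({"name": name})
    return Request(f"{base_url}/v1/caches?{query}", method="DELETE")


def send(request: Request, timeout: float = 5.0) -> bytes:
    """Send a prepared request and return the raw response body."""
    with urlopen(request, timeout=timeout) as response:
        return response.read()
```

Calling `send(clear_cache_request("REMOTE"))` mirrors the DELETE command, provided DPS is listening on that address.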

@rayout

mageddo avatar May 29 '24 11:05 mageddo

I'm thinking about what can be done in this scenario.

mageddo avatar May 29 '24 11:05 mageddo

Please confirm if it really is the scenario

mageddo avatar May 29 '24 11:05 mageddo

Enabling the trace log level and posting the output when you test the same scenario again, plus the cache clear scenario, will help me.

mageddo avatar May 29 '24 12:05 mageddo

Yes, the problem is indeed with the cache. When looking at /v1/caches/ (thanks, very useful), I can see two entries for the domain I need:

global: "ttl": "PT20S"
remote: "ttl": "PT1H"

With the VPN disconnected, I run dig +noall +nocmd +answer @172.17.0.1 and get "nothing" because 8.8.8.8 does not have a record for this domain. Then I connect to the VPN, run the same command, and no matter how long I wait (more than 20 seconds), I do not get an IP or a TTL.
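The ttl values reported by /v1/caches/ are ISO-8601 durations: PT20S is 20 seconds and PT1H is one hour. A small helper, written here as a sketch that only handles the hour, minute, and second fields, converts them to seconds for comparison. The function name `duration_seconds` is hypothetical.

```python
import re

# Matches time-only ISO-8601 durations such as PT20S, PT1H or PT1H30M.
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?$")


def duration_seconds(text: str) -> float:
    """Convert a time-only ISO-8601 duration to seconds."""
    match = _DURATION.match(text)
    if match is None or text == "PT":
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds or 0)
```

Under this reading, the remote entry outlives the global one by a wide margin (3600 seconds versus 20), which matches the observation that waiting more than 20 seconds changes nothing.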

Here is what I see in the console during debugging:

  1. VPN disconnected:

image

  2. VPN connected:

image

  3. Cache cleared:

image

rayout avatar May 29 '24 13:05 rayout

I’m not really sure I’m on the right path. In Ubuntu 14-16, if two DNS servers were specified in the settings, the system would query the second DNS if the first didn’t respond. Something similar is described here - systemd issue #5755. Maybe I’m mistaken, but that’s how it felt.

It’s also possible that connecting to the VPN used to clear the DNS cache, which no longer happens.

Here’s my current scenario: I have containers to which I assign hostnames in the configuration, and thanks to this wonderful project, I can refer to them by name instead of IP. I also have a VPN to the corporate network with many internal addresses (I’ve counted about 10 domains so far). Previously, all requests would go through the corporate DNS when connected to the VPN. This somewhat worked, but some addresses with external IPs were cached to the external address used for acme https, and I had to manually add them to /etc/hosts or the service’s admin panel.

A week ago, I tried to solve this problem, and now I have two Docker containers. The first is with this project, and the second with dnsmasq, configured as follows:

image

image

I also had to set ipv4.dns-search: "~." in /etc/netplan/vpn.yaml and ip4.dns: "172.17.0.1" (dnsmasq address), so that this DNS is applied when connecting to the VPN.

This scheme works more or less well. Only requests for the domains I specified go to the corporate DNS. This is also a drawback, as I need to maintain the list of these domains. Another drawback is that I can only access the Docker containers when the VPN is on.

Is there a simpler or better solution?

rayout avatar May 29 '24 13:05 rayout

The NXDOMAIN response (no answer) will be cached for one hour.

Yes, the problem is indeed with the cache. When looking at /v1/caches/ (thanks, very useful), I can see two entries for the domain I need:

global: "ttl": "PT20S"
remote: "ttl": "PT1H"

With the VPN disconnected, I run dig +noall +nocmd +answer @172.17.0.1 and get "nothing" because 8.8.8.8 does not have a record for this domain. Then I connect to the VPN, run the same command, and no matter how long I wait (more than 20 seconds), I do not get an IP or a TTL.

Here is what I see in the console during debugging:

mageddo avatar May 29 '24 17:05 mageddo

I’m not really sure I’m on the right path. In Ubuntu 14-16, if two DNS servers were specified in the settings, the system would query the second DNS if the first didn’t respond. Something similar is described here - https://github.com/systemd/systemd/issues/5755. Maybe I’m mistaken, but that’s how it felt.

It looks like the same behavior, but the cause is different. I identified a new improvement to make and wrote up a definition for it in #455. It would probably fix the bad experience you are having. What do you think?

mageddo avatar May 29 '24 17:05 mageddo

and thanks to this wonderful project

Appreciate your thanks, you're welcome!

Is there a simpler or better solution?

Actually, I think we can fix that with #455 and/or by creating a toggle for the entire cache. Meanwhile, DPS version 3.12.1 doesn't have this cache feature, so you can test it and check whether it behaves the way you want (I'm not sure this version is stable enough for regular use, but it may do the job for now). That way we can validate the desired behavior before I implement these two solutions in the new version. Neither solution is too big, though.

The cache was implemented in 3.13.1; see the release notes.

@rayout

mageddo avatar May 29 '24 17:05 mageddo

Create a watchdog to keep testing the remote servers' circuits; it will clear the cache whenever a remote server goes down or becomes healthy again.

I think this is a great idea and it will really solve the problem.
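The watchdog idea quoted above can be sketched as follows. This is only an illustration of the concept, not the code that shipped: the `Watchdog` class and the `is_healthy` and `clear_cache` callables are hypothetical, and a real probe would send an actual DNS query to each server.

```python
from typing import Callable, Dict, Iterable


class Watchdog:
    """Clears the cache whenever a remote server changes health state."""

    def __init__(self,
                 servers: Iterable[str],
                 is_healthy: Callable[[str], bool],
                 clear_cache: Callable[[], None]):
        self.servers = list(servers)
        self.is_healthy = is_healthy    # hypothetical probe, one call per server
        self.clear_cache = clear_cache  # hypothetical hook into the response cache
        self.last_state: Dict[str, bool] = {}

    def tick(self) -> bool:
        """Probe every server once; clear the cache if any state changed."""
        changed = False
        for server in self.servers:
            healthy = self.is_healthy(server)
            previous = self.last_state.get(server)
            # The first probe only records a baseline, so it never counts as a change.
            if previous is not None and previous != healthy:
                changed = True
            self.last_state[server] = healthy
        if changed:
            self.clear_cache()
        return changed
```

Calling `tick()` on a timer would drop a cached "no answer" entry as soon as the VPN server becomes reachable, instead of leaving it in place until its TTL expires.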

rayout avatar Jun 03 '24 08:06 rayout

@rayout 3.19.0 is out, implementing the watchdog. Can you check whether it solves your use case? Also consider calibrating your circuit breaker params.

mageddo avatar Jun 04 '24 19:06 mageddo

I'm closing this issue, feel free to reopen if the issue persists

mageddo avatar Jun 22 '24 00:06 mageddo

Thank you! I have been testing for some time to wait for feedback, and then went on vacation. Everything works fine now. When turning VPN on and off, everything works instantly. But sometimes, if I have been without a VPN connection for a long time, when I turn on the VPN, any DNS requests stop working altogether until I completely restart the docker container with the DNS proxy. I will continue testing and will create a task if I catch this issue.

rayout avatar Jul 02 '24 08:07 rayout

Thank you! I have been testing for some time to wait for feedback, and then went on vacation. Everything works fine now. When turning VPN on and off, everything works instantly.

Glad we made progress!

But sometimes, if I have been without a VPN connection for a long time, when I turn on the VPN, any DNS requests stop working altogether until I completely restart the docker container with the DNS proxy. I will continue testing and will create a task if I catch this issue.

Sure, thanks for your contribution @rayout

mageddo avatar Jul 02 '24 14:07 mageddo