dnsproxy: I can't get fallback to do anything useful

I shut down the primary server, and none of the servers specified with -f were used, even though they were available and working.

Mar 25 '25 20:03 abcbarryn

I had additional servers configured with -f, but it never tried them. :(

2025/03/25 16:34:26.460588 ERROR exchange failed prefix=dnsproxy upstream=https://dns.amobiledevice.com:443/dns-query question=";star2.abcm.com.s2.abcm.com.\tIN\t A" duration=2.0610494s err="requesting https://dns.amobiledevice.com:443/dns-query: Get "https://dns.amobiledevice.com:443/dns-query?dns=AAABAAABAAAAAAAABXN0YXIyBGFiY20DY29tAnMyBGFiY20DY29tAAABAAE": dial tcp 23.115.249.137:443: connectex: No connection could be made because the target machine actively refused it."
2025/03/25 16:34:28.471245 ERROR exchange failed prefix=dnsproxy upstream=https://dns.amobiledevice.com:443/dns-query question=";star2.abcm.com.s2.abcm.com.\tIN\t AAAA" duration=2.0610205s err="requesting https://dns.amobiledevice.com:443/dns-query: Get "https://dns.amobiledevice.com:443/dns-query?dns=AAABAAABAAAAAAAABXN0YXIyBGFiY20DY29tAnMyBGFiY20DY29tAAAcAAE": dial tcp 23.115.249.137:443: connectex: No connection could be made because the target machine actively refused it."
2025/03/25 16:34:30.475283 ERROR exchange failed prefix=dnsproxy upstream=https://dns.amobiledevice.com:443/dns-query question=";star2.abcm.com.\tIN\t A" duration=2.0538427s err="requesting https://dns.amobiledevice.com:443/dns-query: Get "https://dns.amobiledevice.com:443/dns-query?dns=AAABAAABAAAAAAAABXN0YXIyBGFiY20DY29tAAABAAE": dial tcp 23.115.249.137:443: connectex: No connection could be made because the target machine actively refused it."
2025/03/25 16:34:32.483016 ERROR exchange failed prefix=dnsproxy upstream=https://dns.amobiledevice.com:443/dns-query question=";star2.abcm.com.\tIN\t AAAA" duration=2.0583346s err="requesting https://dns.amobiledevice.com:443/dns-query: Get "https://dns.amobiledevice.com:443/dns-query?dns=AAABAAABAAAAAAAABXN0YXIyBGFiY20DY29tAAAcAAE": dial tcp 23.115.249.137:443: connectex: No connection could be made because the target machine actively refused it."

Mar 25 '25 20:03 abcbarryn

"C:\Program Files\dnsproxy\dnsproxy.exe" --hosts-file-enabled -u https://dns.amobiledevice.com/dns-query -f https://doh.umbrella.com/dns-query -f 208.67.222.222:53 -f 208.67.220.220:53

Mar 25 '25 20:03 abcbarryn

I verified that specifying the same servers directly with -u works:
dnsproxy -u 208.67.222.222:53 -u 208.67.220.220:53

Mar 25 '25 20:03 abcbarryn

Anybody there?!? :)

Mar 26 '25 18:03 abcbarryn

I also noticed that the --fallback option does not seem to work.

I have a single upstream server and a single fallback server configured. But when the upstream goes down, the fallback does not take over after the timeout; instead, dnsproxy logs lines like the following over and over:

Apr 01 23:24:32 time.example.com dnsproxy[1349]: 2025/04/01 23:24:32.416281 ERROR exchange failed prefix=dnsproxy upstream=quic://bouncer.example.com:853 question=";youtubei.googleapis.com.\tIN\t AAAA" duration=9.922098174s err="getting conn: dialing quic connection to quic://bouncer.example.com:853: timeout: no recent network activity"

I am running the latest version from the master branch, compiled with go1.24.1 linux/arm64.

The config is as follows (with the upstream name and edns-addr changed for privacy):

listen-addrs:
  - "::"
  - "0.0.0.0"
listen-ports:
  - 53
max-go-routines: 0
ratelimit: 0
ratelimit-subnet-len-ipv4: 24
ratelimit-subnet-len-ipv6: 64
udp-buf-size: 8388608
upstream:
  - "quic://bouncer.example.com"
fallback:
  - "quic://unfiltered.adguard-dns.com"
timeout: '10s'
edns: true
edns-addr: "2001:db8:45ba:e5c8:e975:2017:c773:27df"
verbose: false
cache: true
cache-size: 104857600

Apr 02 '25 05:04 jhed9

I hope someone looks at this and fixes it.

Apr 02 '25 06:04 abcbarryn

I’ll add that I have both my upstream and fallback DNS server names in my operating system’s hosts file, so I shouldn’t need a bootstrap option.

Apr 02 '25 06:04 jhed9

I upgraded to 970056be2de5eb2b3a20563c4b459b382da58516, committed two hours ago, but the issue persists.

Apr 10 '25 17:04 jhed9

Commit 970056b does not seem to address the issue at all; its description says "Update Go & tools". I reviewed the changes, and that seems to be all it does. It doesn't look like anyone is trying to fix this. If you are working on a fix, could you please reply here with an update? Pretty please? This would be very nice to have fixed.

Apr 11 '25 02:04 abcbarryn

Just a random user passing by. I tried to reproduce this, but --fallback (-f) works perfectly for me.

How I tried to reproduce:

9.9.9.9 blocks malware websites by returning an NXDOMAIN response, while 9.9.9.10 does not filter anything. So when I intentionally dig a malicious domain, if -f works, the first dig should get an empty response and the second dig should get an address (because the first dig goes to 9.9.9.9 and the second to 9.9.9.10).

Since Quad9 uses CERT.pl, I can go to the CERT.pl malicious domains list, find a domain that can still be resolved with 9.9.9.10, and then test it at https://quad9.net/result/?url= to make sure it's blocked.

Then I run dnsproxy -l 127.0.0.1 -p 53 -u 9.9.9.9:53 -f 9.9.9.10:53, and the malicious domain loads without any problem, which shows that -f works.

May 10 '25 17:05 sao321

@sao321 If you change 9.9.9.9:53 to an IP address where there is no DNS server, it never tries the fallback server. What you actually found is ANOTHER bug: if the primary server responds, even with an NX domain response, it really should not be using the fallback server. My issue is that when the primary server is down, it never tries the fallback server.

May 10 '25 21:05 abcbarryn

> @sao321 If you change 9.9.9.9:53 to an IP address where there is no DNS server, it never tries the fallback server. What you actually found is ANOTHER bug: if the primary server responds, even with an NX domain response, it really should not be using the fallback server. My issue is that when the primary server is down, it never tries the fallback server.

I don't know why you think that when the -u server returns an NXDOMAIN response, the -f option shouldn't retry. My understanding is that if the -u server doesn't return a valid response, the -f option should be tried. However, since the initial dig returns an invalid response, there's definitely room for improvement, although the browser automatically retries, so I'm still able to access domains blocked by the -u server.

This is important because, for example, if my ISP blocks a website by returning an NX response, you would hope that the fallback can catch this invalid response (unless the DNS servers behave like the ones in China, which simply poison the response with a perfectly valid one).

I just tested with dnsproxy -l 127.0.0.1 -p 53 -u 10.10.10.10:53 -f 1.1.1.1:53, and it works exactly the same: if you dig more than once, you will get a response after the failure. However, passing a timeout or an invalid response down to the client is not ideal. But I think making -f kick in immediately would require dnsproxy to constantly query both -u and -f, which isn't ideal either.

Actually, after a few more tries, it appears that -f sometimes gets a response after a failure and sometimes doesn't. There's definitely a problem...

May 10 '25 23:05 sao321

@sao321 An NX domain response is a valid response.

May 10 '25 23:05 abcbarryn

> @sao321 An NX domain response is a valid response.

No, it's not. You can't visit a site if there's nothing in the response. But you're entitled to your opinion. I'm not going to argue with you.

May 10 '25 23:05 sao321

A timeout or a failed connection is an invalid response. NX domain is a VALID response that says the domain does not exist. It might not be the response you WANT, but it's still valid.

May 11 '25 00:05 abcbarryn

@sao321

NXDOMAIN is a valid response. In summary, it means that a DNS server claiming to be in charge of the domain (authoritative) responded, and the response was that the domain did not exist. This is in contrast to SERVFAIL, where an authoritative DNS server could not be reached. See RFC 6895, Section 2.3.

> Then I run dnsproxy -l 127.0.0.1 -p 53 -u 9.9.9.9:53 -f 9.9.9.10:53, and the malicious domain loads without any problem, which shows that -f works.

I could not reproduce your results. I received an NXDOMAIN response each time with your dnsproxy parameters. I compiled and tested on f00be4dcb6106c8f753a8b45f054bb4ebc774958 (the latest version of the master branch at the time of writing). I picked a domain from the CERT.pl link you provided.

> This is important because, for example, if my ISP blocks a website by returning an NX response, you would hope that the fallback can catch this invalid response (unless the DNS servers behave like the ones in China, which simply poison the response with a perfectly valid one).

In my opinion, a safer use of dnsproxy would be to avoid using the ISP as an upstream server altogether. It's not just China; ISPs in other countries give poisoned DNS responses too, in order to "help" customers get to where they think they want to go. Using an alternative to the ISP's resolver over DoT, DoH, or DoQ also prevents the ISP from selling the data in the DNS queries it sees on the wire.

> Actually, after a few more tries, it appears that -f sometimes gets a response after a failure and sometimes doesn't. There's definitely a problem...

Thank you for acknowledging the problem.

May 12 '25 21:05 jhed9

@jhed9 So, what about my issue where fallback fails to take effect on a SERVFAIL response??

May 13 '25 00:05 abcbarryn

@abcbarryn What about it? I'm having the same issue as you. That's why I'm here.

May 13 '25 23:05 jhed9

@abcbarryn I noticed today that the failover logic appears to be working better now. After 5 seconds, it fails over to the fallback server, queries it, caches the response with a minimum TTL of 60 seconds, and responds to the client. But it keeps trying the primary (upstream) server for each uncached lookup, so every new request is delayed by 5 seconds. Can you confirm that this is what you are seeing on your end now?

Jun 29 '25 03:06 jhed9