
TLS Connection Failures - Stubby
- Dan
- Dan.3
- 11 mths ago
- 55 replies
- Bug Reports
- Fixed
I’m seeing connection failures between Stubby and NextDNS that I haven’t seen before, causing lookup timeouts and excessive connections to the service. Plain DNS works very well. Cloudflare and other DoT providers work well on Stubby, which leads me to think it’s a NextDNS issue. I cannot get the diagnostic tool to successfully look up nextdns.io while using Stubby but can run when not connected.
Looking for any insight or assistance.
Version: Stubby 0.4.0 on FreshTomato
daemon.info stubby[20713]: 45.90.28.0 : Upstream : TLS - Resps= 26, Timeouts = 10, Best_auth =Success - with occasional SERVFAIL from dnsmasq
config
resolution_type: GETDNS_RESOLUTION_STUB
dns_transport_list:
  - GETDNS_TRANSPORT_TLS
tls_authentication: GETDNS_AUTHENTICATION_REQUIRED
tls_query_padding_blocksize: 256
edns_client_subnet_private: 0
idle_timeout: 9000
tls_connection_retries: 5
tls_backoff_time: 900
timeout: 2000
round_robin_upstreams: 1
tls_min_version: GETDNS_TLS1_3
listen_addresses:
  - 127.0.0.1@5453
  - 0::1@5453
upstream_recursive_servers:
  - address_data: 45.90.28.0
    tls_auth_name: "xxxxxx.dns1.nextdns.io"
(etc.)
Will message diag privately on request.
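For anyone comparing numbers, here is a rough Python sketch that turns a Stubby "Upstream" summary line like the one quoted above into a timeout ratio. The regex is only inferred from the log lines in this thread, and treating every query as either a response or a timeout is an assumption, not something Stubby documents here.

```python
import re

# Assumed line shape, copied from the log excerpt in this post:
# "45.90.28.0 : Upstream : TLS - Resps= 26, Timeouts = 10, Best_auth =Success"
STATS_RE = re.compile(
    r"(?P<upstream>\S+)\s+: Upstream\s+: TLS - "
    r"Resps=\s*(?P<resps>\d+), Timeouts\s*=\s*(?P<timeouts>\d+)"
)


def timeout_ratio(line):
    """Return (upstream, timeouts / (responses + timeouts)), or None if no match."""
    m = STATS_RE.search(line)
    if m is None:
        return None
    resps = int(m.group("resps"))
    timeouts = int(m.group("timeouts"))
    total = resps + timeouts
    # Assumption: each query ended as either a response or a timeout.
    return m.group("upstream"), (timeouts / total if total else 0.0)


line = ("daemon.info stubby[20713]: 45.90.28.0 : Upstream : TLS - "
        "Resps= 26, Timeouts = 10, Best_auth =Success")
print(timeout_ratio(line))
```

Running the same function over the summary lines for another DoT provider would give a like-for-like comparison.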
- NextDNSStaff
- NextDNs
- 11 mths ago
We found why stubby is not happy. We will push a workaround in production ASAP.
- BS
- teal_rabbit
- 11 mths ago
NextDNS THANK YOU for fixing this. Can confirm that NextDNS is behaving well when it previously did not.
- Dan
- Dan.3
- 11 mths ago
NextDNS amazing! Thank you!
- firstlast
- firstlast
- 11 mths ago
NextDNS THANK YOU SO MUCH!
Back to using NextDNS now, so glad to see the ads disappearing from my devices again.
- Good Vibes
- GoodVibes
- 11 mths ago
NextDNS Working! Thanks! What was the issue?
- Dan
- Dan.3
- 10 mths ago
NextDNS sorry to bother you again but:
daemon.debug stubby[12925]: 45.90.28.0 : Conn closed: TLS - *Failure*
I’m seeing regression on the behaviour previously fixed.
- Dan
- Dan.3
- 11 mths ago
For the sake of testing, I spun up Stubby on a Debian instance with the config above and can’t resolve lookups:
$ nslookup eff.org 127.0.0.1
Server: 127.0.0.1
Address: 127.0.0.1#53
** server can't find eff.org: SERVFAIL
With Cloudflare dropped into the config, I can resolve addresses. Any ideas?
- NextDNSStaff
- NextDNs
- 11 mths ago
Dan you made stubby listen on port 5453. To test it, use dig -p 5453 test.com instead.
- Dan
- Dan.3
- 11 mths ago
NextDNS
Sorry, I did see that and modified the config. I was watching the verbose log from Stubby. DNS requests would hit, TLS connection open, and then nothing, closing shortly after. Stubby indicated a request time out, per the previous example. Swap the servers to Cloudflare and all works. Do you see something similar on a Stubby instance?
- Dan
- Dan.3
- 11 mths ago
NextDNS
Thanks in advance for your help! Stubby logs for example follow (sorry for the wall of text - how do you write code blocks here?)
dig test.com @127.0.0.1
; <<>> DiG 9.11.5-P4-5.1+deb10u5-Debian <<>> test.com @127.0.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 15093
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available
;; QUESTION SECTION:
;test.com. IN A
;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Aug 26 18:42:33 AWST 2021
;; MSG SIZE rcvd: 26
[10:38:00.995746] STUBBY: Read config from file stubby.yml
[10:38:00.996627] STUBBY: DNSSEC Validation is OFF
[10:38:00.996663] STUBBY: Transport list is:
[10:38:00.996678] STUBBY: - TLS
[10:38:00.996693] STUBBY: Privacy Usage Profile is Strict (Authentication required)
[10:38:00.996708] STUBBY: (NOTE a Strict Profile only applies when TLS is the ONLY transport!!)
[10:38:00.996722] STUBBY: Starting DAEMON....
[10:38:28.460227] STUBBY: 45.90.28.0 : Conn opened: TLS - Strict Profile
[10:38:28.576709] STUBBY: 45.90.28.0 : Verify passed : TLS
[10:38:33.458539] STUBBY: 2a07:a8c0:: : Conn opened: TLS - Strict Profile
[10:38:34.018680] STUBBY: 2a07:a8c0:: : Verify passed : TLS
[10:38:38.458883] STUBBY: 45.90.30.0 : Conn opened: TLS - Strict Profile
[10:38:38.460499] STUBBY: 45.90.28.0 : Conn closed: TLS - Resps= 0, Timeouts = 1, Curr_auth =Success, Keepalive(ms)= 0
[10:38:38.463042] STUBBY: 45.90.28.0 : Upstream : TLS - Resps= 0, Timeouts = 1, Best_auth =Success
[10:38:38.463065] STUBBY: 45.90.28.0 : Upstream : TLS - Conns= 1, Conn_fails= 0, Conn_shuts= 0, Backoffs = 0
[10:38:38.483120] STUBBY: 45.90.30.0 : Verify passed : TLS
[10:38:43.463608] STUBBY: 2a07:a8c0:: : Conn closed: TLS - Resps= 0, Timeouts = 1, Curr_auth =Success, Keepalive(ms)= 0
[10:38:43.463769] STUBBY: 2a07:a8c0:: : Upstream : TLS - Resps= 0, Timeouts = 1, Best_auth =Success
[10:38:43.463789] STUBBY: 2a07:a8c0:: : Upstream : TLS - Conns= 1, Conn_fails= 0, Conn_shuts= 0, Backoffs = 0
[10:38:48.464297] STUBBY: 45.90.30.0 : Conn closed: TLS - Resps= 0, Timeouts = 1, Curr_auth =Success, Keepalive(ms)= 0
[10:38:48.464377] STUBBY: 45.90.30.0 : Upstream : TLS - Resps= 0, Timeouts = 1, Best_auth =Success
[10:38:48.464395] STUBBY: 45.90.30.0 : Upstream : TLS - Conns= 1, Conn_fails= 0, Conn_shuts= 0, Backoffs = 0
- NextDNSStaff
- NextDNs
- 11 mths ago
Dan please send a diag
- NextDNSStaff
- NextDNs
- 11 mths ago
Dan your logs show IPv6 but your configuration has only one IPv4 address. Is the config shown above complete? If you have IPv6 addresses, please try again without them.
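A quick way to check which address families Stubby is actually contacting is to pull the upstream addresses out of the "Conn opened" lines. The Python sketch below assumes the line shape seen in the logs posted in this thread; the function name is made up for illustration.

```python
import ipaddress
import re

# Assumed line shape, taken from the log excerpts in this thread:
# "[10:38:33.458539] STUBBY: 2a07:a8c0:: : Conn opened: TLS - Strict Profile"
CONN_RE = re.compile(r"STUBBY:\s+(\S+)\s+: Conn opened")


def upstream_families(log_text):
    """Map each upstream address seen in 'Conn opened' lines to 4 or 6."""
    families = {}
    for match in CONN_RE.finditer(log_text):
        addr = ipaddress.ip_address(match.group(1))
        families[str(addr)] = addr.version
    return families


sample = """\
[10:38:28.460227] STUBBY: 45.90.28.0 : Conn opened: TLS - Strict Profile
[10:38:33.458539] STUBBY: 2a07:a8c0:: : Conn opened: TLS - Strict Profile
[10:38:38.458883] STUBBY: 45.90.30.0 : Conn opened: TLS - Strict Profile
"""
print(upstream_families(sample))
```

If any value is 6 while the config is supposed to list only IPv4 upstreams, Stubby is probably reading a different config file than expected.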
- Dan
- Dan.3
- 11 mths ago
NextDNS I’ve sent a message to you with the diag
- Dan
- Dan.3
- 11 mths ago
NextDNS thanks for checking. Yes, normal config is the complete output on the NextDNS setup page (4+6). I’ve also tested with just 45.90.28.0 with no configuration specific info.
- NextDNSStaff
- NextDNs
- 11 mths ago
Dan please try the full config with ipv6 removed
- Dan
- Dan.3
- 11 mths ago
NextDNS
dig example.com @127.0.0.1
; <<>> DiG 9.11.5-P4-5.1+deb10u5-Debian <<>> example.com @127.0.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 13951
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available
;; QUESTION SECTION:
;example.com. IN A
;; Query time: 2003 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Aug 26 22:34:19 AWST 2021
;; MSG SIZE rcvd: 29
Config
resolution_type: GETDNS_RESOLUTION_STUB
dns_transport_list:
  - GETDNS_TRANSPORT_TLS
tls_authentication: GETDNS_AUTHENTICATION_REQUIRED
tls_query_padding_blocksize: 128
edns_client_subnet_private: 0
idle_timeout: 5000
tls_connection_retries: 5
tls_backoff_time: 900
timeout: 2000
round_robin_upstreams: 1
#tls_min_version: GETDNS_TLS1_3
listen_addresses:
  - 127.0.0.1
  - 0::1
upstream_recursive_servers:
  - address_data: 45.90.28.0
    tls_auth_name: "xxxxxx.dns1.nextdns.io"
  - address_data: 45.90.30.0
    tls_auth_name: "xxxxxx.dns2.nextdns.io"
Stubby log
[14:34:12.911360] STUBBY: Read config from file stubby_noipv6.yml
[14:34:12.912172] STUBBY: DNSSEC Validation is OFF
[14:34:12.912192] STUBBY: Transport list is:
[14:34:12.912200] STUBBY: - TLS
[14:34:12.912208] STUBBY: Privacy Usage Profile is Strict (Authentication required)
[14:34:12.912215] STUBBY: (NOTE a Strict Profile only applies when TLS is the ONLY transport!!)
[14:34:12.912223] STUBBY: Starting DAEMON....
[14:34:17.436308] STUBBY: 45.90.28.0 : Conn opened: TLS - Strict Profile
[14:34:17.551961] STUBBY: 45.90.28.0 : Verify passed : TLS
[14:34:19.437698] STUBBY: 45.90.28.0 : Conn closed: TLS - Resps= 0, Timeouts = 1, Curr_auth =Success, Keepalive(ms)= 0
[14:34:19.437771] STUBBY: 45.90.28.0 : Upstream : TLS - Resps= 0, Timeouts = 1, Best_auth =Success
[14:34:19.437787] STUBBY: 45.90.28.0 : Upstream : TLS - Conns= 1, Conn_fails= 0, Conn_shuts= 0, Backoffs = 0
In contrast, using 1.1.1.1:
dig example.com @127.0.0.1
; <<>> DiG 9.11.5-P4-5.1+deb10u5-Debian <<>> example.com @127.0.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 46405
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;example.com. IN A
;; ANSWER SECTION:
example.com. 71169 IN A 93.184.216.34
;; Query time: 34 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Aug 26 22:44:20 AWST 2021
;; MSG SIZE rcvd: 67
- Dan
- Dan.3
- 11 mths ago
NextDNS any thoughts?
- NextDNSStaff
- NextDNs
- 11 mths ago
Dan can you please turn on debug logs?
- Dan
- Dan.3
- 11 mths ago
NextDNS debug logs for Stubby? Those are in my previous message.
- NextDNSStaff
- NextDNs
- 11 mths ago
Dan did you start stubby with the debug log level?
- Dan
- Dan.3
- 11 mths ago
NextDNS yes.
stubby -v 7 -C stubby_noipv6.yml
The logs you see above immediately follow.
- Dan
- Dan.3
- 11 mths ago
NextDNS have you had a chance to test this config on an instance of Stubby you control? Unfortunately I have no other test sites, other than another FreshTomato router, which exhibits the same symptoms (but is a different internet provider).
If I know it’s my end, I can start down another path - just let me know :)
Could this have anything to do with the TLS cert changes in June? Thanks again.
- NextDNSStaff
- NextDNs
- 11 mths ago
Dan we have indeed already tested stubby. Here it seems to be a timeout. Judging by your diag, anycast routing for IPv6 isn’t right from where you are, but v4 should be fine.
Would you be able to use our CLI instead of stubby?
- BS
- teal_rabbit
- 11 mths ago
NextDNS Not the OP but this doesn't really seem like a fair solution. If your product is meant to work outside of the app you've developed, then it should. Whatever recent changes were made to cause this issue are clearly affecting more than just one person. Otherwise NextDNS shouldn't advertise their DNS IPs for any other solution (DoT/DoH) if the only way you expect customers to use the product is via your CLI app.
- Dan
- Dan.3
- 11 mths ago
NextDNS I would like to! But Tomato or Entware CLI isn’t ready yet :(
I could configure another host to run CLI for the network, but I would rather have it all on the router. I’ll continue running DNS over 53 for now.
So Stubby is working okay for you? What are your thoughts on the timeouts? If it was a routing issue, I would be having issues establishing a connection at all, right? DNS over 53 works really well.
- NextDNSStaff
- NextDNs
- 11 mths ago
Dan stubby is working for many people, but it has always had issues with certain versions and is generally less robust than many other clients. Why it does not work in your case is unclear. The timeout error does not make much sense, and the logs do not show much more to debug.
- Dan
- Dan.3
- 11 mths ago
NextDNS thank you. I’m hoping to use the CLI soon. Cloudflare and other resolvers work well with Stubby - what do you think the difference is with NextDNS? DoT should work if it’s functioning with other providers?
- Dan
- Dan.3
- 11 mths ago
NextDNS what additional steps can I take to debug this issue? Unfortunately Stubby’s debug logs can be limited. I’m hoping you might be able to test an instance on a system you control and observe NextDNS logs to see if the servers get hit, with what, and if they respond? That would be very helpful :)
I’m not able to do additional troubleshooting right now, so I hope you can help!
- NextDNSStaff
- NextDNs
- 11 mths ago
Dan we tested stubby on many systems and it works. The only known issue with stubby is when it is linked with an old version of openssl, but the error would be different. Some people have also reported stubby randomly falling back and then no longer working, but again, the errors would be different and the fix is easy.
Please try with another DoT client or CLI to see if you are also getting timeout errors. That is the only next step we can advise.
- firstlast
- firstlast
- 11 mths ago
I use AsusWRT-Merlin with NextDNS and DoT. I believe it uses Stubby under the hood. For the past week or so, I've had terrible Internet on all my devices. I was able to pin it down to DNS today. Lots of slow DNS replies or total failures.
Switching to Cloudflare fixes the issue.
This may be anecdotal, but perhaps there is some wider issue here.
- Dan
- Dan.3
- 11 mths ago
firstlast do you think you could turn on verbose logging for Stubby and post some snippets here?
- BS
- teal_rabbit
- 11 mths ago
firstlast I'm in a similar situation... thought it was my IPv6, but it continues to misbehave even when disabled... I've tried everything simple to fix it, because all I have is the DoT setup on my ASUS Merlin router and yeah... nothing fixes it, so I'm glad to hear other people were having issues... I was losing my mind thinking it was something in the configurations I'd messed up.
- firstlast
- firstlast
- 11 mths ago
Here is someone else with the same issue on AsusWRT-Merlin: https://www.snbforums.com/threads/dns-over-tls-and-chroot-nextdns-dot-issue.74466
It's annoying because it was working for months and now all of a sudden it is an issue. :(
- Good Vibes
- GoodVibes
- 11 mths ago
Same problem with OpenWrt 19.07 running Stubby 0.3.0 and Debian Buster running Stubby 0.2.5. No problem if I change to Cloudflare or Quad9 DoT servers.
- NextDNSStaff
- NextDNs
- 11 mths ago
@firstlast @goodvibes please provide https://nextdns.io/diag
- freeson
- freeson
- 11 mths ago
@NextDNS
I have just begun (in the last 3 or 4 days) experiencing the same thing with Stubby after it was running fine for months and no changes to my config. [I am surprised by seeing IPv6 addresses, traceroutes and pings seemingly working. I have never had IPv6 before and not sure what to make of it -- ISP has not announced it. Not sure when that started.]
I have basically the same config as dan.
I have sent a diag.
[EDIT] Oops. diag didn't go.
Post unsuccessful: Post "https://api.nextdns.io/diagnostic": dial tcp: lookup api.nextdns.io on 127.0.0.1:53: server misbehaving
Please report this issue on https://github.com/nextdns/diag
- NextDNSStaff
- NextDNs
- 11 mths ago
freeson please run the diag with stubby disabled. Some stubby logs in debug level would also be helpful.
- freeson
- freeson
- 11 mths ago
NextDNS
Here is a stubby log:
dig lk-case.com
[20:07:26.040584] STUBBY: 45.90.30.0 : Conn opened: TLS - Strict Profile
[20:07:26.078276] STUBBY: 45.90.30.0 : Verify passed : TLS
[20:07:32.050409] STUBBY: 45.90.28.0 : Conn opened: TLS - Strict Profile
[20:07:32.077409] STUBBY: 45.90.28.0 : Verify passed : TLS
; <<>> DiG 9.16.18 <<>> lk-case.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 30527
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;lk-case.com. IN A
;; ANSWER SECTION:
lk-case.com. 600 IN A 23.227.38.32
;; Query time: 249 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Fri Sep 03 16:07:38 EDT 2021
;; MSG SIZE rcvd: 67
/etc/stubby#
[20:07:41.043523] STUBBY: 45.90.30.0 : Conn closed: TLS - Resps= 2, Timeouts = 1, Curr_auth =Success, Keepalive(ms)= 0
[20:07:41.043600] STUBBY: 45.90.30.0 : Upstream : TLS - Resps= 2, Timeouts = 2, Best_auth =Success
[20:07:41.043646] STUBBY: 45.90.30.0 : Upstream : TLS - Conns= 2, Conn_fails= 0, Conn_shuts= 0, Backoffs = 0
[20:07:47.050639] STUBBY: 45.90.28.0 : Conn closed: TLS - Resps= 1, Timeouts = 1, Curr_auth =Success, Keepalive(ms)= 0
[20:07:47.050713] STUBBY: 45.90.28.0 : Upstream : TLS - Resps= 2, Timeouts = 2, Best_auth =Success
[20:07:47.050762] STUBBY: 45.90.28.0 : Upstream : TLS - Conns= 2, Conn_fails= 0, Conn_shuts= 0, Backoffs = 0
With round robin on, it takes 12 seconds to respond. With it off, 6 seconds. If the timeout is less than 6 seconds it fails (SERVFAIL) very consistently. With the timeout greater than 6 seconds it usually succeeds (NOERROR) with the response coming in at 6 seconds.
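To put numbers on timings like these, the gap between each "Conn opened" and the matching "Conn closed" line can be measured straight from the Stubby log. This is only a sketch: the regex is inferred from the log lines posted in this thread, the function name is hypothetical, and it ignores logs that cross midnight because the timestamps carry no date.

```python
import re
from datetime import datetime

# Assumed line shape, based on the Stubby log lines posted in this thread.
EVENT_RE = re.compile(
    r"\[(?P<ts>\d{2}:\d{2}:\d{2}\.\d{6})\] STUBBY: (?P<upstream>\S+)\s+: "
    r"Conn (?P<event>opened|closed):"
)


def connection_lifetimes(log_text):
    """Return [(upstream, seconds between 'Conn opened' and 'Conn closed')]."""
    opened = {}
    lifetimes = []
    for m in EVENT_RE.finditer(log_text):
        ts = datetime.strptime(m.group("ts"), "%H:%M:%S.%f")
        upstream = m.group("upstream")
        if m.group("event") == "opened":
            opened[upstream] = ts
        elif upstream in opened:
            delta = ts - opened.pop(upstream)
            lifetimes.append((upstream, delta.total_seconds()))
    return lifetimes


sample = """\
[20:07:26.040584] STUBBY: 45.90.30.0 : Conn opened: TLS - Strict Profile
[20:07:32.050409] STUBBY: 45.90.28.0 : Conn opened: TLS - Strict Profile
[20:07:41.043523] STUBBY: 45.90.30.0 : Conn closed: TLS - Resps= 2, Timeouts = 1, Curr_auth =Success, Keepalive(ms)= 0
[20:07:47.050639] STUBBY: 45.90.28.0 : Conn closed: TLS - Resps= 1, Timeouts = 1, Curr_auth =Success, Keepalive(ms)= 0
"""
for upstream, seconds in connection_lifetimes(sample):
    print(upstream, round(seconds, 3))
```

Comparing these durations across different timeout and idle_timeout settings might show which setting each delay tracks.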
- NextDNSStaff
- NextDNs
- 11 mths ago
For everybody having an issue with stubby, please provide the version of stubby you are running and on what OS (the router firmware name and version if it is a router).
- freeson
- freeson
- 11 mths ago
NextDNS OpenWRT 19.07.8, Stubby 0.3.0-1
- BS
- teal_rabbit
- 11 mths ago
NextDNS Asuswrt-Merlin 386.3_2, Stubby 0.4.0
- Dan
- Dan.3
- 11 mths ago
NextDNS FreshTomato 2021.5, Stubby 0.4.0
- Good Vibes
- GoodVibes
- 11 mths ago
NextDNS OpenWrt 19.07 Stubby 0.3.0 and Debian Buster running Stubby 0.2.5
- firstlast
- firstlast
- 11 mths ago
I'm back to seeing similar behaviour now. Are other stubby users experiencing a regression?
Thanks!
- Dan
- Dan.3
- 11 mths ago
firstlast all still looks okay from my end. No timeouts or TLS issues.
- firstlast
- firstlast
- 11 mths ago
Dan Thanks for checking!
- Good Vibes
- GoodVibes
- 11 mths ago
firstlast My Stubby (OpenWrt 19.07) was behaving erratically (lot of SERVFAIL errors) but was fixed with a service restart.
- Gordon Freeman
- Gordon_Freeman
- 10 mths ago
firstlast as a matter of fact, I still get a lot of those errors. It almost seems random when the problem occurs and when not
- firstlast
- firstlast
- 10 mths ago
Gordon Freeman Yep, same. I stopped using NextDNS a few days ago as I don't have time to keep troubleshooting it.
I'll give it another shot eventually and hope that whatever this issue is has been sorted out.
- Gordon Freeman
- Gordon_Freeman
- 10 mths ago
firstlast there is also the chance that stubby is at fault. On their GitHub page there is one issue opened, but it also only links to here
- Dan
- Dan.3
- 10 mths ago
Gordon Freeman very interesting. Okay, I have a task that restarts stubby every two hours on my router. I’ll stop this and see if the issue returns. The random connection failures were an issue prior to me opening this ticket, I just ran out of troubleshooting steam, and then it became unusable. It was periods of around five to ten minutes every couple of days where I could see the DNS requests hit the NextDNS logs, but dnsmasq would return SERVFAIL. Enabling round-robin in stubby also helped with this.
The issue described in the stubby issue is eerily similar. I’ll come back with results in the next day or two.
- Dan
- Dan.3
- 10 mths ago
Okay: 24 hours in and I’m not seeing any major issues:
Sep 30 10:55:17 daemon.info stubby[26616]: 45.90.28.0 : Upstream : TLS - Resps= 4382, Timeouts = 1, Best_auth =Success
Sep 30 10:55:17 mary73 daemon.info stubby[26616]: 45.90.28.0 : Upstream : TLS - Conns= 1754, Conn_fails= 0, Conn_shuts= 1, Backoffs = 9
The back offs are from my flaky DSL resyncing, so only the single connection shut is interesting. As I mentioned, I run round-robin, so I’ve got log entries for each, but they’re all similar.
Will update tomorrow. Has anyone else had issues during the last 24 hours? Which version of stubby?
- Dan
- Dan.3
- 10 mths ago
Update: I started seeing issues again. I had to restart stubby to stop the SERVFAILs. This is the same issue I had before and the workaround was setting a schedule to “service stubby restart” every two hours.
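Instead of restarting on a fixed schedule, a watchdog could restart only when lookups actually fail. The Python sketch below covers just the decision logic: it reads the status field from dig output like that pasted in this thread and flags a restart after several SERVFAILs in a row. The function names and the threshold of 3 are hypothetical, and running dig and "service stubby restart" is left to whatever scheduler the router provides.

```python
import re

# Status field of dig's header line, e.g.
# ";; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 13951"
STATUS_RE = re.compile(r"->>HEADER<<-.*?status:\s*(\w+)")


def dig_status(dig_output):
    """Return the status string from dig output, or None if no header is found."""
    m = STATUS_RE.search(dig_output)
    return m.group(1) if m else None


def should_restart(statuses, threshold=3):
    """True if the last `threshold` probes all returned SERVFAIL (threshold is a guess)."""
    recent = statuses[-threshold:]
    return len(recent) == threshold and all(s == "SERVFAIL" for s in recent)


header = ";; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 13951"
history = ["NOERROR", dig_status(header), "SERVFAIL", "SERVFAIL"]
print(dig_status(header), should_restart(history))
```

Requiring consecutive failures avoids restarting on a single slow lookup, at the cost of a short delay before recovery.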
- firstlast
- firstlast
- 10 mths ago
Dan I just tried again yesterday switching to NextDNS DoT servers and once again my home network came crawling to a halt. Same issues, cannot resolve queries.
Sigh, I'm back to using Cloudflare DoT with the exact same config and have absolutely no issues. The problem is definitely unique to NextDNS somehow.
Oh well.
- Gordon Freeman
- Gordon_Freeman
- 10 mths ago
firstlast seems to be running pretty well the last few days, I don't trust it