
TLS Connection Failures - Stubby

I’m seeing connection failures between Stubby and NextDNS that I haven’t seen before. They cause lookup timeouts and an excessive number of connections to the service. Plain DNS works very well, and Cloudflare and other DoT providers work well on Stubby, which leads me to think it’s a NextDNS issue. The diagnostic tool can't look up nextdns.io while I'm using Stubby, but it runs fine when I'm not.

Looking for any insight or assistance. 

Version: Stubby 0.4.0 on FreshTomato

daemon.info stubby[20713]: 45.90.28.0 : Upstream : TLS - Resps= 26, Timeouts = 10, Best_auth =Success

I also get occasional SERVFAILs from dnsmasq.
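To put a number on that stats line, here is a small sketch that pulls the counters out of a Stubby upstream log line and computes the share of queries that timed out. It assumes (this is not confirmed by the Stubby docs) that `Resps` and `Timeouts` are separate counters covering the same interval; `timeout_share` is a hypothetical helper name.

```python
import re
from typing import Optional

# Stubby's spacing around "=" is irregular ("Resps= 26", "Timeouts = 10"),
# so allow optional whitespace on both sides.
STATS_RE = re.compile(r"Resps\s*=\s*(\d+),\s*Timeouts\s*=\s*(\d+)")


def timeout_share(line: str) -> Optional[float]:
    """Return Timeouts / (Resps + Timeouts) for a stats line, or None if no match.

    Assumption: Resps and Timeouts are disjoint counters over the same period.
    """
    m = STATS_RE.search(line)
    if m is None:
        return None
    resps, timeouts = int(m.group(1)), int(m.group(2))
    total = resps + timeouts
    return timeout_share_zero_safe(timeouts, total)


def timeout_share_zero_safe(timeouts: int, total: int) -> float:
    # Avoid dividing by zero when the upstream has not been used yet.
    return timeouts / total if total else 0.0


if __name__ == "__main__":
    sample = ("daemon.info stubby[20713]: 45.90.28.0 : Upstream : TLS - "
              "Resps= 26, Timeouts = 10, Best_auth =Success")
    print(f"{timeout_share(sample):.1%}")  # 10 / 36, printed as 27.8%
```

Under that assumption, roughly one query in four timed out here. By the same arithmetic, the "Resps= 4382, Timeouts = 1" line posted later in the thread works out to about 0.02%.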

Config:

resolution_type: GETDNS_RESOLUTION_STUB
dns_transport_list:
- GETDNS_TRANSPORT_TLS
tls_authentication: GETDNS_AUTHENTICATION_REQUIRED
tls_query_padding_blocksize: 256
edns_client_subnet_private: 0
idle_timeout: 9000
tls_connection_retries: 5
tls_backoff_time: 900
timeout: 2000
round_robin_upstreams: 1
tls_min_version: GETDNS_TLS1_3
listen_addresses:
- 127.0.0.1@5453
- 0::1@5453
upstream_recursive_servers:
- address_data: 45.90.28.0
  tls_auth_name: "xxxxxx.dns1.nextdns.io" etc

I can send the diag output privately on request.

52 replies

    • Dan.3
    • 2 yrs ago

    For the sake of testing, I spun up Stubby on a Debian instance with the config above and can’t resolve lookups:

    $ nslookup eff.org 127.0.0.1
    Server:         127.0.0.1
    Address:        127.0.0.1#53

    ** server can't find eff.org: SERVFAIL

    With Cloudflare dropped into the config, I can resolve addresses. Any ideas?
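One thing worth ruling out: the nslookup above went to 127.0.0.1 on port 53, while the config has Stubby listening on port 5453. If something else on that Debian box answers port 53, the SERVFAIL may not be coming from Stubby at all. A minimal sketch, using only the Python standard library, that sends a plain A query over UDP to both ports and prints the response code (RCODE) from the DNS header; the helper names are hypothetical.

```python
import random
import socket
import struct

RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def build_query(name: str, qtype: int = 1) -> bytes:
    """Build a minimal RFC 1035 query with recursion desired (qtype 1 = A)."""
    txid = random.randint(0, 0xFFFF)
    # Flags 0x0100 sets RD; one question, no answer/authority/additional records.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # class IN


def rcode_name(response: bytes) -> str:
    """The RCODE is the low 4 bits of the 16-bit flags field."""
    code = struct.unpack("!H", response[2:4])[0] & 0x000F
    return RCODES.get(code, str(code))


def ask(server: str, port: int, name: str, timeout: float = 3.0) -> str:
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        query = build_query(name)
        sock.sendto(query, (server, port))
        response, _ = sock.recvfrom(4096)
        if response[:2] != query[:2]:
            raise ValueError("transaction ID mismatch")
        return rcode_name(response)


if __name__ == "__main__":
    # 5453 is Stubby's listen port from the config; 53 is what nslookup queried.
    for port in (5453, 53):
        try:
            print(port, ask("127.0.0.1", port, "eff.org"))
        except (OSError, ValueError) as exc:
            print(port, "error:", exc)
```

If port 5453 answers NOERROR while port 53 returns SERVFAIL, the problem likely sits in whatever forwards to Stubby rather than between Stubby and NextDNS.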

      • NextDNs
      • 2 yrs ago

      Dan stubby is working for many people, but it has always had issues with certain versions and is generally less robust than many other clients. Why it does not work in your case is unclear. The timeout error does not make much sense, and the logs don't show much more to debug.

      • Dan.3
      • 2 yrs ago

      NextDNS thank you. I’m hoping to use the CLI soon. Cloudflare and other resolvers work well with Stubby - what do you think the difference is with NextDNS? DoT should work if it’s functioning with other providers?

      • Dan.3
      • 2 yrs ago

      NextDNS what additional steps can I take to debug this issue? Unfortunately Stubby’s debug logs can be limited. I’m hoping you might be able to test an instance on a system you control and observe NextDNS logs to see if the servers get hit, with what, and if they respond? That would be very helpful :)

      I’m not able to do additional troubleshooting right now, so I hope you can help!

      • NextDNs
      • 2 yrs ago

      Dan we tested stubby on many systems and it works. The only known issue with stubby is when it is linked against an old version of OpenSSL, but the error would be different. Some people have also reported stubby randomly falling back and then stopping working, but again, the errors would be different and the fix is easy.

      Please try with another DoT client or CLI to see if you are also getting timeout errors. That is the only next step we can advise.
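If no other DoT client is handy, a bare-bones one can be sketched with Python's standard `socket` and `ssl` modules. DNS over TLS (RFC 7858) runs on TCP port 853 and, as with DNS over TCP, prefixes every message with a 2-byte length. This sketch queries NextDNS directly, bypassing Stubby, using the address and auth name from the config above; `xxxxxx.dns1.nextdns.io` is the poster's redacted placeholder and must be replaced with your own configuration's hostname. The helper names are hypothetical.

```python
import random
import socket
import ssl
import struct


def build_query(name: str, qtype: int = 1) -> bytes:
    """Minimal RFC 1035 A query with recursion desired."""
    header = struct.pack("!HHHHHH", random.randint(0, 0xFFFF), 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def frame(message: bytes) -> bytes:
    """Prefix a DNS message with its 2-byte big-endian length."""
    return struct.pack("!H", len(message)) + message


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, since recv() may return fewer."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before full message arrived")
        buf += chunk
    return buf


def dot_query(server_ip: str, auth_name: str, name: str, timeout: float = 5.0) -> int:
    """Send one query over DoT and return the response RCODE (0 = NOERROR, 2 = SERVFAIL)."""
    ctx = ssl.create_default_context()  # verifies the certificate against auth_name
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3  # mirrors tls_min_version in the config
    with socket.create_connection((server_ip, 853), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=auth_name) as tls:
            query = build_query(name)
            tls.sendall(frame(query))
            (length,) = struct.unpack("!H", recv_exact(tls, 2))
            response = recv_exact(tls, length)
    if response[:2] != query[:2]:
        raise ValueError("transaction ID mismatch")
    return struct.unpack("!H", response[2:4])[0] & 0x000F


if __name__ == "__main__":
    print(dot_query("45.90.28.0", "xxxxxx.dns1.nextdns.io", "eff.org"))
```

Running it in a loop while the problem is happening would help show whether the timeouts also appear outside Stubby.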

    • firstlast
    • 2 yrs ago

    I use AsusWRT-Merlin with NextDNS and DoT. I believe it uses Stubby under the hood. For the past week or so, I've had terrible Internet on all my devices. I was able to pin it down to DNS today. Lots of slow DNS replies or total failures.

    Switching to Cloudflare fixes the issue.

    This may be anecdotal, but perhaps there is some wider issue here.

      • Dan.3
      • 2 yrs ago

      firstlast do you think you could turn on verbose logging for Stubby and post some snippets here?

      • teal_rabbit
      • 2 yrs ago

      firstlast I'm in a similar situation. I thought it was my IPv6, but it keeps misbehaving even with IPv6 disabled. I've tried every simple fix I can think of, since all I have is the DoT setup on my ASUS Merlin router, and nothing fixes it. I'm glad to hear other people are having issues too; I was losing my mind thinking I'd messed something up in the configuration.

    • firstlast
    • 2 yrs ago

    Here is someone else with the same issue on AsusWRT-Merlin: https://www.snbforums.com/threads/dns-over-tls-and-chroot-nextdns-dot-issue.74466

    It's annoying because it was working for months and now all of a sudden it is an issue. :(

    • GoodVibes
    • 2 yrs ago

    Same problem with OpenWrt 19.07 running Stubby 0.3.0 and Debian Buster running Stubby 0.2.5. No problem if I change to Cloudflare or Quad9 DoT servers.

    • NextDNs
    • 2 yrs ago

    @firstlast @goodvibes please provide a diagnostic report from https://nextdns.io/diag

    • NextDNs
    • 2 yrs ago

    For everybody having an issue with stubby, please provide the version of stubby you are running and on what OS (the router firmware name and version if it is a router).

      • teal_rabbit
      • 2 yrs ago

      NextDNS Asuswrt-Merlin 386.3_2, Stubby 0.4.0

      • Dan.3
      • 2 yrs ago

      NextDNS FreshTomato 2021.5, Stubby 0.4.0

      • GoodVibes
      • 2 yrs ago

      NextDNS OpenWrt 19.07, Stubby 0.3.0; Debian Buster, Stubby 0.2.5

    • firstlast
    • 2 yrs ago

    I'm back to seeing similar behaviour now. Are other stubby users experiencing a regression?

     

    Thanks!

      • Dan.3
      • 2 yrs ago

      firstlast all still looks okay from my end. No timeouts or TLS issues. 

      • firstlast
      • 2 yrs ago

      Dan Thanks for checking!

      • GoodVibes
      • 2 yrs ago

      firstlast My Stubby (OpenWrt 19.07) was behaving erratically (lots of SERVFAIL errors), but a service restart fixed it.

      • Gordon_Freeman
      • 2 yrs ago

      firstlast as a matter of fact, I still get a lot of those errors. Whether the problem occurs or not seems almost random.

      • firstlast
      • 2 yrs ago

      Gordon Freeman Yep, same. I stopped using NextDNS a few days ago as I don't have time to keep troubleshooting it.

      I'll give it another shot eventually and hope that whatever this issue is has been sorted out.

      • Gordon_Freeman
      • 2 yrs ago

      firstlast there is also a chance that stubby is at fault. There is one open issue on their GitHub page, but it only links back here.

       

      https://github.com/getdnsapi/stubby/issues/297

      • Dan.3
      • 2 yrs ago

      Gordon Freeman very interesting. Okay, I have a task that restarts stubby every two hours on my router. I’ll stop it and see if the issue returns. The random connection failures were happening before I opened this ticket; I just ran out of troubleshooting steam, and then it became unusable. Every couple of days there were periods of around five to ten minutes during which I could see the DNS requests hit the NextDNS logs, but dnsmasq would return SERVFAIL. Enabling round-robin in stubby also helped with this.
       

      The issue described in the stubby issue is eerily similar. I’ll come back with results in the next day or two. 

      • Dan.3
      • 2 yrs ago

      Okay: 24 hours in and I’m not seeing any major issues:

      Sep 30 10:55:17 daemon.info stubby[26616]: 45.90.28.0 : Upstream : TLS - Resps= 4382, Timeouts = 1, Best_auth =Success
      Sep 30 10:55:17 mary73 daemon.info stubby[26616]: 45.90.28.0 : Upstream : TLS - Conns= 1754, Conn_fails= 0, Conn_shuts= 1, Backoffs = 9

      The backoffs are from my flaky DSL resyncing, so only the single connection shut is interesting. As I mentioned, I run round-robin, so I have log entries for each upstream, but they’re all similar.
      Will update tomorrow. Has anyone else had issues during the last 24 hours? Which version of stubby?

      • Dan.3
      • 2 yrs ago

      Update: I started seeing issues again and had to restart stubby to stop the SERVFAILs. This is the same issue I had before, and the workaround was scheduling “service stubby restart” to run every two hours.
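A variation on the scheduled restart is to restart only after several failed lookups in a row, so that Stubby is not bounced when it is healthy. The sketch below keeps the decision logic separate from the side effects. `RestartPolicy`, `run`, and `probe` are hypothetical names: `probe` stands for any callable that returns `True` when a test lookup through Stubby succeeds (for example, a query to its listen port that comes back NOERROR). The threshold and interval are arbitrary assumptions, and it assumes Python is available where `service stubby restart` can be run, which may not be the case on stock router firmware.

```python
import subprocess
import time
from typing import Callable


class RestartPolicy:
    """Signal a restart after `threshold` consecutive failed probes."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.failures = 0

    def record(self, ok: bool) -> bool:
        """Record one probe result; return True when a restart is due."""
        if ok:
            self.failures = 0  # any success resets the streak
            return False
        self.failures += 1
        if self.failures >= self.threshold:
            self.failures = 0  # start counting afresh after the restart
            return True
        return False


def run(probe: Callable[[], bool], interval: float = 60.0, threshold: int = 3) -> None:
    """Probe every `interval` seconds and restart Stubby on a failure streak."""
    policy = RestartPolicy(threshold)
    while True:
        if policy.record(probe()):
            # The same command the thread uses for the scheduled workaround.
            subprocess.run(["service", "stubby", "restart"], check=False)
        time.sleep(interval)
```

Logging each restart would also record how often the SERVFAIL periods actually occur, which could be useful evidence for the GitHub issue.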

      • firstlast
      • 2 yrs ago

      Dan I tried again yesterday, switching to the NextDNS DoT servers, and once again my home network ground to a halt. Same issues: queries can't be resolved.

      Sigh, I'm back to using Cloudflare DoT with the exact same config and have absolutely no issues. The problem is definitely unique to NextDNS somehow.

      Oh well.

  • Status: Fixed
  • 1 like
  • Last active 11 months ago
  • 52 replies
  • 1549 views
  • 9 following