
TLS Connection Failures - Stubby

I’m seeing connection failures between Stubby and NextDNS that I haven’t seen before. They cause lookup timeouts and excessive connections to the service. Plain DNS works very well, and Cloudflare and other DoT providers work well on Stubby, which leads me to think it’s a NextDNS issue. I cannot get the diagnostic tool to look up nextdns.io successfully while using Stubby, but it runs fine when Stubby is not in use.

Looking for any insight or assistance. 

Version: Stubby 0.4.0 on FreshTomato

daemon.info stubby[20713]: 45.90.28.0 : Upstream : TLS - Resps= 26, Timeouts = 10, Best_auth =Success

I also see occasional SERVFAIL responses from dnsmasq.

config

resolution_type: GETDNS_RESOLUTION_STUB
dns_transport_list:
  - GETDNS_TRANSPORT_TLS
tls_authentication: GETDNS_AUTHENTICATION_REQUIRED
tls_query_padding_blocksize: 256
edns_client_subnet_private: 0
idle_timeout: 9000
tls_connection_retries: 5
tls_backoff_time: 900
timeout: 2000
round_robin_upstreams: 1
tls_min_version: GETDNS_TLS1_3
listen_addresses:
  - 127.0.0.1@5453
  - 0::1@5453
upstream_recursive_servers:
  - address_data: 45.90.28.0
    tls_auth_name: "xxxxxx.dns1.nextdns.io"
etc.

Will message diag privately on request. 

52 replies

    • NextDNs
    • 2 yrs ago

    We found why stubby is not happy. We will push a workaround in production ASAP.

      • teal_rabbit
      • 2 yrs ago

      NextDNS THANK YOU for fixing this. Can confirm that NextDNS is behaving well when it previously did not.

      • Dan.3
      • 2 yrs ago

      NextDNS amazing! Thank you!

      • firstlast
      • 2 yrs ago

      NextDNS THANK YOU SO MUCH!

      Back to using NextDNS now, so glad to see the ads disappearing from my devices again.

      • GoodVibes
      • 2 yrs ago

      NextDNS Working! Thanks! What was the issue?

      • Dan.3
      • 2 yrs ago

      NextDNS sorry to bother you again but: 

      daemon.debug stubby[12925]: 45.90.28.0                               : Conn closed: TLS - *Failure*

      I’m seeing a regression of the behaviour that was previously fixed.

    • Dan.3
    • 2 yrs ago

    For the sake of testing, I spun up Stubby on a Debian instance with the config above and can’t resolve lookups:

    $ nslookup eff.org 127.0.0.1
    Server:         127.0.0.1
    Address:        127.0.0.1#53

    ** server can't find eff.org: SERVFAIL

    With Cloudflare dropped into the config, I can resolve addresses. Any ideas?

      • NextDNs
      • 2 yrs ago

      Dan, you made Stubby listen on port 5453. To test it, use dig -p 5453 @127.0.0.1 test.com instead.
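
The port mismatch above can also be checked without dig. The following is a minimal sketch that sends one plain UDP query to the stub listener from the posted config (127.0.0.1, port 5453) and reports the response code. The function names are made up for illustration, and it assumes the listener accepts ordinary UDP queries on that port.

```python
import socket
import struct


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record in wire format."""
    # Header: ID, flags (only RD set), QDCOUNT=1, then three zero counts.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: length-prefixed labels ended by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def rcode_name(response: bytes) -> str:
    """Return the response code of a DNS reply as text."""
    (flags,) = struct.unpack("!H", response[2:4])
    code = flags & 0x000F
    names = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}
    return names.get(code, f"RCODE{code}")


def query_local_stub(name: str, host: str = "127.0.0.1", port: int = 5453,
                     timeout: float = 5.0) -> str:
    """Send one UDP query to the stub listener and return its rcode."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (host, port))
        response, _ = sock.recvfrom(4096)
    return rcode_name(response)
```

Calling `query_local_stub("test.com")` raises `socket.timeout` if nothing answers on that port, which makes a wrong port easy to tell apart from a SERVFAIL returned by Stubby itself.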

      • Dan.3
      • 2 yrs ago

      NextDNS 

      Sorry, I did see that and modified the config. I was watching the verbose log from Stubby: DNS requests would hit, the TLS connection would open, and then nothing happened until it closed shortly after. Stubby indicated a request timeout, as in the previous example. Swap the servers to Cloudflare and it all works. Do you see something similar on a Stubby instance?

      • Dan.3
      • 2 yrs ago

      NextDNS 

      Thanks in advance for your help! Stubby logs for example follow (sorry for the wall of text - how do you write code blocks here?)

      dig test.com @127.0.0.1

      ; <<>> DiG 9.11.5-P4-5.1+deb10u5-Debian <<>> test.com @127.0.0.1
      ;; global options: +cmd
      ;; Got answer:
      ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 15093
      ;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
      ;; WARNING: recursion requested but not available

      ;; QUESTION SECTION:
      ;test.com.                      IN      A

      ;; Query time: 0 msec
      ;; SERVER: 127.0.0.1#53(127.0.0.1)
      ;; WHEN: Thu Aug 26 18:42:33 AWST 2021
      ;; MSG SIZE  rcvd: 26


      [10:38:00.995746] STUBBY: Read config from file stubby.yml
      [10:38:00.996627] STUBBY: DNSSEC Validation is OFF
      [10:38:00.996663] STUBBY: Transport list is:
      [10:38:00.996678] STUBBY:   - TLS
      [10:38:00.996693] STUBBY: Privacy Usage Profile is Strict (Authentication required)
      [10:38:00.996708] STUBBY: (NOTE a Strict Profile only applies when TLS is the ONLY transport!!)
      [10:38:00.996722] STUBBY: Starting DAEMON....
      [10:38:28.460227] STUBBY: 45.90.28.0                               : Conn opened: TLS - Strict Profile
      [10:38:28.576709] STUBBY: 45.90.28.0                               : Verify passed : TLS
      [10:38:33.458539] STUBBY: 2a07:a8c0::                              : Conn opened: TLS - Strict Profile
      [10:38:34.018680] STUBBY: 2a07:a8c0::                              : Verify passed : TLS
      [10:38:38.458883] STUBBY: 45.90.30.0                               : Conn opened: TLS - Strict Profile
      [10:38:38.460499] STUBBY: 45.90.28.0                               : Conn closed: TLS - Resps=     0, Timeouts  =     1, Curr_auth =Success, Keepalive(ms)=     0
      [10:38:38.463042] STUBBY: 45.90.28.0                               : Upstream   : TLS - Resps=     0, Timeouts  =     1, Best_auth =Success
      [10:38:38.463065] STUBBY: 45.90.28.0                               : Upstream   : TLS - Conns=     1, Conn_fails=     0, Conn_shuts=      0, Backoffs     =     0
      [10:38:38.483120] STUBBY: 45.90.30.0                               : Verify passed : TLS
      [10:38:43.463608] STUBBY: 2a07:a8c0::                              : Conn closed: TLS - Resps=     0, Timeouts  =     1, Curr_auth =Success, Keepalive(ms)=     0
      [10:38:43.463769] STUBBY: 2a07:a8c0::                              : Upstream   : TLS - Resps=     0, Timeouts  =     1, Best_auth =Success
      [10:38:43.463789] STUBBY: 2a07:a8c0::                              : Upstream   : TLS - Conns=     1, Conn_fails=     0, Conn_shuts=      0, Backoffs     =     0
      [10:38:48.464297] STUBBY: 45.90.30.0                               : Conn closed: TLS - Resps=     0, Timeouts  =     1, Curr_auth =Success, Keepalive(ms)=     0
      [10:38:48.464377] STUBBY: 45.90.30.0                               : Upstream   : TLS - Resps=     0, Timeouts  =     1, Best_auth =Success
      [10:38:48.464395] STUBBY: 45.90.30.0                               : Upstream   : TLS - Conns=     1, Conn_fails=     0, Conn_shuts=      0, Backoffs     =     0
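
The per-upstream counters in the log above can be tallied with a short script. This is only a sketch: the regular expression is written against the line format shown in this thread, the function names are made up for illustration, and it assumes the Upstream counters are running totals, so it keeps the last value seen for each address.

```python
import re

# Matches the "Upstream : TLS - Resps= N, Timeouts = N" lines quoted above.
UPSTREAM_STATS = re.compile(
    r"(?P<addr>[0-9A-Fa-f:.]+)\s*:\s*Upstream\s*:\s*TLS\s*-\s*"
    r"Resps\s*=\s*(?P<resps>\d+),\s*Timeouts\s*=\s*(?P<timeouts>\d+)"
)


def latest_upstream_stats(log_text: str) -> dict:
    """Map each upstream address to its last seen (resps, timeouts) pair."""
    stats = {}
    for line in log_text.splitlines():
        match = UPSTREAM_STATS.search(line)
        if match:
            stats[match["addr"]] = (int(match["resps"]), int(match["timeouts"]))
    return stats


def timeout_ratio(resps: int, timeouts: int) -> float:
    """Share of counted queries that timed out (0.0 when nothing was counted).

    Assumes every counted query ended as either a response or a timeout.
    """
    total = resps + timeouts
    return timeouts / total if total else 0.0
```

Fed the Debian log above, every upstream comes back as zero responses and one timeout, while the router line from the first post (Resps= 26, Timeouts = 10) works out to roughly 28% of queries timing out.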

      • NextDNs
      • 2 yrs ago

      Dan please send a diag

      • NextDNs
      • 2 yrs ago

      Dan, your logs show IPv6 addresses, but the configuration you posted has only one IPv4 address. Is the config shown above complete? If you have IPv6 addresses configured, please try again without them.

      • Dan.3
      • 2 yrs ago

      NextDNS I’ve sent a message to you with the diag

      • Dan.3
      • 2 yrs ago

      NextDNS thanks for checking. Yes, my normal config is the complete output from the NextDNS setup page (IPv4 + IPv6). I’ve also tested with just 45.90.28.0 and no configuration-specific info.

      • NextDNs
      • 2 yrs ago

      Dan please try the full config with ipv6 removed
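
For anyone scripting this test, dropping the IPv6 upstreams can be done mechanically. The sketch below assumes the `upstream_recursive_servers` list has already been loaded into Python as a list of dicts (for example by a YAML parser); the helper name is hypothetical.

```python
import ipaddress


def ipv4_only(upstreams: list) -> list:
    """Keep only the upstream entries whose address_data is an IPv4 address."""
    return [
        entry for entry in upstreams
        if ipaddress.ip_address(entry["address_data"]).version == 4
    ]
```

Applied to the three addresses that appear in the log earlier in the thread (45.90.28.0, 2a07:a8c0:: and 45.90.30.0), this keeps the two IPv4 entries and leaves their other keys, such as `tls_auth_name`, untouched.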

      • Dan.3
      • 2 yrs ago

      NextDNS 

      dig example.com @127.0.0.1

      ; <<>> DiG 9.11.5-P4-5.1+deb10u5-Debian <<>> example.com @127.0.0.1
      ;; global options: +cmd
      ;; Got answer:
      ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 13951
      ;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
      ;; WARNING: recursion requested but not available

      ;; QUESTION SECTION:
      ;example.com.                   IN      A

      ;; Query time: 2003 msec
      ;; SERVER: 127.0.0.1#53(127.0.0.1)
      ;; WHEN: Thu Aug 26 22:34:19 AWST 2021
      ;; MSG SIZE  rcvd: 29
       

      Config

      resolution_type: GETDNS_RESOLUTION_STUB
      dns_transport_list:
        - GETDNS_TRANSPORT_TLS
      tls_authentication: GETDNS_AUTHENTICATION_REQUIRED
      tls_query_padding_blocksize: 128
      edns_client_subnet_private: 0
      idle_timeout: 5000
      tls_connection_retries: 5
      tls_backoff_time: 900
      timeout: 2000
      round_robin_upstreams: 1
      #tls_min_version: GETDNS_TLS1_3
      listen_addresses:
        - 127.0.0.1
        - 0::1
      upstream_recursive_servers:
        - address_data: 45.90.28.0
          tls_auth_name: "xxxxxx.dns1.nextdns.io"
        - address_data: 45.90.30.0
          tls_auth_name: "xxxxxx.dns2.nextdns.io"
       

      Stubby log

      [14:34:12.911360] STUBBY: Read config from file stubby_noipv6.yml
      [14:34:12.912172] STUBBY: DNSSEC Validation is OFF
      [14:34:12.912192] STUBBY: Transport list is:
      [14:34:12.912200] STUBBY:   - TLS
      [14:34:12.912208] STUBBY: Privacy Usage Profile is Strict (Authentication required)
      [14:34:12.912215] STUBBY: (NOTE a Strict Profile only applies when TLS is the ONLY transport!!)
      [14:34:12.912223] STUBBY: Starting DAEMON....
      [14:34:17.436308] STUBBY: 45.90.28.0                               : Conn opened: TLS - Strict Profile
      [14:34:17.551961] STUBBY: 45.90.28.0                               : Verify passed : TLS
      [14:34:19.437698] STUBBY: 45.90.28.0                               : Conn closed: TLS - Resps=     0, Timeouts  =     1, Curr_auth =Success, Keepalive(ms)=     0
      [14:34:19.437771] STUBBY: 45.90.28.0                               : Upstream   : TLS - Resps=     0, Timeouts  =     1, Best_auth =Success
      [14:34:19.437787] STUBBY: 45.90.28.0                               : Upstream   : TLS - Conns=     1, Conn_fails=     0, Conn_shuts=      0, Backoffs     =     0


      In contrast, using 1.1.1.1:

      dig example.com @127.0.0.1

      ; <<>> DiG 9.11.5-P4-5.1+deb10u5-Debian <<>> example.com @127.0.0.1
      ;; global options: +cmd
      ;; Got answer:
      ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 46405
      ;; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

      ;; OPT PSEUDOSECTION:
      ; EDNS: version: 0, flags:; udp: 1232
      ;; QUESTION SECTION:
      ;example.com.                   IN      A

      ;; ANSWER SECTION:
      example.com. 71169 IN A 93.184.216.34

      ;; Query time: 34 msec
      ;; SERVER: 127.0.0.1#53(127.0.0.1)
      ;; WHEN: Thu Aug 26 22:44:20 AWST 2021
      ;; MSG SIZE  rcvd: 67
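
To compare runs like the two above side by side, here is a rough sketch that pulls the status, query time, and server out of dig's text output. The function name is hypothetical, and the patterns only cover the fields visible in the outputs quoted in this thread.

```python
import re


def parse_dig(output: str) -> dict:
    """Extract status, query time (msec) and server address/port from dig text."""
    status = re.search(r"status:\s*([A-Z]+)", output)
    qtime = re.search(r"Query time:\s*(\d+)\s*msec", output)
    server = re.search(r"SERVER:\s*(\S+?)#(\d+)", output)
    return {
        "status": status.group(1) if status else None,
        "query_time_ms": int(qtime.group(1)) if qtime else None,
        "server": server.group(1) if server else None,
        "port": int(server.group(2)) if server else None,
    }
```

For the NextDNS run this gives SERVFAIL after 2003 msec, which is close to the `timeout: 2000` in the config and would be consistent with Stubby waiting out its own timeout before answering. The Cloudflare run gives NOERROR after 34 msec.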

      • Dan.3
      • 2 yrs ago

      NextDNS any thoughts?

      • NextDNs
      • 2 yrs ago

      Dan can you please turn on debug logs?

      • Dan.3
      • 2 yrs ago

      NextDNS debug logs for Stubby? Those are in my previous message. 

      • NextDNs
      • 2 yrs ago

      Dan did you start stubby with the debug log level?

      • Dan.3
      • 2 yrs ago

      NextDNS yes. 

      stubby -v 7 -C stubby_noipv6.yml

      The logs you see above immediately follow. 

      • Dan.3
      • 2 yrs ago

      NextDNS have you had a chance to test this config on an instance of Stubby you control? Unfortunately I have no other test sites, other than another FreshTomato router, which exhibits the same symptoms (but is a different internet provider). 

      If I know it’s my end, I can start down another path - just let me know :)

      Could this have anything to do with the TLS cert changes in June? Thanks again. 

      • NextDNs
      • 2 yrs ago

      Dan, we have indeed already tested Stubby. Here it seems to be a timeout. Judging from your diag, anycast routing for IPv6 isn’t right from where you are, but IPv4 should be fine.

      Would you be able to use our CLI instead of stubby?

      • teal_rabbit
      • 2 yrs ago

      NextDNS Not the OP, but this doesn't really seem like a fair solution. If your product is meant to work outside of the app you've developed, then it should. Whatever recent changes were made to cause this issue are clearly affecting more than just one person. Otherwise NextDNS shouldn't advertise their DNS IPs for any other solution (DoT/DoH) if the only way you expect customers to use the product is via your CLI app. 🤔

      • Dan.3
      • 2 yrs ago

      NextDNS I would like to! But Tomato or Entware CLI isn’t ready yet :(

      I could configure another host to run CLI for the network, but I would rather have it all on the router. I’ll continue running DNS over 53 for now. 

      So Stubby is working okay for you? What are your thoughts on the timeouts? If it was a routing issue, I would be having issues establishing a connection at all, right? DNS over 53 works really well.
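
One way to separate "cannot connect" from "connects but never gets an answer" is to talk DNS-over-TLS to the upstream directly, bypassing Stubby. The sketch below is an assumption-laden probe, not anything NextDNS or Stubby ships: it opens a TLS connection on port 853, verifies the certificate against the `tls_auth_name` from the config, sends one length-prefixed query for example.com, and reports which stage failed. The function names and the hard-coded query are illustrative only.

```python
import socket
import ssl
import struct

# Wire-format query for "example.com" A/IN with only the RD flag set.
EXAMPLE_QUERY = (
    b"\x12\x34\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00"
    b"\x07example\x03com\x00\x00\x01\x00\x01"
)


def frame_for_tcp(message: bytes) -> bytes:
    """Prefix a DNS message with its 2-byte length, as used over TCP and TLS."""
    return struct.pack("!H", len(message)) + message


def recv_exact(sock, count: int) -> bytes:
    """Read exactly count bytes or raise ConnectionError if the peer closes."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        data += chunk
    return data


def probe_dot(address: str, auth_name: str, port: int = 853,
              timeout: float = 5.0) -> str:
    """Report whether the TLS handshake and a single query succeed."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((address, port), timeout=timeout) as raw:
            # The handshake and certificate check happen inside wrap_socket.
            with context.wrap_socket(raw, server_hostname=auth_name) as tls:
                tls.sendall(frame_for_tcp(EXAMPLE_QUERY))
                try:
                    (length,) = struct.unpack("!H", recv_exact(tls, 2))
                    reply = recv_exact(tls, length)
                except socket.timeout:
                    return "handshake ok, query timed out"
                except ConnectionError:
                    return "handshake ok, connection closed without a reply"
                return f"handshake ok, rcode {reply[3] & 0x0F}"
    except OSError as exc:  # includes ssl.SSLError and connect timeouts
        return f"connection or handshake failed: {exc}"
```

A call such as `probe_dot("45.90.28.0", "xxxxxx.dns1.nextdns.io")`, with the redacted name replaced by the real one, would return "handshake ok, query timed out" if the behaviour matches what the Stubby logs in this thread show.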

  • Status Fixed