
Horrible DNS latencies since yesterday - family is not happy.

Hi Team: I'm a long-time NextDNS user with an ASUS router running Merlin. Nothing has changed on the router, and I use a manual setup via stubby.yml, which has not changed either.

Starting yesterday, DNS latencies became horrible and names barely resolve. Normally lookups are instant, all in the low 20 ms range. The family is screaming about how bad DNS is. What's going on?

I already tried to download the "diag" script, and two or more antivirus/anti-malware programs deleted it immediately without it even being opened, so I doubt it will get past those scanners.

Also, I've already rebooted the router and checked my stubby.yml file for any changes.

Running ping.nextdns.io (multiple times) gives the results below; if I'm lucky, one PoP might resolve in 25-50 ms.

  hydron-clt                error
  tier-clt                  error
  anexia-mnz                error
  zepto-xrs                 error
  zepto-iad                 error
  wlvrz-was                 error
  teraswitch-pit            error
  router-pit                error
  anexia-atl                error
  vultr-atl                 error
  anycast.dns1.nextdns.io   error (anycast1)
  anycast.dns2.nextdns.io   error (anycast2)
  dns1.nextdns.io           error (ultralow1)
  dns2.nextdns.io           error (ultralow2)

  • Can you run the diag from a non-Windows machine?

    You could also try the ping test with NextDNS disabled to better understand what is going on.

    Also please try a traceroute to 45.90.28.0 and 45.90.30.0.

  • Things have improved substantially this morning, with no explanation or root cause. My stubby.yml and router settings were verified as correct and have been unchanged for 6+ months. The family has stopped complaining for now. Whatever you guys did, thanks!

    I'm sorry, I have no Linux systems at home. Maybe next time my wife will let me near her Mac, but it also runs antivirus/anti-malware software, so I won't be surprised if the script gets flagged there too. If there were a way to run it on the router (ASUS/Merlin), I could do that. Here are the current ping.nextdns.io results:

      vultr-atl        24 ms  (anycast1, ultralow1)
      zepto-iad        26 ms
      tier-clt         27 ms
      zepto-xrs        29 ms
      hydron-clt       30 ms
      teraswitch-pit   31 ms
      anexia-atl       32 ms
      anexia-mnz       33 ms
      router-pit       34 ms
      anexia-rio      149 ms  (anycast2, ultralow2)
      wlvrz-was        error
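Comparing ping.nextdns.io runs by eye gets tedious. As a minimal sketch, the listing can be pasted into a POSIX shell and reduced to the fastest reachable PoP. It assumes the plain-text layout shown here (PoP name, latency, `ms`, optional annotation, or the word `error`); `fastest_pop` is a hypothetical helper, not a NextDNS tool. When every line reads `error`, as in the first report above, it prints nothing and exits non-zero.

```shell
# fastest_pop: read ping.nextdns.io-style lines on stdin and print the
# lowest-latency reachable PoP as "<name> <ms>".
# Assumed line format: "<pop> <latency> ms [(annotations)]" or "<pop> error".
# Exits 1 (printing nothing on stdout) when no PoP answered.
fastest_pop() {
  awk '
    { sub(/\r$/, "") }                       # tolerate Windows line endings
    $2 == "error" { errors++; next }         # unreachable PoP
    $3 == "ms" && $2 ~ /^[0-9]+$/ {
      ms = $2 + 0
      if (best == "" || ms < best_ms) { best = $1; best_ms = ms }
    }
    END {
      if (errors > 0) printf "%d PoP(s) reported error\n", errors > "/dev/stderr"
      if (best == "") exit 1
      printf "%s %d\n", best, best_ms
    }
  '
}
```

For example, `fastest_pop < ping.txt` on the listing above should print `vultr-atl 24`, plus a note on stderr that one PoP reported an error.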

  • We didn't change anything, but you're welcome :) It probably has something to do with your ISP.

  • We've been having issues all morning. The only way I was able to resolve them was to remove NextDNS and move back to Quad9. In my case, it's high packet loss to 45.90.28.40, which is our primary DNS IP according to our settings page.

      Chris Dunn (2 yrs ago):

      Chris Dunn The primary IP for us, in this case, is 45.90.28.40. When 45.90.28.40 was experiencing the high packet loss, our secondary IP, 45.90.30.40, wasn't impacted.

    • Chris Dunn is it still happening? What is your ISP?

    • Olivier Poitrey I haven't tried again, as the kids needed access to their schoolwork and I needed access for work. I'm connected to a local WISP, but I didn't see any interruption to the 45.90.30.40 IP. If it helps, I'm using YogaDNS on most of the PCs. One of the kids' PCs doesn't have it installed, and he never had an issue. All the PCs running YogaDNS with the NextDNS settings loaded were impacted.

    • Chris Dunn In YogaDNS, did you try with and without the "Ultra Low Latency" option? Does it change anything?

    • Olivier Poitrey That option is not enabled on any of the PCs here. Should it be enabled? I've now enabled it on my PC, and so far it's been stable, with no packet loss to that default DNS IP.

    • Chris Dunn Yes, it's still very new, but it should give the best results.

  • Hi Olivier and team: 

    I removed NextDNS from my ASUS router yesterday afternoon and replaced it with Quad9/Cloudflare because NextDNS died again, with no DNS resolution at all. I left NextDNS off all night with no issues and am still using the others, since I cannot disrupt our work-at-home (WAH) setup.

    Here's this morning's tracert. It's still not good, with those timeouts, although I saw many more of them yesterday.

    Thanks for any recommendations. I suspect these are all ISP routing issues, but if there's something you need to kick on your end, by all means please do.

    >tracert 45.90.28.114

    Tracing route to dns1.nextdns.io [45.90.28.114]
    over a maximum of 30 hops:

      1     2 ms     1 ms     1 ms  3622-10007-AC1900-FA38.xxx [192.168.100.7]
      2     2 ms     2 ms     3 ms  192.168.222.7
      3     2 ms     1 ms     1 ms  192.168.111.7
      4    15 ms    13 ms    23 ms  065-190-080-001.inf.spectrum.com [65.190.80.1]
      5    14 ms    33 ms    19 ms  174.111.102.224
      6    16 ms    11 ms    20 ms  cpe-024-025-062-048.ec.res.rr.com [24.25.62.48]
      7    20 ms    15 ms    14 ms  be31.drhmncev01r.southeast.rr.com [24.93.64.184]
      8    23 ms    15 ms    29 ms  66.109.10.176
      9    28 ms    22 ms    31 ms  bu-ether12.vinnva0510w-bcr00.tbone.rr.com [66.109.6.31]
     10    23 ms    21 ms    32 ms  ae-11.edge5.WashintonDC12.Level3.net [4.68.37.213]
     11     *        *        *     Request timed out.
     12    30 ms    35 ms    40 ms  CHOOPA-LLC.ear3.NewYork1.Level3.net [4.15.213.214]
     13     *        *        *     Request timed out.
     14     *        *        *     Request timed out.
     15     *        *        *     Request timed out.
     16    26 ms    26 ms    38 ms  dns1.nextdns.io [45.90.28.114]

    Trace complete.

    >tracert 45.90.30.114

    Tracing route to dns2.nextdns.io [45.90.30.114]
    over a maximum of 30 hops:

      1     1 ms     1 ms     2 ms  3622-10007-AC1900-FA38.xxx [192.168.100.7]
      2     1 ms     2 ms     1 ms  192.168.222.7
      3     2 ms     1 ms     1 ms  192.168.111.7
      4    17 ms    15 ms    32 ms  065-190-080-001.inf.spectrum.com [65.190.80.1]
      5    13 ms    14 ms    14 ms  174.111.102.224
      6    27 ms    14 ms    14 ms  cpe-024-025-062-048.ec.res.rr.com [24.25.62.48]
      7    22 ms    25 ms    12 ms  be31.drhmncev01r.southeast.rr.com [24.93.64.184]
      8    26 ms    31 ms    23 ms  66.109.10.176
      9    21 ms    31 ms    29 ms  209-18-43-59.dfw10.tbone.rr.com [209.18.43.59]
     10    23 ms    23 ms    21 ms  ash-b2-link.ip.twelve99.net [62.115.188.210]
     11    19 ms    24 ms    17 ms  voxility-svc071266-ic357612.ip.twelve99-cust.net [195.12.254.137]
     12    22 ms    22 ms    35 ms  ash-eqx-01c.voxility.net [5.254.81.22]
     13     *        *        *     Request timed out.
     14    26 ms    25 ms    20 ms  c0010.mc2.iad01.us.misaka.io [45.11.106.10]
     15    27 ms    28 ms    21 ms  dns2.nextdns.io [45.90.30.114]

    Trace complete.

    Here is what ping.nextdns.io shows right now (with NextDNS not in use for DNS):

      zepto-iad        22 ms  (anycast2)
      vultr-atl        23 ms  (ultralow2)
      anexia-atl       23 ms  (ultralow1)
      zepto-xrs        27 ms
      vultr-ewr        28 ms  (anycast1)
      tier-clt         34 ms
      anexia-mnz       36 ms
      teraswitch-pit   36 ms
      router-pit       37 ms
      hydron-clt       39 ms
      smarthost-jax    45 ms

    Thanks!  Stay safe, stay alive! 

     

    It would be great if we could set up the configs so that, if NextDNS stopped responding, our setups would automatically fail over to another DNS provider such as Quad9, Cloudflare, or Google, and maybe send us an alert. It would have to be an opt-in setting, because I'm sure some people do not want to use any of those providers under any circumstances. We can configure the server list in the router, but as far as I know (from what I read a year or so back), that doesn't play nicely with NextDNS setups.

    THANKS!  

    • G Mobley your traceroute and ping look good. Are you sure the issue isn't elsewhere? How did you set up NextDNS?

  • I am seeing similar behavior this morning as well. Olivier Poitrey, here is a screenshot from just now. I have since moved back to Cloudflare for the moment, as it was unbearable.

  • Hi Olivier: My setup was stable until a few weeks ago. I have a manual integration (no agent) on my ASUS router running Merlin 384.19. I've been working with NextDNS as a paying customer since you launched, helping many Merlin users in the SNB forums.

    Nothing changed in my config. NextDNS stopped working completely at 16:18 on 2/25; the DNS logs show a dead stop. I rebooted my main ASUS router, but DNS still did not resolve. My wife was standing in my office doorway, so I quickly replaced the NextDNS config with Quad9/Cloudflare and reset the one stubby file, and everyone was happy again.

    This is the second time in about 2 weeks (see my earlier report) that NextDNS has stopped working cold with an untouched config that had been stable for 6-12 months.

    I haven't tried switching back to NextDNS this morning, as my wife is already up; I can't do much tinkering until off-peak hours. I cannot provide the diag because the antivirus and anti-malware software just removes it. I did provide the ping and other PD tool output. THANKS!

  • I also had similar issues yesterday, with sites taking ages (10+ seconds) to resolve.

  • I keep having off-and-on issues with resolution. It's almost to the point where I need to find another service. In YogaDNS, I'm seeing blocks of time where I get "Error in getaddrinfo: No such host is known." or "request timeout".

    • Chris Dunn

      Chris, I had a similar problem. I fixed it by creating a new DNS rule and adding these domains to the rule:

      dns.nextdns.io
      steering.nextdns.io
      anycast.dns.nextdns.io
      dns1.nextdns.io
      dns2.nextdns.io

      Also add a new DoH DNS resolver:

      https://1dot1dot1dot1.cloudflare-dns.com/dns-query

      with the DNS server address 1.1.1.1.

      This configuration should fix your timeouts.

      Also, for your NextDNS CLI, you must add a DNS forwarder:

      sudo nextdns config set -forwarder mycompany2.com=https://doh.mycompany.com/dns-query#1.2.3.4
      sudo nextdns restart
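The `-forwarder` value packs several pieces into one string. As I read the NextDNS CLI documentation, its form is `[DOMAIN=]SERVER_ADDR[,SERVER_ADDR...]`, where a DoH URL can carry a bootstrap IP after `#` and, as far as I can tell, several comma-separated servers act as failover. A hypothetical helper, `parse_forwarder`, can split such a string to double-check what was typed. It only inspects the string and does not call the CLI, and the `mycompany` names are the placeholders from the post, not real servers.

```shell
# parse_forwarder SPEC: split a forwarder string of the assumed form
# [DOMAIN=]SERVER_ADDR[,SERVER_ADDR...] (a DoH URL may end in "#BOOTSTRAP_IP")
# into one line per server. Inspects the string only; does not call the CLI.
parse_forwarder() {
  spec=$1
  domain='(none)'
  head=${spec%%=*}
  if [ "$head" != "$spec" ]; then
    case $head in
      */*|*:*) ;;                      # "=" is inside a URL, not a domain prefix
      *) domain=$head; spec=${spec#*=} ;;
    esac
  fi
  (
    IFS=,                              # split the server list on commas
    set -f                             # no globbing while splitting
    for server in $spec; do
      addr=${server%%#*}
      bootstrap=-
      if [ "$addr" != "$server" ]; then
        bootstrap=${server#*#}         # text after "#" is the bootstrap IP
      fi
      printf 'domain=%s server=%s bootstrap=%s\n' "$domain" "$addr" "$bootstrap"
    done
  )
}
```

Applied to the example above, it should print `domain=mycompany2.com server=https://doh.mycompany.com/dns-query bootstrap=1.2.3.4`. The loop runs in a subshell so the changed `IFS` and `set -f` do not leak into the caller.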
      Chris Dunn (2 yrs ago):

      John DeCarlo I'm giving this a try, thank you. 

    • Chris Dunn 

      Let me know how it works out for you.

  • John DeCarlo I'm trying to figure out WHERE you created "a new DNS rule". Is that on the router, in NextDNS, both, or neither? I'm going to try going back to NextDNS. Thanks.

    • G Mobley What kind of problems are you having?

  • I'm still having erratic behavior. I dropped back to Quad9/Cloudflare for about a week, and the erratic, slow DNS seemed to behave. I switched back to NextDNS on Saturday, and things seemed to get noticeably slower. I'm still digging. I do not use the client, as I manually configure stubby.yml for the few changes NextDNS requires.

    • G Mobley Try enabling DoH in your browser. Then also set the default DNS on your router to 45.90.28.xxx and 45.90.30.xxx. Then test your browser and let me know if that helps.

  • Thanks. I've got DoH enabled on the ASUS, and all DNS is forced through the router's NextDNS setup. I've also re-verified that all the checkboxes for the NextDNS setup are selected correctly. I ran NextDNS for more than a year without issues until my first posting here. My setup did not change; my firmware and settings were the same when this started. I have to believe it's my ISP struggling with load. Is there something extra you think I need now? That's why I was asking about the "DNS rules"; I've never set up any DNS rules. THANKS!

    • G Mobley I set up the NextDNS CLI on the router. I also set up YogaDNS on every Windows 10 machine and enabled DoH in every browser. NextDNS works great: very fast DNS lookups, no lag at all.

  • Thanks! I may try the client again. I think my issues are really ISP-related, because until about 3 weeks ago the setup had been rock solid and fast. I'll keep watching the ISP. Stay safe, stay alive! Peace.

    • G Mobley My ISP is Spectrum. What is your ISP?

      Can you run a traceroute to 45.76.16.236 and 191.96.51.196 and post the results here? Thank you.

  • I switched back to the NextDNS setup yesterday morning, and when I got up this morning it appeared that NextDNS had become "unreachable" sometime between 02:00 and 03:00 AM EDT.

    10-4, I'm a long-time Spectrum customer with generally reliable 300/20 service.

    I restarted dnsmasq on the router (Merlin) just to be sure it was not something lurking in there. Nope, still very dead. There was nothing in the syslog indicating issues, other than failed speed-test messages, which are a clue to when it died.

    Switching the DNS resolver to Quad9 immediately revived my DNS resolution.

    I'll keep trying to figure out the root cause, because I like the NextDNS service, but I have a feeling it's not my router/setup, because it has been stable and rock solid for more than a year using NextDNS. The past 3-4 weeks, however, have been awful, with the family standing in my doorway or yelling, "The internet is down again!" The best I've gotten is 1-2 days of NextDNS working before it fails again.

    Here's a fresh tracert from a Windows box. I think this is the root cause of what some customers are seeing.
    >tracert 45.76.16.236

    Tracing route to dns.nextdns.io [45.76.16.236]
    over a maximum of 30 hops:

      1    36 ms    <1 ms    <1 ms  AC1900-FA38 [192.168.100.99]
      2     1 ms    <1 ms    <1 ms  192.168.111.99
      3    11 ms    16 ms    10 ms  65.190.80.1
      4    11 ms    17 ms    14 ms  174.111.102.224
      5    17 ms    14 ms    14 ms  cpe-024-025-062-048.ec.res.rr.com [24.25.62.48]
      6    20 ms    14 ms    14 ms  be31.drhmncev01r.southeast.rr.com [24.93.64.184]
      7    27 ms    22 ms    25 ms  66.109.6.224
      8    17 ms    20 ms    16 ms  66.109.5.117
      9    18 ms    22 ms    23 ms  be-206-pe07.ashburn.va.ibone.comcast.net [50.242.149.253]
     10    20 ms    19 ms    21 ms  be-2207-cs02.ashburn.va.ibone.comcast.net [96.110.32.189]
     11    22 ms    22 ms    19 ms  be-1212-cr12.ashburn.va.ibone.comcast.net [96.110.32.206]
     12    25 ms    23 ms    26 ms  be-301-cr11.pittsburgh.pa.ibone.comcast.net [96.110.39.166]
     13    36 ms    25 ms    29 ms  be-1211-cs02.pittsburgh.pa.ibone.comcast.net [96.110.38.133]
     14    23 ms    27 ms    27 ms  be-1212-cr12.pittsburgh.pa.ibone.comcast.net [96.110.38.150]
     15    34 ms    43 ms    35 ms  be-301-cr14.350ecermak.il.ibone.comcast.net [96.110.39.157]
     16    40 ms    42 ms    39 ms  be-1314-cs03.350ecermak.il.ibone.comcast.net [96.110.35.57]
     17    38 ms    38 ms    37 ms  be-2311-pe11.350ecermak.il.ibone.comcast.net [96.110.33.202]
     18    41 ms    39 ms    59 ms  96-87-9-182-static.hfc.comcastbusiness.net [96.87.9.182]
     19     *        *        *     Request timed out.
     20     *        *        *     Request timed out.
     21     *        *        *     Request timed out.
     22    35 ms    36 ms    35 ms  dns.nextdns.io [45.76.16.236]

    Trace complete.

    >tracert 191.96.51.196

    Tracing route to dns.nextdns.io [191.96.51.196]
    over a maximum of 30 hops:

      1    39 ms    <1 ms    <1 ms  AC1900-FA38 [192.168.100.99]
      2     1 ms     1 ms    <1 ms  192.168.111.99
      3    14 ms    11 ms    12 ms  065-190-080-001.inf.spectrum.com [65.190.80.1]
      4    13 ms    14 ms    40 ms  174.111.102.226
      5     8 ms    13 ms    14 ms  cpe-024-025-062-050.ec.res.rr.com [24.25.62.50]
      6    18 ms    14 ms    22 ms  be31.chrcnctr01r.southeast.rr.com [24.93.64.186]
      7    32 ms    19 ms    20 ms  bu-ether11.atlngamq46w-bcr00.tbone.rr.com [66.109.6.34]
      8    19 ms    17 ms    18 ms  66.109.5.125
      9    35 ms    44 ms    24 ms  ae14.cr4-atl2.ip4.gtt.net [208.116.217.29]
     10    37 ms    45 ms    38 ms  ae13.cr10-chi1.ip4.gtt.net [213.254.230.165]
     11    39 ms    39 ms    48 ms  ip4.gtt.net [208.116.128.54]
     12    36 ms    38 ms    37 ms  0.ae1.ar4.ord6.scnet.net [204.93.204.113]
     13    38 ms    34 ms    41 ms  unknown.servercentral.net [50.31.158.46]
     14    39 ms    37 ms    40 ms  dns.nextdns.io [191.96.51.196]

    Trace complete.

    And the trace below shows exactly why my connections to NextDNS stopped working!

    >tracert 45.90.28.114

    Tracing route to dns1.nextdns.io [45.90.28.114]
    over a maximum of 30 hops:

      1    29 ms    <1 ms    <1 ms  AC1900-FA38 [192.168.100.99]
      2    <1 ms    <1 ms    <1 ms  192.168.111.99
      3    12 ms    13 ms    13 ms  065-190-080-001.inf.spectrum.com [65.190.80.1]
      4    12 ms    17 ms    19 ms  174.111.102.224
      5    13 ms    10 ms    15 ms  cpe-024-025-062-048.ec.res.rr.com [24.25.62.48]
      6    16 ms    14 ms    16 ms  be31.drhmncev01r.southeast.rr.com [24.93.64.184]
      7    23 ms    22 ms    22 ms  66.109.6.224
      8   243 ms   238 ms   253 ms  bu-ether12.vinnva0510w-bcr00.tbone.rr.com [66.109.6.31]
      9   223 ms   258 ms   256 ms  ae-11.edge5.WashintonDC12.Level3.net [4.68.37.213]
     10     *       23 ms    25 ms  ae-1-3501.ear3.NewYork1.Level3.net [4.69.150.202]
     11    26 ms    31 ms    29 ms  CHOOPA-LLC.ear3.NewYork1.Level3.net [4.15.213.214]
     12     *        *        *     Request timed out.
     13     *        *        *     Request timed out.
     14     *        *        *     Request timed out.
     15    24 ms    28 ms    27 ms  dns1.nextdns.io [45.90.28.114]

    Trace complete.

    >tracert 45.90.30.114

    Tracing route to dns2.nextdns.io [45.90.30.114]
    over a maximum of 30 hops:

      1    17 ms    <1 ms    <1 ms  AC1900-FA38 [192.168.100.99]
      2     1 ms     1 ms    <1 ms  192.168.111.99
      3    18 ms    13 ms    14 ms  065-190-080-001.inf.spectrum.com [65.190.80.1]
      4    14 ms    12 ms    13 ms  174.111.102.224
      5    12 ms    10 ms    21 ms  cpe-024-025-062-048.ec.res.rr.com [24.25.62.48]
      6    21 ms    14 ms    14 ms  be31.drhmncev01r.southeast.rr.com [24.93.64.184]
      7    21 ms    22 ms    30 ms  66.109.10.176
      8    17 ms    19 ms    22 ms  66.109.5.117
      9    16 ms    24 ms    23 ms  ash-b2-link.ip.twelve99.net [62.115.188.210]
     10    23 ms    18 ms    18 ms  voxility-svc071266-ic357612.ip.twelve99-cust.net [195.12.254.137]
     11     *        *        *     Request timed out.
     12     *        *        *     Request timed out.
     13    20 ms    19 ms    22 ms  45.11.106.10
     14    18 ms    28 ms    19 ms  dns2.nextdns.io [45.90.30.114]

    Trace complete.

    • G Mobley I'm not sure why this traceroute would show the root cause. You have between 18 and 28 ms latency to the primary and secondary anycast endpoints and 35 ms to the ultralow endpoint, which is pretty good.

      How did you configure NextDNS? Using dnsmasq with the UDP IPs and linked IP, or something else?
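Reading a trace this way comes down to the destination line: its round-trip times are what the resolver actually sees, while intermediate hops that print "Request timed out." may just be routers that do not answer ICMP. A minimal sketch of that check, assuming Windows `tracert` output saved to a text file (`tracert_summary` is a hypothetical name), averages the destination hop's times and counts fully silent hops separately rather than treating them as failures.

```shell
# tracert_summary TARGET_IP: read Windows tracert output on stdin and report
# whether TARGET_IP was reached, the average RTT of that hop, and how many
# hops printed only "Request timed out.".
# Exits 1 when the target never appears.
tracert_summary() {
  awk -v target="$1" '
    { sub(/\r$/, "") }                 # tolerate CRLF line endings
    $1 ~ /^[0-9]+$/ {                  # hop lines start with the hop number
      if ($0 ~ /Request timed out/) { silent++; next }
      last = $NF
      if (substr(last, 1, 1) == "[")   # strip the [brackets] around the IP
        last = substr(last, 2, length(last) - 2)
      if (last != target) next
      sum = 0; n = 0
      for (i = 2; i < NF; i++) {
        if ($(i + 1) != "ms") continue
        v = $i
        sub(/^</, "", v)               # count "<1" as 1
        sum += v; n++
      }
      hop = $1
      avg = (n > 0) ? sum / n : 0
    }
    END {
      if (hop == "") { printf "unreachable silent_hops=%d\n", silent; exit 1 }
      printf "reached hop=%d avg_ms=%.0f silent_hops=%d\n", hop, avg, silent
    }
  '
}
```

For the 45.90.28.114 trace in the previous post, `tracert_summary 45.90.28.114 < trace.txt` should print `reached hop=15 avg_ms=26 silent_hops=3`.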

    • G Mobley I had a lot of similar issues for weeks, until we disabled DNSSEC on the router. Since then it's been rock solid for about a month now.

      G Mobley (2 yrs ago):

      Hans Geiblinger THANKS! I double-checked, and I already have DNSSEC, Rebind, and Forward unchecked. I'll keep digging.

      G Mobley (2 yrs ago):

      Olivier Poitrey Maybe I'm reading the tracert incorrectly. I view all the hops with timeouts as a red flag. Until a few weeks ago, I was not seeing any timeouts on either tracert to the NextDNS infrastructure. I do not recall the number of hops, though.

      Yes sir, no NextDNS client. I set up NextDNS manually, as I've done for the past year+. In fact, my renewal is coming up shortly.

      1) Make sure dnsmasq.conf has the correct entries (note: IPv6 is disabled):

      no-resolv
      bogus-priv
      strict-order
      server=45.90.30.0
      server=45.90.28.0
      add-cpe-id=XXXXXXXXX

      (By the way, the generated setup page says ".0" for both server lines, but I know it's really ".114", per an earlier issue where you said it would never be ".0".)

      2) Alter stubby.yml to make sure it has the correct NextDNS entries:

      Set -> round_robin_upstreams: 0

      resolution_type: GETDNS_RESOLUTION_STUB
      dns_transport_list:
        - GETDNS_TRANSPORT_TLS
      tls_authentication: GETDNS_AUTHENTICATION_REQUIRED
      tls_query_padding_blocksize: 128
      appdata_dir: "/var/lib/misc"
      resolvconf: "/tmp/resolv.conf"
      edns_client_subnet_private: 1
      round_robin_upstreams: 0
      idle_timeout: 9000
      tls_connection_retries: 2
      tls_backoff_time: 900
      timeout: 3000
      listen_addresses:
        - 127.0.1.1@53
      upstream_recursive_servers:
        - address_data: 45.90.28.114
          tls_auth_name: "XXXXX.dns1.nextdns.io"
        - address_data: 45.90.30.114
          tls_auth_name: "XXXXX.dns2.nextdns.io"

      3) Restart dnsmasq.

      I've restarted the dnsmasq service a few times just to make sure it's not lost, and it never recovers until I drop the NextDNS entries and replace them with Quad9, Cloudflare, or Google. Then there are usually no screams until I switch the router back to NextDNS and try again.

      Thanks for taking a look!  Let me know what I'm missing.  I've been following the NextDNS discussion in the ASUS Merlin forums for more than a year.. so this is really a mystery. 

    • G Mobley When I ran ASUS a while ago, I thought you had to handle round_robin via a start script?

      1. Create a start script:

      /jffs/scripts/stubby.postconf

      2. Add:

      #!/bin/sh
      CONFIG=$1
      source /usr/sbin/helper.sh
      pc_replace "round_robin_upstreams: 1" "round_robin_upstreams: 0" $CONFIG
      G Mobley (2 yrs ago):

      Hans Geiblinger Hi, correct. Works perfectly.

      #!/bin/sh
      #
      # Used for NextDNS: automatically sets round_robin_upstreams to 0 in /etc/stubby/stubby.yml
      #
      CONFIG=$1
      source /usr/sbin/helper.sh
      pc_replace "round_robin_upstreams: 1" "round_robin_upstreams: 0" $CONFIG
      #
      # <EOF>
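The postconf script relies on Merlin's `/usr/sbin/helper.sh`, which only exists on the router. To try the same edit on a copy of stubby.yml elsewhere, for example to flip between 0 and 1 while testing, a sed-based stand-in could look like this. `set_round_robin` is a hypothetical name, not a Merlin helper; it writes through a temporary file because `sed -i` takes different arguments on GNU and BSD sed.

```shell
# set_round_robin VALUE FILE: rewrite the "round_robin_upstreams:" line of a
# stubby.yml copy to VALUE (0 or 1), keeping its indentation.
# Hypothetical helper for off-router testing; on the router, the
# stubby.postconf script is what applies the change.
set_round_robin() {
  value=$1
  file=$2
  case $value in
    0|1) ;;
    *) echo "round_robin_upstreams must be 0 or 1" >&2; return 2 ;;
  esac
  tmp="$file.tmp.$$"
  # Write to a temp file, then replace the original, instead of using sed -i.
  sed "s/^\([[:space:]]*round_robin_upstreams:\).*/\1 $value/" "$file" > "$tmp" &&
    mv "$tmp" "$file"
}
```

`set_round_robin 1 ./stubby.yml` changes only that one line and leaves the rest of the file untouched.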

    • G Mobley routers along a traceroute can choose to drop ICMP packets; that's common and a non-issue.

      For 1), you can use .0 if you have the add-cpe-id option. Using .114 won't change anything.

      I'm not sure why you have a stubby config if you configured dnsmasq to go to the NextDNS IPs directly. It should be either one or the other.

      Could you simplify the config and remove stubby? If you run ASUS Merlin, why not try our CLI? It should make things more stable.
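Olivier's point is that dnsmasq should forward either straight to the NextDNS IPs (plain DNS, with `add-cpe-id` identifying the configuration) or to the local stubby listener (DoT), but not both. A rough way to see which path a dnsmasq.conf describes is sketched below. `dns_path` is a hypothetical name, and the stubby case assumes dnsmasq reaches stubby through a line like `server=127.0.1.1#53`, matching the `127.0.1.1@53` listener in the stubby.yml posted above; how Merlin actually generates that line is not confirmed here.

```shell
# dns_path FILE: classify where a dnsmasq.conf sends queries.
#   direct - server= lines point at NextDNS anycast IPs (45.90.28.x / 45.90.30.x)
#   stubby - server= points at the local stubby listener (assumed 127.0.1.1#port)
#   both   - both kinds are present (the mixed setup discussed above)
#   none   - neither was found
dns_path() {
  awk -F= '
    { sub(/\r$/, "") }
    $1 == "server" {
      if ($2 ~ /^45\.90\.(28|30)\./) direct = 1
      else if ($2 ~ /^127\.0\.1\.1#/) stubby = 1
    }
    END {
      if (direct && stubby) print "both"
      else if (direct) print "direct"
      else if (stubby) print "stubby"
      else print "none"
    }
  ' "$1"
}
```

On the dnsmasq.conf entries posted earlier in this thread it should print `direct`; `both` would point to the mixed setup.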

      G Mobley (2 yrs ago):

      Olivier Poitrey Thank you so much for the clarification on the ICMP packets and "timeouts". I didn't know that was the case with tracert. It still seems like an awful lot of hops to me; as a performance guy, that number of hops would be a nightmare.

      OK on 0 or 114 and yes I have the add-cpe-id option set properly.

      As for using both dnsmasq and stubby: on ASUS Merlin I run several other AMTM tools, Skynet and Diversion, which (as I understand it) leverage both stubby and dnsmasq to implement their features.

      I realize that Diversion duplicates some NextDNS features, but it has useful ones you do not, such as experimental blocking of certain PITA sites. I've run with Diversion both on and off, and it does not seem to matter for the recent failures I've been reporting. For the months when I had no issues, I had Diversion on plus NextDNS with zero issues.

      In the very early threads where you were working with the Merlin developers, the Merlin SME listed changes to both stubby.yml and dnsmasq.conf as required for the "manual" config, well before the client arrived.

      I do not run the client because, as of the last posts in those threads, it did not integrate well with the other Merlin AMTM tooling.

      Now that I know this may not be caused by my ISP's many hops, I'll keep experimenting with adding NextDNS back in. For all I know, it could be a problem caused by the Entware updates, as they have been known to break things.

      Thanks for your guidance! 

  • Just an update. To be fair to NextDNS, I had to restart dnsmasq this morning while connected to Quad9, so at this point I think something is up with the setup on my ASUS, and it may not be NextDNS at all. My apologies. I'll keep digging into the setup. I'd not be surprised if all those recent Entware updates are involved. Cheers! Stay safe, stay alive!

  • Updating this issue with these items:

    1. Switched to QUAD9 and had no DNS issues for 3 weeks.

    2. Switched back to NextDNS today and within 1 hour, had DNS resolution issues

    I caught STUBBY doing this:  Does this help with clues for why NextDNS is not behaving?

    Thanks!

    • G Mobley what is your stubby config and version?

      G Mobley (2 yrs ago):

      Hi Olivier: I'm running the latest ASUS Merlin.

      [00:24:38.751471] STUBBY: Stubby version: Stubby 0.3.0
      [00:24:38.754325] STUBBY: Read config from file /etc/stubby/stubby.yml
      [00:24:38.754632] STUBBY: DNSSEC Validation is OFF
      [00:24:38.754664] STUBBY: Transport list is:
      [00:24:38.754690] STUBBY:   - TLS
      [00:24:38.754716] STUBBY: Privacy Usage Profile is Strict (Authentication required)

      I use a script to change round_robin_upstreams from 1 to 0.

      ... /stubby/stubby.yml

      resolution_type: GETDNS_RESOLUTION_STUB
      dns_transport_list:
        - GETDNS_TRANSPORT_TLS
      tls_authentication: GETDNS_AUTHENTICATION_REQUIRED
      tls_query_padding_blocksize: 128
      appdata_dir: "/var/lib/misc"
      resolvconf: "/tmp/resolv.conf"
      edns_client_subnet_private: 1
      round_robin_upstreams: 0
      idle_timeout: 9000
      tls_connection_retries: 2
      tls_backoff_time: 900
      timeout: 3000
      listen_addresses:
        - 127.0.1.1@53
      upstream_recursive_servers:
        - address_data: 45.90.28.0
          tls_auth_name: "XXXXXXX.dns1.nextdns.io"
        - address_data: 45.90.30.0
          tls_auth_name: "XXXXXXX.dns2.nextdns.io"

      Thanks for taking another look. I'm now monitoring with stubby -l, trying to catch whatever is causing my issues.

      G. Mobley

    • G Mobley Seems like stubby's fallback algorithm is pretty meh: https://github.com/getdnsapi/stubby/issues/105

      Does it fix the issue if you set round_robin_upstreams to 1?

      Any reason for not wanting to use the CLI? It should be much more stable.

      G Mobley (2 yrs ago):

      Olivier Poitrey Thank you for the guidance. I have never tried round_robin_upstreams (RRU) set to 1, because the setup instructions state that the value must be "0". I will try "1" tomorrow morning; I cannot play anymore tonight.

      Regarding the NextDNS CLI: yes sir, I have about 6 IoT devices and cameras that do not function well when their DNS is messed with, so I list them on the "DNSFilter" page so they go directly to Quad9. They work fine with that setup. FWIW, with this exact same setup pointed at two Quad9 hosts (or Cloudflare; I used both to test) and round_robin_upstreams: 1, DNS worked for 3 weeks without a single hiccup or unresolved name. I was watching it with "stubby -l". I'll change it tomorrow and see how it goes.

      I'm reviewing the stubby link you posted above. They talk a lot about timeouts. What's the best timeout for NextDNS? Quote: "I am wondering if it would be worthwhile adding a note in stubby.yml.example explaining that stubby will cycle servers when round_robin_upstreams: 0 is set and idle_timeout is set to a value longer than a given upstream server has configured on the backend." The default idle_timeout in my current stubby.yml is 9000 ms. Could that be triggering the problem that post is referencing? Thank you sir.

      G Mobley (2 yrs ago):

      Olivier Poitrey Morning! About 24 hours with RRU: 1, and I've not seen (nor heard screams about) any DNS failures. Thank you! This IS progress! I'll keep the router running this way for a week, continue to monitor, and report in again. Do you have a suggestion for the proper idle_timeout for NextDNS? It is still set to 9000 ms, the default delivered in ASUS Merlin. Many people cannot change it because the value is not exposed in the ASUS Merlin GUI. I think they settled on 9000 ms as a good value for Quad9, Cloudflare, and the other providers built into the GUI's DoT selection. You made my day!

    • G Mobley Our keep-alive is around 30 s, so 9 s should be fine. This is frustrating because the issue is not easily reproducible (I never managed to reproduce it), which makes it hard to debug...
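The arithmetic behind that answer: stubby's `idle_timeout` is in milliseconds, so 9000 means 9 s, well under a server keep-alive of roughly 30 s, so stubby should close an idle connection before the server does. The stubby issue quoted above describes the opposite case, where `idle_timeout` is longer than the server's. A trivial check, with the hypothetical name `idle_timeout_ok` and the ~30 s figure taken from this reply rather than from any published NextDNS specification:

```shell
# idle_timeout_ok IDLE_TIMEOUT_MS SERVER_KEEPALIVE_S
# Succeeds when stubby's idle_timeout (milliseconds) is shorter than the
# server-side keep-alive (seconds), i.e. stubby should close an idle TLS
# connection before the server does.
idle_timeout_ok() {
  idle_ms=$1
  keepalive_ms=$(( $2 * 1000 ))        # convert seconds to milliseconds
  if [ "$idle_ms" -lt "$keepalive_ms" ]; then
    echo "ok: idle_timeout ${idle_ms} ms < keep-alive ${keepalive_ms} ms"
    return 0
  fi
  echo "warn: idle_timeout ${idle_ms} ms >= keep-alive ${keepalive_ms} ms"
  return 1
}
```

`idle_timeout_ok 9000 30` prints `ok: idle_timeout 9000 ms < keep-alive 30000 ms`.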

      G Mobley (2 yrs ago):

      Olivier Poitrey Morning sir. Reporting back on the DNS issues above on my ASUS AX86U with Merlin 386.2_2. Since I changed stubby.yml to use the default RRU: 1 instead of RRU: 0, the router has not lost DNS resolution, and no family is standing in my doorway! I know that's not the configuration as documented, but it has actually become reliable again.

      My gut says many ASUS/Merlin users simply fill in the WAN GUI page and never edit stubby.yml, which on Merlin defaults to RRU: 1. I'll continue monitoring the logs.

      I also found your post explaining when to use X.Y.Z.0 vs. X.Y.Z.### setups. I had never read or understood that before. Maybe it would be a good addition to the generated setup instructions?

      See -> https://help.nextdns.io/t/p8htq2y/dnsmasq-setting-clarification-ipv4-address-and-strict-order

      At this point I feel the issue lies in how stubby handles upstream selection with RRU: 0, especially based on the link you posted earlier. I'm betting it's not a common default or code path.

      If I can, maybe I'll put Quad9 back in and manually set stubby to RRU: 0 to see whether DNS starts failing with that too! I've never tried that combination.

      Thank you sir!

      G Mobley (2 yrs ago):

      Hi Olivier, reporting back in. My NextDNS + Merlin 386.2_2 setup has been very stable since the change above to RR: 1. I too believe there is something wrong in dnsmasq when it is set up to use RR: 0. I'm keeping the config running this way and will report back in another week. Since January 2021, with the same config using RR: 0, the router never stayed up more than about 2 days, meaning DNS working and family not screaming. Thanks!

  • https://nextdns.io/diag/36509650-aa87-11eb-960d-33c14b839c06

    Do you think enabling IPv6 would increase the speed?

      DynamicNotSlow (Pro subscriber, 2 yrs ago):

      Vitor you should definitely enable IPv6

      Vitor (2 yrs ago):

      DynamicNotSlow could you please explain why? :)

      DynamicNotSlow (Pro subscriber, 2 yrs ago):

      Vitor better compatibility, and common sense: IPv6 is the current protocol, and v4 is outdated.

    • DynamicNotSlow if only… IPv6 can improve or degrade your latency; it depends on many variables. Many ISPs don't monitor their IPv6 network as well as their IPv4 network, so more often than not you can expect worse performance on IPv6.

      Think of IPv6 as an alternate (somewhat partial) internet. You may think it would give you the best of both, but it's more complex than that. There is a great write-up on the matter by Avery Pennarun from Tailscale: https://tailscale.com/blog/two-internets-both-flakey/

      Note that I support IPv6 development, and I am not advising against enabling it. But believing IPv6 will fix your latency issues is a sweet dream :)

      Back to Vitor's question: based on your diag, the Fortaleza PoP gives 28 ms in the top results and 66 ms in the traceroute, which could be due to a difference in measurement between TCP (top) and ICMP (traceroute), or to an unstable route. You can repeat the diag several times to see whether those results are stable. From the traceroute, Rio looks like a better choice for you. I'll also check why the secondary is not getting to Rio.

      If you want to take your chances with Fortaleza, try using DoT or DoH instead of UDP in order to switch to ultralow.
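To follow the suggestion of repeating the diag, the latency reported for one PoP in each run can be collected, one millisecond value per line, and summarized; a wide spread between runs points to an unstable route rather than a consistently slow one. A minimal sketch, with the hypothetical function name `latency_stats`; the samples have to be copied out of the diag output by hand.

```shell
# latency_stats: read one latency sample (ms) per line on stdin and print
# count, min, mean, max and spread (max - min), rounded to whole ms.
# Lines that do not start with a number are ignored; exits 1 if none do.
latency_stats() {
  awk '
    { sub(/\r$/, "") }
    $1 ~ /^[0-9]+(\.[0-9]+)?$/ {
      v = $1 + 0
      n++
      sum += v
      if (n == 1 || v < min) min = v
      if (n == 1 || v > max) max = v
    }
    END {
      if (n == 0) exit 1
      printf "n=%d min=%.0f mean=%.0f max=%.0f spread=%.0f\n", n, min, sum / n, max, max - min
    }
  '
}
```

With illustrative values, `printf '%s\n' 28 66 30 | latency_stats` prints `n=3 min=28 mean=41 max=66 spread=38`.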

      Vitor (2 yrs ago):

      Olivier Poitrey Thanks. How can I use DoT on Windows? Only with the app?

    • Vitor DoT is now supported in the latest version of YogaDNS. You can use DoH via YogaDNS as well, so it's a nice app if you want to try both protocols.

    • Vitor FYI, the Windows app only uses DoH.
