I noticed that connections to some (not all) internal machines take about 10s to establish, for both ssh and docker pull.

If I ping them, some hosts also take 10s to start responding while others respond immediately. The behaviour is usually the same for any given address, however often I rerun ping.

Either way, running nslookup always quickly prints a non-authoritative response from one server, then hangs while 'trying the next server' before timing out:

$ nslookup xxxx.internaldomain
Server:         10.10.x.x
Address:        10.10.x.x#53

Name:   xxxx.internaldomain
Address: 10.20.y.y
;; Got recursion not available from 10.10.x.x, trying next server
<---- 10s delay here
;; connection timed out; no servers could be reached

Another one is a bit more complex, but amounts to the same thing:

$ nslookup something.company.com
;; Got recursion not available from 10.10.x.x, trying next server
Server:         127.0.0.53
Address:        127.0.0.53#53

Non-authoritative answer:
something.company.com   canonical name = docker-reg.internal.
docker-reg.internal     canonical name = something.internaldomain.
Name:   something.internaldomain
Address: 10.10.r.r
;; Got recursion not available from 10.10.x.x, trying next server
<---- 10s delay here
;; connection timed out; no servers could be reached

nslookup is happy and fast with external dns, like bbc.co.uk.

My resolv.conf looks like this:

domain internaldomain
nameserver 10.10.x.x
nameserver 127.0.0.53
search internaldomain some other internal tlds

I don't see any other nameservers mentioned, so I presume it's trying the global nameservers. What I don't understand is why ssh and ping reliably hang for some internal hosts and not for others, while nslookup always hangs.
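
One way to narrow this down would be to query each nameserver listed in resolv.conf separately, with a short timeout, so that a slow or refusing server shows up by address. The following is only a sketch: the function names list_nameservers and probe_nameservers are made up for illustration, and it assumes dig is installed (the dnsutils package on Ubuntu).

```shell
# List the nameserver addresses in a resolv.conf-style file, one per line.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "$1"
}

# Look up one name against each listed server in turn.
# +time=2 +tries=1 makes an unresponsive server fail after about 2s
# instead of stalling the whole lookup.
probe_nameservers() {
    name=$1
    conf=${2:-/etc/resolv.conf}
    list_nameservers "$conf" | while read -r server; do
        printf '== %s ==\n' "$server"
        dig +time=2 +tries=1 +short "@$server" "$name" </dev/null \
            || echo "(no usable reply)"
    done
}
```

It would be called as `probe_nameservers xxxx.internaldomain`. Comparing the per-server results for a host that hangs and one that does not should show whether 10.10.x.x or 127.0.0.53 is the one causing the delay.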

I believe this is a different question to "Very slow DNS lookup".


Update:

$ sudo -s netstat -anlp|grep ':53 '
tcp        0      0 192.168.122.1:53        0.0.0.0:*               LISTEN      2228/dnsmasq        
tcp        0      0 127.0.0.53:53           0.0.0.0:*               LISTEN      1121/systemd-resolv 
udp        0      0 192.168.122.1:53        0.0.0.0:*                           2228/dnsmasq        
udp        0      0 127.0.0.53:53           0.0.0.0:*                           1121/systemd-resolv 

Also, this issue seems to be specific to Ubuntu: my colleague on Ubuntu has the same problem, while the majority of developers, who use MacBooks, do not.


Another update!

My /etc/systemd/resolved.conf is all comments:

[Resolve]
#DNS=
#FallbackDNS=
#Domains=
#LLMNR=no
#MulticastDNS=no
#DNSSEC=no
#Cache=yes
#DNSStubListener=yes

Also, if I try running with 'nslookup -anything xxxx.internaldomain', I get this with no delays (I tried -anything after -debug didn't produce reams of useful stuff):

$ nslookup -anything dockerio.badoo.com
Server:         10.10.x.x
Address:        10.10.x.x#53

Non-authoritative answer:
something.company.com   canonical name = docker-reg.internal.
docker-reg.internal     canonical name = something.internaldomain.
Name:   something.internaldomain
Address: 10.10.r.r

I can get a version though:

$ nslookup -version
nslookup 9.11.3-1ubuntu1.13-Ubuntu

Another update:

$ systemd-resolve --status
Global
         DNS Servers: 10.10.x.x
          DNS Domain: various
                      internal
                      domains
          DNSSEC NTA: 10.in-addr.arpa
                      xx1.172.in-addr.arpa
                      168.192.in-addr.arpa
                      xx2.172.in-addr.arpa  # Lots of these 172s
                      internal
                      x.x.ip6.arpa
                      various
                      other
                      internals

Link 191 (cscotun0)
      Current Scopes: none
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no

Link 15 (docker0)
      Current Scopes: none
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no

Link 14 (br-04d8e612xxxx)
      Current Scopes: none
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no

Link 7 (virbr0-nic)
      Current Scopes: none
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no

Link 6 (virbr0)
      Current Scopes: none
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no

Link 5 (virbr1-nic)
      Current Scopes: none
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no

Link 4 (virbr1)
      Current Scopes: none
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no

Link 3 (wlp4s0)
      Current Scopes: DNS
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no
         DNS Servers: 194.168.4.100  # These are my home ISP
                      194.168.8.100
          DNS Domain: ~.

Link 2 (enp0s3xxx)
      Current Scopes: none
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no

1 Answer

The problem was down to systemd-resolved, and was fixed by backing up /etc/resolv.conf and replacing it with a symlink to /run/systemd/resolve/resolv.conf, the copy that systemd-resolved generates itself.

# mv /etc/resolv.conf /etc/resolv.conf_bak && \
  ln -s /run/systemd/resolve/resolv.conf /etc/resolv.conf

I can't take credit for this: our Head of Service Engineering took an interest in the internal ticket I raised, but that's why he's paid the big bucks.

After some experimentation and searching, he cited https://moss.sh/name-resolution-issue-systemd-resolved/

It seems systemd-resolved was trying to handle all name resolution, but it changes its mode of operation depending on whether /etc/resolv.conf is a symlink to one of its own files or a regular file!
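
To check which of these situations a machine is in, something like the following sketch may help. The function name resolv_conf_mode is made up for illustration, and the labels it prints (stub, uplink, static, foreign) are shorthand for the cases described in the systemd-resolved man page, not output from any systemd tool. It matches on the raw symlink target, so it also handles relative links such as ../run/systemd/resolve/stub-resolv.conf.

```shell
# Classify a resolv.conf path by where it points.
# Defaults to /etc/resolv.conf when no argument is given.
resolv_conf_mode() {
    path=${1:-/etc/resolv.conf}
    if [ ! -L "$path" ]; then
        # Not a symlink: systemd-resolved reads this file instead of
        # providing it, which is the situation in the question.
        echo "foreign"
        return
    fi
    case $(readlink "$path") in
        */run/systemd/resolve/stub-resolv.conf)
            echo "stub" ;;      # lists only the local stub, 127.0.0.53
        */run/systemd/resolve/resolv.conf)
            echo "uplink" ;;    # lists the real upstream servers (the fix above)
        */usr/lib/systemd/resolv.conf)
            echo "static" ;;    # static file shipped with systemd
        *)
            echo "foreign" ;;   # symlink managed by something else
    esac
}
```

Running `resolv_conf_mode` with no arguments before and after the fix should print foreign and then uplink, assuming the symlink was created as shown above.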

One bewildering item: when I edited /etc/resolv.conf, either with vi or by appending lines with a shell redirect, the file was either instantly restored or otherwise protected (though lsof showed nothing, nor did lsattr).