Re: resolv.conf question
- Reply: Dan Mahoney : "Re: resolv.conf question"
- In reply to: Doug Denault : "Re: resolv.conf question"
Date: Thu, 13 Oct 2022 01:56:29 UTC
Doug Denault wrote:
> > Doug Denault wrote:
> > So I tried to RTFM, /usr/src/contrib/ldns/resolver.c in this case. It is
> > almost certain that the system was up but bind did not respond. The source
> > is a bit above my pay grade but it did seem possible that if that was the
> > case, the second server was never tried. This is what actually happened.
> >
> > There were no other issues as each of the jails started fine with a manual
> > boot. Does anyone know if the timeout and/or retry setting offer a way
> > around this.
>
> For performance reasons, especially if the first listed server is always
> used, I want that in our data center. Aside from speed, no hacking is
> possible. My purpose here is to figure how resolv.conf works. If more than
> one entry is effectively useless, I would be tempted to use 8.8.8.8. Also
> the jail mother had not been booted in several months and only now because I
> f-ed up changing the root password.
I still have a physical copy of DNS and BIND by Paul Albitz & Cricket
Liu, published by O'Reilly in 1992. I have no idea if the way this was
described there still matches the way it is resolved now. But I think
it is likely still at least similar.
The book describes the timeouts as depending upon the number of
nameserver directives in the resolv.conf file. I reproduce its table
here.
      |  Name Servers Configured
Retry |    1    |    2    |    3
------+---------+---------+---------
  0   |    5s   | (2x)5s  | (3x)5s
  1   |   10s   | (2x)5s  | (3x)3s
  2   |   20s   | (2x)10s | (3x)6s
  3   |   40s   | (2x)20s | (3x)13s
------+---------+---------+---------
Total |   75s   |   80s   |   81s
If there is no nameserver configured then the default is to query the
nameserver on the local system. Having none configured is the same as
having one configured that points at the local host.
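
As a rough illustration of that default, here is a minimal Python sketch
of how a resolver might pick its server list from resolv.conf text. The
function name parse_nameservers, the cap of three, and the 127.0.0.1
fallback are assumptions for illustration only; this is not the actual
libc code.

```python
MAXNS = 3  # assumed cap on nameserver entries (the "up to three" case)

def parse_nameservers(resolv_conf_text, default="127.0.0.1"):
    """Return the nameserver addresses to query, in file order.

    Hypothetical helper: falls back to the local host when the file
    has no nameserver lines, and ignores entries beyond MAXNS.
    """
    servers = []
    for line in resolv_conf_text.splitlines():
        # Strip comments, which resolv.conf marks with '#' or ';'.
        line = line.split("#", 1)[0].split(";", 1)[0]
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    # An empty list means nothing was configured: use the local host.
    return servers[:MAXNS] or [default]
```

With an empty file this returns only the local host address, and with
more than three nameserver lines it keeps just the first three.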
If there is one nameserver configured then it will query that
nameserver with a timeout of 5 seconds. This is the timeout before
sending another query, that is, a retry. If the query times out, or the
resolver encounters an error that indicates the nameserver is really
down or unreachable, it will double the timeout and query the
nameserver again.
If there is more than one nameserver configured then the libc resolver
queries the first one in the list with a timeout of 5 seconds. If
that query times out or receives an error then it falls back to the
next nameserver in the list with the same 5 second timeout. If the
resolver reaches the end of the list and all of them (up to three)
timed out or received an error then it will update the timeouts and
cycle through the list again.
The next retry through the list will have timeouts set according to a
calculation of 10 seconds divided by the number of nameservers
configured rounded down. One nameserver is 10 seconds. Two
nameservers is 5 seconds. Three nameservers is 3 seconds.
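If that round of queries through each of the nameservers again
receives errors or timeouts then the timeout values are doubled and
the queries are retried. Judging by the table, the doubling is applied
before dividing by the number of nameservers, which is why the last
round with three nameservers is 13s (40s / 3, rounded down) rather
than 12s.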
There are four possible rounds of queries: the initial round with the
5s timeouts, the second round with the calculated timeouts, and the
3rd and 4th rounds with the timeouts doubled each round.
That accounts for why the total time a DNS lookup using the libc
resolver takes will vary among 75s, 80s, and 81s depending upon the
number of nameserver directives configured, in the case that all of
them either return errors or are unreachable.
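
The arithmetic behind the table can be checked with a short Python
sketch. It assumes the per-round timeout is the 5s base doubled once per
round and, after the first round, divided by the number of nameservers
and rounded down. That formula is an assumption chosen because it
reproduces the book's table (including the 13s entry), not a statement
about any current resolver. The names round_timeouts and
worst_case_total are made up for this example.

```python
def round_timeouts(nservers, base=5, rounds=4):
    """Per-server timeout in seconds for each round of queries."""
    timeouts = []
    for attempt in range(rounds):
        t = base << attempt      # 5, 10, 20, 40: doubled every round
        if attempt > 0:
            t //= nservers       # after the first round, split across servers
        timeouts.append(t)
    return timeouts

def worst_case_total(nservers, base=5, rounds=4):
    """Total wait when every server fails in every round."""
    return nservers * sum(round_timeouts(nservers, base, rounds))

for n in (1, 2, 3):
    print(n, round_timeouts(n), worst_case_total(n))
```

For one, two, and three nameservers this gives totals of 75s, 80s, and
81s, matching the Total row of the table.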
Again let me repeat that this was as described in 1992 and I have no
idea if the current implementation is still the same. But at least it
lays the foundation for the way things used to work.
To get some recent data I tried it on my NetBSD 9.0 system here. (I
know I am behind and need to upgrade it to the current 9.3.) I tried
the four combinations with unreachable (non-existent) nameservers.
No nameservers configured. No local host nameserver running.
netbsd# time host example.com
;; connection timed out; no servers could be reached
12.17s real 0.02s user 0.02s system
One unreachable nameserver configured.
netbsd# time host example.com
;; connection timed out; no servers could be reached
10.05s real 0.02s user 0.00s system
Two unreachable nameservers configured.
netbsd# time host example.com
;; connection timed out; no servers could be reached
12.07s real 0.01s user 0.02s system
Three unreachable nameservers configured.
netbsd# time host example.com
;; connection timed out; no servers could be reached
14.10s real 0.03s user 0.01s system
Then I configured two nameservers where the first one was unreachable
but the second one was local, available, and online.
netbsd# time host example.com
example.com has address 93.184.216.34
example.com has IPv6 address 2606:2800:220:1:248:1893:25c8:1946
example.com mail is handled by 0 .
3.41s real 0.02s user 0.01s system
Then again with three nameservers but with the first two being
unreachable and again the third one, the last one, being available.
netbsd# time host example.com
example.com has address 93.184.216.34
example.com has IPv6 address 2606:2800:220:1:248:1893:25c8:1946
example.com mail is handled by 0 .
6.09s real 0.01s user 0.02s system
Therefore it looks like the algorithm implemented now is similar to,
but somewhat different from, the one historically described.
================================================================
Let's see the same experiment again with FreeBSD 12.3.
No nameservers configured. No local host nameserver running.
[root@freebsd ~]# time host example.com
;; connection timed out; no servers could be reached
real 0m20.219s
user 0m0.002s
sys 0m0.003s
One unreachable nameserver configured.
[root@freebsd ~]# time host example.com
;; connection timed out; no servers could be reached
real 0m10.111s
user 0m0.000s
sys 0m0.006s
Two unreachable nameservers configured.
[root@freebsd ~]# time host example.com
;; connection timed out; no servers could be reached
real 0m20.226s
user 0m0.005s
sys 0m0.000s
Three unreachable nameservers configured.
[root@freebsd ~]# time host example.com
;; connection timed out; no servers could be reached
real 0m30.409s
user 0m0.000s
sys 0m0.007s
Then I configured two nameservers where the first one was unreachable
but the second one was local, available, and online.
[root@freebsd ~]# time host example.com
example.com has address 93.184.216.34
example.com has IPv6 address 2606:2800:220:1:248:1893:25c8:1946
example.com mail is handled by 0 .
real 0m10.091s
user 0m0.000s
sys 0m0.007s
Then again with three nameservers but with the first two being
unreachable and again the third one, the last one, being available.
[root@freebsd ~]# time host example.com
example.com has address 93.184.216.34
example.com has IPv6 address 2606:2800:220:1:248:1893:25c8:1946
example.com mail is handled by 0 .
real 0m20.309s
user 0m0.002s
sys 0m0.004s
================================================================
Let's see the same experiment again with Debian Unstable with glibc
version 2.35.
No nameservers configured. No local host nameserver running.
root@glibc:~# time host example.com
;; communications error to ::1#53: connection refused
;; communications error to ::1#53: connection refused
;; communications error to 127.0.0.1#53: connection refused
;; no servers could be reached
real 0m0.031s
user 0m0.015s
sys 0m0.005s
Interesting that it complains about both IPv6 failure and IPv4 failure
whereas traditionally it is silent. ("::1" being IPv6 localhost, and
127.0.0.1 being IPv4 localhost.)
One unreachable IPv4 local host nameserver configured.
root@glibc:~# time host example.com
;; communications error to 127.0.0.1#53: connection refused
;; communications error to 127.0.0.1#53: connection refused
;; no servers could be reached
real 0m0.034s
user 0m0.019s
sys 0m0.000s
One unreachable IPv4 nameserver configured. This doesn't show timestamps
but each line was output at 5s intervals.
root@glibc:~# time host example.com
;; communications error to 192.168.1.151#53: timed out
;; communications error to 192.168.1.151#53: timed out
;; no servers could be reached
real 0m10.045s
user 0m0.016s
sys 0m0.008s
Two unreachable nameservers configured. This doesn't show timestamps
but each line was output at 5s intervals.
root@glibc:~# time host example.com
;; communications error to 192.168.1.151#53: timed out
;; communications error to 192.168.1.151#53: timed out
;; communications error to 192.168.1.152#53: timed out
;; no servers could be reached
real 0m15.049s
user 0m0.014s
sys 0m0.009s
Three unreachable nameservers configured.
root@glibc:~# time host example.com
;; communications error to 192.168.1.151#53: timed out
;; communications error to 192.168.1.151#53: timed out
;; communications error to 192.168.1.152#53: timed out
;; communications error to 192.168.1.153#53: timed out
;; no servers could be reached
real 0m20.052s
user 0m0.012s
sys 0m0.008s
================================================================
I am not sure if this in any way answers your questions. But
hopefully it provides some interesting information about the behavior
of the resolver in these various different systems.
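
One caveat on the measurements: as far as I know, host(1) reads
resolv.conf and sends its own queries rather than calling the libc
resolver, so its timings may not match what ordinary programs see. A
minimal Python sketch for timing a lookup through the system's
getaddrinfo() instead is below. The helper name time_lookup and its
resolve parameter are assumptions made for this example.

```python
import socket
import time

def time_lookup(name, resolve=socket.getaddrinfo):
    """Time one name lookup; return (elapsed_seconds, result_or_error).

    'resolve' defaults to the system resolver via socket.getaddrinfo;
    it is a parameter only so the timing logic can be exercised
    without touching the network.
    """
    start = time.monotonic()
    try:
        result = resolve(name, None)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        result = exc
    return time.monotonic() - start, result
```

Calling time_lookup("example.com") once per resolv.conf variation gives
numbers comparable to the "real" times above, but for the libc lookup
path rather than for host itself.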
Personally I almost always configure a local caching nameserver on the
local host for my server systems. For me that is almost always the
right answer for Internet connected servers.
However for DHCP mobile clients I mostly don't and use the DHCP
provided nameservers. That's the best answer because it allows the
DNS spoofing that captive portals rely on at open WiFi Access Points,
such as at name-brand coffee shops and airports.
One more "however" here as not validating DNSSEC also allows spoofing.
Therefore I turn my mobile laptop's local DNSSEC validating nameserver
on and off manually. I need it on for security. I need it off for
clicking through the EULA on a captive portal. Captive portals are
rather a mess.
https://en.wikipedia.org/wiki/Captive_portal
Bob