Re: resolv.conf question

From: Dan Mahoney <freebsd_at_gushi.org>
Date: Thu, 13 Oct 2022 09:15:57 UTC

> On Oct 12, 2022, at 18:56, Bob Proulx <bob@proulx.com> wrote:
> 
> Doug Denault wrote:
>>> Doug Denault wrote:
>>>      So I tried to RTFM, /usr/src/contrib/ldns/resolver.c in this case. It is
>>>      almost certain that the system was up but bind did not respond. The source
>>>      is a bit above my pay grade but it did seem possible that if that was the
>>>      case, the second server was never tried. This is what actually happened.
>>> 
>>>      There were no other issues as each of the jails started fine with a manual
>>>      boot. Does anyone know if the timeout and/or retry settings offer a way
>>>      around this?
>> 
>> For performance reasons, especially if the first listed server is always
>> used, I want that in our data center. Aside from speed, no hacking is
>> possible. My purpose here is to figure how resolv.conf works. If more than
>> one entry is effectively useless, I would be tempted to use 8.8.8.8. Also
>> the jail mother had not been booted in several months and only now because I
>> f-ed up changing the root password.
> 
> I still have a physical copy of DNS and BIND by Paul Albitz & Cricket
> Liu, published by O'Reilly in 1992.  I have no idea if the way this was
> described there still matches the way it is resolved now.  But I think
> it is likely still at least similar.
> 

Long message snipped.

I really wish the DNS resolver libraries in the system stack supported quicker failover, or perhaps randomizing the list of servers.

If you're falling back to the second line in your resolv.conf, something has gone terribly wrong.  In practice, the failover delay before that second server is reached (5 seconds * 3 tries) is unusable on a normal system.  I like having it there for the SOLE case where my primary resolvers are down, so that my IP address can still be resolved well enough for me to log in.  But by that point, many services that depend on DNS are likely already broken: sshd, which tries to resolve the connecting IP; mail servers, which do the same; database servers, which may rely on DNS to connect to a database; and any dynamic web code.
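Under the model described above, where the first server gets every try before the resolver moves on, the wait before the Nth server is simple multiplication. A small Python sketch, assuming the 5-second timeout and 3 tries quoted here as defaults (check resolv.conf(5) on your own system for the real defaults) and honoring the `timeout:n` and `attempts:n` resolv.conf options if present. Real stub resolvers may interleave servers and back off between rounds, so treat the result as an estimate; the function names are hypothetical.

```python
DEFAULT_TIMEOUT = 5   # seconds per try, as quoted in this message (assumption)
DEFAULT_ATTEMPTS = 3  # tries per server, as quoted in this message (assumption)


def parse_resolv_conf(text):
    """Return (nameservers, timeout, attempts) from resolv.conf-style text."""
    nameservers = []
    timeout, attempts = DEFAULT_TIMEOUT, DEFAULT_ATTEMPTS
    for raw in text.splitlines():
        # '#' and ';' mark comments; this sketch also strips trailing comments.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "nameserver" and len(fields) > 1:
            nameservers.append(fields[1])
        elif fields[0] == "options":
            for opt in fields[1:]:
                key, _, value = opt.partition(":")
                if key == "timeout" and value.isdigit():
                    timeout = int(value)
                elif key == "attempts" and value.isdigit():
                    attempts = int(value)
    return nameservers, timeout, attempts


def delay_before_server(index, timeout, attempts):
    """Worst-case seconds before the server at 0-based `index` is first tried,
    assuming every try against each earlier server times out and the resolver
    exhausts all attempts on one server before moving to the next."""
    return index * timeout * attempts


if __name__ == "__main__":
    sample = "nameserver 10.0.0.1\nnameserver 10.0.0.2\n"
    servers, t, a = parse_resolv_conf(sample)
    for i, ns in enumerate(servers):
        print(ns, delay_before_server(i, t, a))
```

With the sample above, the second server is only reached after 15 seconds; adding `options timeout:1 attempts:2` brings the modeled delay down to 2 seconds.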

If you run any kind of mail service that depends on RBLs, or depend on any service that does geolocation, do not use 8.8.8.8 or any of the other public open resolvers.  They will rate-limit you when you least expect it, and they do not faithfully give you information that's geographically relevant to you.

It's trivially easy to run an Unbound caching resolver on localhost, and it gives you the benefit of DNSSEC validation as well.
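If you do run a local caching resolver, resolv.conf should list it first. A small Python check, assuming the resolver listens on a loopback address such as 127.0.0.1 or ::1; the function name is hypothetical.

```python
import ipaddress


def first_nameserver_is_local(resolv_conf_text):
    """True if the first 'nameserver' line points at a loopback address,
    which is where a local caching resolver would normally listen."""
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            try:
                return ipaddress.ip_address(fields[1]).is_loopback
            except ValueError:  # not a plain IP literal
                return False
    return False  # no nameserver lines at all
```

For example, pass it the contents of /etc/resolv.conf read with `open()`.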

Since you say "data center," you may want to anycast your caching resolver.  With modern routing protocols, you can put an IP address on lo0 and announce it into your network with OSPF or BGP, along with a simple script that removes the address from lo0 if it detects that the nameserver is not answering.  You can also put such a service behind a hardware load balancer.
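A minimal sketch of that health check in Python, assuming a resolver on 127.0.0.1 and a hypothetical anycast address 192.0.2.53 on lo0. It hand-builds one DNS query (root NS) with only the standard library, counts any reply other than SERVFAIL as healthy, and only prints the FreeBSD `ifconfig lo0 inet ADDR -alias` command it would run. Withdrawing the route is left to whatever OSPF/BGP daemon announces the address; all names here are illustrative.

```python
import os
import socket
import struct

# Hypothetical anycast service address (192.0.2.0/24 is a documentation range).
ANYCAST_ADDR = "192.0.2.53"


def build_query(name, qid, qtype=2):
    """Build a minimal DNS query: RD set, one question, class IN.
    qtype 2 is NS; the default name '.' asks for the root NS set."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    labels = [label for label in name.rstrip(".").split(".") if label]
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels)
    return header + qname + b"\x00" + struct.pack("!HH", qtype, 1)


def is_healthy_reply(packet, qid):
    """True if packet is a response (QR bit set) to our ID and not SERVFAIL."""
    if len(packet) < 12:
        return False
    rid, flags = struct.unpack("!HH", packet[:4])
    return rid == qid and bool(flags & 0x8000) and (flags & 0x000F) != 2


def resolver_answers(server, port=53, timeout=1.0, name="."):
    """Send one UDP query and report whether a healthy reply arrived in time."""
    qid = int.from_bytes(os.urandom(2), "big")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name, qid), (server, port))
            data, _ = sock.recvfrom(4096)
        except OSError:  # timeout, connection refused, network unreachable
            return False
    return is_healthy_reply(data, qid)


def withdraw_command(addr=ANYCAST_ADDR):
    """FreeBSD ifconfig argv that removes `addr` from lo0."""
    return ["ifconfig", "lo0", "inet", addr, "-alias"]


if __name__ == "__main__":
    if not resolver_answers("127.0.0.1"):
        # Dry run: print the command instead of executing it.
        print("resolver not answering; would run:", " ".join(withdraw_command()))
```

A production version would want several consecutive failures before withdrawing, and a matching step that re-adds the address once the resolver recovers.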

Finally, you might get different answers by asking your question on bind-users@lists.isc.org.  The people here are great too, no doubt, but the focus there is on DNS, if that's your line of questioning.

-Dan