BIND-9.3.2 (from 5.5-STABLE) segfault under load...

Chuck Swiger cswiger at mac.com
Sun Dec 31 08:45:48 PST 2006


Doug Barton wrote:
> Chuck Swiger wrote:
>> Hi--
>>
>> I had named segfault a day or so ago under high load ("adnslogres -c
>> 200" against a webserver logfile) after logging the following:
> 
> Hard to tell if your problem here is related to running on 5.5 or not,
> but of course recommendation number one is to consider upgrading to
> 6.x. Recommendation number two is to upgrade BIND to 9.3.3, preferably
> by upgrading to 6.2-RC2, or by upgrading to the head of RELENG_5, or
> as a last resort by using the port, with or without the option to
> replace the base BIND.

Noted, thanks.  I've been following 6-STABLE on a pair of test machines, but 
I've recently been bitten by whatever the bug was earlier this week or last 
which resulted in daemon processes dying early in the boot, so I'd prefer to 
be a bit more conservative until 6-STABLE settles down more.

>> [ ... ]
>>> Dec 28 03:38:56 <daemon.notice> pi named[1853]: enforced
>>> delegation-only for 'AR' (ctina.ar/A/IN) from 137.39.1.3#53
> 
> If you're using this option, please make sure that you know why you
> are using it, and what the potential side effects are. That discussion
> is off topic for this list, but feel free to take it up on
> bind-users at isc.org if you wish.

The primary function of the machine in question is an SMTP relay; a secondary 
major purpose is running Apache.  Having various top-level domains return 
non-delegation records (such as SiteFinder) breaks some of my anti-spam 
checking...

>>> Dec 28 03:50:23 <daemon.warn> pi named[1853]: client 127.0.0.1#53077:
>>> no more recursive clients: quota reached
> 
> There is extensive discussion about this problem in the bind-users
> archives. Take a look at file:///usr/share/doc/bind9/arm/ and check
> out the quota options to get this adjusted to where it needs to be for
> your situation. Alternatively, if you're sure that the excess load is
> caused by the adnslogres program, try lowering the number of
> concurrent connections.

I'm quite sure the load is being caused by adnslogres; the "-c 200" flag 
sets the number of outstanding connection requests that program will 
create.  When resolving the IPs from a logfile, the more connections are 
permitted, the faster the file completes, since you are mostly being held up 
by non-responsive nameservers which will time out.
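
To illustrate the idea behind "-c", here is a rough sketch in Python.  It is 
not how adnslogres is implemented (that is C on top of the adns library); the 
function names and the thread pool standing in for asynchronous queries are 
my own assumptions, and the resolver is passed in as a parameter so it can be 
swapped out:

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def reverse_lookup(ip):
    """Return the PTR name for ip, or None if resolution fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses
        return None


def resolve_all(ips, max_outstanding=50, lookup=reverse_lookup):
    """Resolve each distinct IP with at most max_outstanding lookups in flight.

    Hypothetical helper: the worker count plays the role of the "-c" value.
    """
    unique = list(dict.fromkeys(ips))  # de-duplicate, keep first-seen order
    with ThreadPoolExecutor(max_workers=max_outstanding) as pool:
        names = list(pool.map(lookup, unique))
    return dict(zip(unique, names))
```

A slow or dead nameserver ties up one worker until the system resolver gives 
up, so a larger max_outstanding lets the rest of the file proceed in the 
meantime, at the cost of more simultaneous queries hitting the local named.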

What strikes me as odd is that the BIND docs claim that recursive-clients 
defaults to a value of 1000, so I would not have expected 200 outstanding 
requests to reach the quota...?  The named process's memory usage shoots up 
from a nominal starting point of 5 MB to around 45 MB after being queried 
under this load.

Anyway, I've reduced adnslogres to using 50 outstanding connection requests.

>> As the subject mentions, this is a Dell 1850 (rackmount PowerEdge)
>> running FreeBSD-5.5 & BIND-9.3.2; until just now, everything had been
>> running stably for months at a time.
> 
> I assume you've checked the usual suspects, dead fans, other hardware
> problems, etc?

Within reason, yes.  I haven't taken the system down to perform 24 hours of 
testing with Memtest86 or something like that, but the fans are OK (it's a 
P3-based system; it doesn't run nearly as hot as the later Xeon-based models), 
there are no ECC warnings from the RAM, and the drives are in a RAID-1 mirror 
which is reporting no signs of trouble.

Thanks for your feedback,
-- 
-Chuck


More information about the freebsd-stable mailing list