BIND-9.3.2 (from 5.5-STABLE) segfault under load...
cswiger at mac.com
Sun Dec 31 08:45:48 PST 2006
Doug Barton wrote:
> Chuck Swiger wrote:
>> I had named segfault a day or so ago under high load ("adnslogres -c
>> 200" against a webserver logfile) after logging the following:
> Hard to tell if your problem here is related to running on 5.5 or not,
> but of course recommendation number one is to consider upgrading to
> 6.x. Recommendation number two is to upgrade BIND to 9.3.3, preferably
> by upgrading to 6.2-RC2, or by upgrading to the head of RELENG_5, or
> as a last resort by using the port, with or without the option to
> replace the base BIND.
Noted, thanks. I've been following 6-STABLE on a pair of test machines, but
I was recently bitten by the bug from earlier this week (or last week) that
caused daemon processes to die early in the boot, so I'd prefer to be a bit
more conservative until 6-STABLE settles down.
>> [ ... ]
>>> Dec 28 03:38:56 <daemon.notice> pi named: enforced
>>> delegation-only for 'AR' (ctina.ar/A/IN) from 126.96.36.199#53
> If you're using this option, please make sure that you know why you
> are using it, and what the potential side effects are. That discussion
> is off topic for this list, but feel free to take it up on
> bind-users at isc.org if you wish.
The primary function of the machine in question is an SMTP relay; its second
major purpose is running Apache. Having various top-level domains return
non-delegation records (e.g., SiteFinder) breaks some of my anti-spam checking...
>>> Dec 28 03:50:23 <daemon.warn> pi named: client 127.0.0.1#53077:
>>> no more recursive clients: quota reached
> There is extensive discussion about this problem in the bind-users
> archives. Take a look at file:///usr/share/doc/bind9/arm/ and check
> out the quota options to get this adjusted to where it needs to be for
> your situation. Alternatively, if you're sure that the excess load is
> caused by the adnslogres program, try lowering the number of
> concurrent connections.
I'm quite sure the load is being caused by adnslogres; the "-c 200" flag sets
the number of outstanding requests that program will keep in flight. When
resolving the IPs in a logfile, the more concurrent requests permitted, the
faster the file completes, since you are mostly being held up by non-responsive
nameservers that eventually time out.
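To illustrate why a higher -c finishes sooner: each lookup mostly sits idle
waiting on a slow or dead nameserver, so keeping N lookups in flight overlaps N
of those waits. The sketch below is a minimal Python model of that idea, not
adnslogres itself (which uses GNU adns for asynchronous queries); it swaps in a
thread pool and socket.gethostbyaddr(), which goes through the system resolver
(i.e. the local named, assuming resolv.conf points at 127.0.0.1). The names
ips_from_log, reverse_lookup and resolve_all are hypothetical, and it assumes the
client address is the first field of each Apache log line.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Optional


def ips_from_log(lines: Iterable[str]) -> List[str]:
    """Hypothetical helper: take the client IP from the first field of each
    Apache common/combined log line, skipping blank lines."""
    return [line.split()[0] for line in lines if line.strip()]


def reverse_lookup(ip: str) -> Optional[str]:
    """Return the PTR name for ip via the system resolver, or None on failure."""
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)
        return hostname
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return None


def resolve_all(
    ips: Iterable[str],
    concurrency: int = 50,  # plays the role of adnslogres's "-c"
    resolver: Callable[[str], Optional[str]] = reverse_lookup,
) -> Dict[str, Optional[str]]:
    """Resolve each unique IP, with at most `concurrency` lookups in flight.

    Every extra in-flight lookup is another recursive query the local named
    has to track, which is what counts against its recursive-clients quota.
    """
    unique = list(dict.fromkeys(ips))  # de-duplicate, preserving log order
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return dict(zip(unique, pool.map(resolver, unique)))
```

Usage would be along the lines of
resolve_all(ips_from_log(open("access_log")), concurrency=50), mirroring the
reduction from -c 200 to -c 50: fewer lookups in flight means less load on
named, at the cost of waiting out more timeouts one after another.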
What strikes me as odd is that the BIND docs claim that recursive-clients
defaults to 1000, yet the quota was reached with only 200 requests outstanding.
The named process's memory usage also shoots up from a nominal 5 MB at startup
to around 45 MB under this load.
Anyway, I've reduced adnslogres to using 50 outstanding connection requests.
>> As the subject mentions, this is a Dell 1850 (rackmount PowerEdge)
>> running FreeBSD-5.5 & BIND-9.3.2; until just now, everything had been
>> running stably for months at a time.
> I assume you've checked the usual suspects, dead fans, other hardware
> problems, etc?
Within reason, yes. I haven't taken the system down for 24 hours of testing
with Memtest86 or similar, but the fans are OK (it's a P3-based system, so it
doesn't run nearly as hot as the later Xeon-based models), there are no ECC
warnings from the RAM, and the drives are in a RAID-1 mirror that reports no
signs of trouble.
Thanks for your feedback,
More information about the freebsd-stable