8-STABLE freezes on UDP traffic (DNS), 7.x doesn't
bra at fsn.hu
Thu Mar 25 19:31:00 UTC 2010
Pyun YongHyeon wrote:
> On Thu, Mar 25, 2010 at 03:22:04PM +0100, Attila Nagy wrote:
>> I have some recursive nameservers, running unbound and 7.2-STABLE #0:
>> Wed Sep 2 13:37:17 CEST 2009 on a bunch of HP BL460c machines (bce
>> These work OK.
>> During the process of migrating to 8.x, I've upgraded one of these
>> machines to 8.0-STABLE #25: Tue Mar 9 18:15:34 CET 2010 (the dates
>> indicate an approximate time, when the source was checked out from
>> cvsup.hu.freebsd.org, I don't know the exact revision).
>> The first problem was that the machine occasionally lost network access
>> for some minutes. I could log in on the console, and I could see the
>> processes involved in network IO in "keglim" state, but couldn't do any
>> network IO. This lasted for some minutes, then everything came back to
>> I could fix this issue by raising kern.ipc.nmbclusters to 51200
>> (double its default size); since then I haven't seen these blackouts.
>> But now the machine freezes. It can run for about a day, and then it
>> just freezes. I can't even break into the debugger by sending an NMI to it.
>> top says:
>> last pid: 92428; load averages: 0.49, 0.40, 0.38 up 0+21:13:18
>> 43 processes: 2 running, 38 sleeping, 1 zombie, 2 lock
>> CPU: 1.3% user, 0.0% nice, 1.3% system, 26.0% interrupt, 71.3% idle
>> Mem: 1682M Active, 99M Inact, 227M Wired, 5444K Cache, 44M Buf, 5899M Free
>> PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
>> 45011 bind 4 49 0 1734M 1722M RUN 2 37:42 22.17% unbound
>> 712 bind 3 44 0 70892K 19904K uwait 0 71:07 3.86%
>> The common factor in these freezes seems to be the high interrupt count.
>> Normally, during load the CPU times look like this:
>> CPU: 3.5% user, 0.0% nice, 1.8% system, 0.4% interrupt, 94.4% idle
>> I could observe a "freeze", where top remained running and everything
>> was 0%, except interrupt, which was 25% exactly (the machine has four
>> cores), and another, where I could save the following console output:
>> CPU: 0.0% user, 0.0% nice, 0.2% system, 50.0% interrupt, 49.8% idle
> When you see a high number of interrupts, could you check whether they come
> from bce(4)? I guess you can use systat(1) to check how many
> interrupts are generated by bce(4).
I've tried it multiple times, but couldn't yet catch the moment when the
machine was still alive (so the script could run) and there was an
increased number of interrupts.
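
For catching that moment, a minimal sketch of a logger follows. It is not
the script referred to above. It assumes Python is installed on the box and
uses vmstat -i rather than systat(1), since its output is easier to parse.
The log path, the one-second interval, and the "bce" match string are
placeholders. Each sample is fsync'ed so the last lines written before a
freeze should survive on disk.

```python
import os
import subprocess
import time


def parse_vmstat_i(text):
    """Map each interrupt source (e.g. 'irq256: bce0') to its cumulative total."""
    totals = {}
    for line in text.splitlines():
        parts = line.rsplit(None, 2)  # expected columns: name, total, rate
        if len(parts) == 3 and parts[1].isdigit():
            totals[parts[0].strip()] = int(parts[1])
    return totals


def rates(prev, cur, interval):
    """Interrupts per second for each source between two samples."""
    return {name: (cur[name] - prev.get(name, cur[name])) / interval
            for name in cur}


def main(logfile="/var/tmp/intr.log", interval=1.0, match="bce"):
    # logfile, interval and match are illustrative defaults, not from the thread.
    prev = None
    with open(logfile, "a") as log:
        while True:
            out = subprocess.run(["vmstat", "-i"],
                                 capture_output=True, text=True).stdout
            cur = parse_vmstat_i(out)
            if prev is not None:
                stamp = time.strftime("%H:%M:%S")
                # Rates are approximate: the time spent running vmstat is ignored.
                for name, rate in sorted(rates(prev, cur, interval).items()):
                    if match in name:
                        log.write("%s %s %.0f/s\n" % (stamp, name, rate))
                log.flush()
                os.fsync(log.fileno())  # push the sample to disk before a possible freeze
            prev = cur
            time.sleep(interval)


if __name__ == "__main__":
    main()
```

Left running in the background, the tail of the log after a reboot would
show whether the bce(4) interrupt rate climbed just before the freeze.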