8-STABLE freezes on UDP traffic (DNS), 7.x doesn't
Pyun YongHyeon
pyunyh at gmail.com
Thu Mar 25 18:36:57 UTC 2010
On Thu, Mar 25, 2010 at 03:22:04PM +0100, Attila Nagy wrote:
> Hi,
>
> I have some recursive nameservers, running unbound and 7.2-STABLE #0:
> Wed Sep 2 13:37:17 CEST 2009 on a bunch of HP BL460c machines (bce
> interfaces).
> These work OK.
>
> During the process of migrating to 8.x, I've upgraded one of these
> machines to 8.0-STABLE #25: Tue Mar 9 18:15:34 CET 2010 (the dates
> indicate an approximate time, when the source was checked out from
> cvsup.hu.freebsd.org, I don't know the exact revision).
>
> The first problem was that the machine occasionally lost network access
> for some minutes. I could log in on the console, and I could see the
> processes involved in network I/O sitting in the "keglim" state, but
> none of them could do any network I/O. This lasted for some minutes,
> then everything came back to normal.
> I fixed this issue by raising kern.ipc.nmbclusters to 51200 (double
> its default size); since then I haven't seen these blackouts.
>
> But now the machine freezes. It can run for about a day, and then it
> just freezes. I can't even break into the debugger by sending it an NMI.
> top says:
> last pid: 92428; load averages: 0.49, 0.40, 0.38 up 0+21:13:18 07:41:43
> 43 processes: 2 running, 38 sleeping, 1 zombie, 2 lock
> CPU: 1.3% user, 0.0% nice, 1.3% system, 26.0% interrupt, 71.3% idle
> Mem: 1682M Active, 99M Inact, 227M Wired, 5444K Cache, 44M Buf, 5899M Free
> Swap:
>
> PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
> 45011 bind 4 49 0 1734M 1722M RUN 2 37:42 22.17% unbound
> 712 bind 3 44 0 70892K 19904K uwait 0 71:07 3.86% python2.6
>
> What these freezes seem to have in common is the high interrupt count.
> Under normal load, the CPU times look like this:
> CPU: 3.5% user, 0.0% nice, 1.8% system, 0.4% interrupt, 94.4% idle
>
> I observed one "freeze" in which top kept running and everything was
> at 0% except interrupt, which was exactly 25% (the machine has four
> cores), and another, from which I could save the following console output:
> CPU: 0.0% user, 0.0% nice, 0.2% system, 50.0% interrupt, 49.8% idle
When you see a high number of interrupts, could you check whether they
come from bce(4)? You can use systat(1) to see how many interrupts
bce(4) is generating.
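systat(1) is interactive. As a non-interactive alternative (a swapped-in tool, not what is suggested above), FreeBSD's vmstat(8) -i prints each interrupt source's cumulative count since boot, so diffing two snapshots taken a few seconds apart gives per-source rates and would put a bce(4) interrupt storm at the top of the list. The following is a minimal Python sketch that assumes the `vmstat -i` layout of a header line starting with "interrupt", one line per source ending in total and rate columns, and a final "Total" line; the helper names are hypothetical.

```python
from __future__ import annotations

import subprocess
import sys
import time


def parse_vmstat_i(text: str) -> dict[str, int]:
    """Map each interrupt source (e.g. "irq256: bce0") to its cumulative count."""
    counts: dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, the "Total" summary and anything too short to parse.
        if len(fields) < 3 or fields[0] in ("interrupt", "Total"):
            continue
        try:
            total = int(fields[-2])  # second-to-last column is the total
        except ValueError:
            continue
        counts[" ".join(fields[:-2])] = total
    return counts


def rates(before: dict[str, int], after: dict[str, int],
          seconds: float) -> list[tuple[str, float]]:
    """Interrupts per second for sources present in both snapshots, highest first."""
    result = [(name, (total - before[name]) / seconds)
              for name, total in after.items() if name in before]
    result.sort(key=lambda item: item[1], reverse=True)
    return result


def snapshot() -> dict[str, int]:
    """Run `vmstat -i` (FreeBSD only) and parse its output."""
    out = subprocess.run(["vmstat", "-i"], capture_output=True,
                         text=True, check=True).stdout
    return parse_vmstat_i(out)


def main(interval: float = 5.0) -> None:
    first = snapshot()
    time.sleep(interval)
    second = snapshot()
    for name, rate in rates(first, second, interval)[:5]:
        print(f"{rate:12.1f}/s  {name}")


# Only sample on FreeBSD, where `vmstat -i` has the layout assumed above.
if __name__ == "__main__" and sys.platform.startswith("freebsd"):
    main(float(sys.argv[1]) if len(sys.argv) > 1 else 5.0)
```

Run it while the interrupt time is high; if an "irqNNN: bce0" line dominates the output, the interrupts come from the NIC.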
> .......(partial, broken line)....32M 2423M *udp 1 50:16 10.89% unbound
> 714 bind 3 44 0 70892K 26852K uwait 3 8:41 4.69% python2.6
> 61004 root 1 62 0 37428K 10876K *udp 1 0:00 1.56% python
> 706 root 1 44 0 2696K 624K piperd 1 0:07 0.00% readproctit
>
> Both unbound and python accept DNS requests, and it seems that when
> interrupt time is at 25%, only unbound is in the *udp state; when it is
> at 50%, both programs are in that state.
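top(1)'s aggregate CPU line averages each state over all CPUs, so on this four-core machine 25% interrupt corresponds to roughly one core spent entirely in interrupt context and 50% to roughly two, a pattern that appears to match the one versus two processes seen in the *udp state. A minimal Python sketch of that conversion, assuming the aggregate (not per-CPU) top display; `interrupt_cpus` is a hypothetical helper:

```python
import re

# Matches the "NN.N% interrupt" field of top(1)'s "CPU:" summary line.
_INTERRUPT_RE = re.compile(r"([\d.]+)% interrupt")


def interrupt_cpus(cpu_line: str, ncpu: int) -> float:
    """Return how many CPUs' worth of time top(1) reports as interrupt.

    Assumes the aggregate CPU line, where each percentage is averaged
    over all ncpu processors.
    """
    match = _INTERRUPT_RE.search(cpu_line)
    if match is None:
        raise ValueError(f"no interrupt field in {cpu_line!r}")
    return float(match.group(1)) / 100.0 * ncpu


if __name__ == "__main__":
    line = "CPU: 0.0% user, 0.0% nice, 0.2% system, 50.0% interrupt, 49.8% idle"
    print(interrupt_cpus(line, ncpu=4))
```

Applied to the 50.0% interrupt line saved from the console above, it returns 2.0.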