Kernel memory corruption(?) with age(4)

YongHyeon PYUN pyunyh at gmail.com
Wed Mar 30 20:30:13 UTC 2011


On Wed, Mar 30, 2011 at 09:50:12PM +0200, Yamagi Burmeister wrote:
> On Wed, 30 Mar 2011, YongHyeon PYUN wrote:
> 
> >On Wed, Mar 30, 2011 at 04:22:23PM +0200, Yamagi Burmeister wrote:
> >
> >>All four boxes are unstable if the Attansic NIC is in use; none of them
> >>survived more than 60 minutes of ~20mb/s network traffic. I managed to
> >>get some coredumps and extracted the backtraces. Since every time one of
> >>the boxes panicked I got a different panic message and a different backtrace
> >>with a different subsystem involved, I suspected broken hardware. I
> >>plugged an em(4) NIC into the PCI slot and wasn't able to reproduce the
> >>problem; in fact the boxes ran rock solid for several days. Next I set
> >>up Windows 7, installed the Attansic vendor driver and did another
> >>run. All went smoothly, no crash for nearly 24 hours.
> >>
> >>My guess is kernel memory corruption by age(4), which would explain all
> >>the different backtraces and the different panic messages. This problem
> >>is reproducible in at least FreeBSD 7.4 and 8.2 and with TSO4 enabled
> >>and disabled. I'm willing to debug this, but I really don't know how. So
> >>any help or a pointer in the right direction would be appreciated.
> >>
> >
> >AFAIK this is the first report for possible memory corruption
> >triggered by age(4). I'm still not sure whether it's caused by
> >age(4) but you can disable RX checksum offloading and see whether
> >that makes any difference.
> >Since I no longer have access to the hardware it would be even
> >better if you can tell me which traffic pattern triggered the
> >issue.
> 
> Okay, I did a test run with RX checksum, TX checksum and both disabled. 
> In all three cases the crash occurs within about 20 minutes. I'm also
> not sure that age(4) is the problem, but it definitely has something to do
> with the problem, since with another NIC driver the same scenario is
> rock solid...
> 

OK.

> The workload: It's an NFS3 server (FreeBSD's non-experimental
> implementation), serving and receiving files of about 250 to 500
> megabytes at about 20mb/s. The clients are FreeBSD 7 and 8 systems and
> are mounting the shares via TCP. The connection is 1000mbit/s via a
> "dumb" gigabit switch.
> 

That's too broad to narrow down the issue. :-(
I'm not sure, but your box seems to have more than 4GB memory. Could
you limit the available memory to 3GB via loader.conf and test it
again?
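
For reference, a minimal sketch of such a limit, assuming the stock
hw.physmem loader tunable. The 3G value is simply the figure suggested
above, and a reboot is needed for it to take effect:

```
# /boot/loader.conf
# Cap the physical memory the kernel will use at 3GB (value taken from
# the suggestion in this thread; adjust as needed).
hw.physmem="3G"
```

After rebooting, "sysctl hw.physmem" should report a value close to 3GB
if the limit was applied.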

