Kernel memory corruption(?) with age(4)
YongHyeon PYUN
pyunyh at gmail.com
Wed Mar 30 17:33:01 UTC 2011
On Wed, Mar 30, 2011 at 04:22:23PM +0200, Yamagi Burmeister wrote:
> Hi,
> I recently got four Asus M3A-H/HDMI mainboards, each about two years old,
> with an integrated Attansic L2 ethernet controller. This NIC is supported by
> age(4) and recognized by FreeBSD:
>
> ----
>
> age0: <Attansic Technology Corp, L1 Gigabit Ethernet>
> mem 0xfeac0000-0xfeafffff irq 18 at device 0.0 on pci2
> age0: 1280 Tx FIFO, 2364 Rx FIFO
> age0: Using 1 MSI messages.
> age0: 4GB boundary crossed, switching to 32bit DMA addressing mode.
> miibus0: <MII bus> on age0
> atphy0: <Atheros F1 10/100/1000 PHY> PHY 0 on miibus0
> atphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT-FDX,
> 1000baseT-FDX-master, auto
> age0: Ethernet address: 00:23:54:31:a0:12
> age0: [FILTER]
>
> ----
>
> age0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
> options=c319b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,
> WOL_MCAST,WOL_MAGIC,VLAN_HWTSO,LINKSTATE>
> ether 00:23:54:31:a0:12
> inet6 fe80::223:54ff:fe31:a012%age0 prefixlen 64 scopeid 0x1
> nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
> media: Ethernet autoselect (none)
> status: no carrier
>
> ----
>
> All four boxes are unstable if the Attansic NIC is in use; none of them
> survived more than 60 minutes of ~20mb/s network traffic. I managed to
> get some coredumps and extracted the backtraces. Since every time one of
> the boxes panicked I got a different panic message and a different backtrace
> with a different subsystem involved, I suspected broken hardware. I
> plugged an em(4) NIC into the PCI slot and wasn't able to reproduce the
> problem; in fact the boxes ran rock solid for several days. Next I set
> up Windows 7, installed the Attansic vendor driver and did another
> run. All went smoothly, no crash for nearly 24 hours.
>
> My guess is kernel memory corruption by age(4), which would explain all
> the different backtraces and the different panic messages. This problem
> is reproducible on at least FreeBSD 7.4 and 8.2, with TSO4 both enabled
> and disabled. I'm willing to debug this, but I really don't know how. So
> any help or a pointer in the right direction would be appreciated.
>
AFAIK this is the first report of possible memory corruption
triggered by age(4). I'm still not sure whether it's caused by
age(4), but you can disable RX checksum offloading and see whether
that makes any difference.
Since I no longer have access to the hardware, it would be even
better if you could tell me which traffic pattern triggered the
issue.
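
For the RX checksum test suggested above, the capability can normally be
turned off with `ifconfig age0 -rxcsum` (the interface name age0 is taken
from the quoted output). As a rough sketch, assuming ifconfig output in the
format quoted in this thread, the following Python helper checks whether
RXCSUM is still listed in the interface options. The function names and the
SAMPLE text are illustrative only and are not part of any FreeBSD tool.

```python
import re

# Sample text modelled on the ifconfig output quoted in this thread.
SAMPLE = """age0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=c319b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,
        WOL_MCAST,WOL_MAGIC,VLAN_HWTSO,LINKSTATE>
        ether 00:23:54:31:a0:12
        nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
"""

# Match the interface "options=" line only, not "nd6 options=".
# [^>]* also spans line breaks, so a wrapped capability list still parses.
_OPTIONS_RE = re.compile(r"^\s*options=[0-9a-fA-F]+<([^>]*)>", re.MULTILINE)


def parse_ifconfig_options(ifconfig_text):
    """Return the set of capability names from the interface options line."""
    match = _OPTIONS_RE.search(ifconfig_text)
    if match is None:
        return set()
    names = (name.strip() for name in match.group(1).split(","))
    return {name for name in names if name}


def rx_checksum_enabled(ifconfig_text):
    """True if RXCSUM appears among the enabled interface capabilities."""
    return "RXCSUM" in parse_ifconfig_options(ifconfig_text)


if __name__ == "__main__":
    print("RXCSUM enabled:", rx_checksum_enabled(SAMPLE))
```

Running the check before and after the ifconfig change, on the saved output
of `ifconfig age0`, confirms that the setting took effect before starting
another traffic run.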