Panic With Large Network Copy

Scott Willson scott at butlerpress.com
Wed May 30 14:27:29 UTC 2007


On May 29, 2007, at 4:26 PM, Kris Kennaway wrote:

> On Tue, May 29, 2007 at 03:36:49PM -0700, Scott Willson wrote:
>> I am seeing hard (often no core dump) crashes on a new AMD64 box
>> running 6.2 RELEASE. When I try to rsync 10+ GB of backup files to
>> the new box, I can reliably crash it after about 20 minutes; often
>> quicker if I do something else intensive at the same time, like
>> compile MySQL. Here are the box specs:
>> ASUS M2NPV-VM motherboard
>> AMD A64 3800+ 2.4G CPU
>> 2 x 1 GB SuperTalent DDR2 667 RAM
>> 2 x 500G Samsung SATA2 drives
>> MATSHITADVD-ROM SR-8585 DVD drive (ancient)
>>
>> Most times, I don't even get a core dump. Here's one I did get:
>> panic: double fault
>> Uptime: 20m26s
>> Dumping 2014 MB (2 chunks)
>>   chunk 0: 1MB (159 pages) ... ok
>>   chunk 1: 2014MB (515552 pages) 1998 1982 1966 1950 1934 1918 1902
>> 1886 1870 1854 1838 1822 1806 1790 1774 1758 1742 1726 1710 1694 1678
>> 1662 1646 1630 1614 1598 1582 1566 1550 1534 1518 1502 1486 1470 1454
>> 1438 1422 1406 1390 1374 1358 1342 1326 1310 1294 1278 1262 1246 1230
>> 1214 1198 1182 1166 1150 1134 1118 1102 1086 1070 1054 1038 1022 1006
>> 990 974 958 942 926 910 894 878 862 846 830 814 798 782 766 750 734
>> 718 702 686 670 654 638 622 606 590 574 558 542 526 510 494 478 462
>> 446 430 414 398 382 366 350 334 318 302 286 270 254 238 222 206 190
>> 174 158 142 126 110 94 78 62 46 30 14
>>
>> #0  doadump () at pcpu.h:172
>> 172             __asm __volatile("movq %%gs:0,%0" : "=r" (td));
>> (kgdb) backtrace
>> #0  doadump () at pcpu.h:172
>> #1  0x0000000000000004 in ?? ()
>> #2  0xffffffff803f6093 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
>> #3  0xffffffff803f6696 in panic (fmt=0xffffff0079a08be0 "X??y") at /usr/src/sys/kern/kern_shutdown.c:565
>> #4  0xffffffff80610e70 in dblfault_handler () at /usr/src/sys/amd64/amd64/trap.c:680
>> #5  0xffffffff805fe2f2 in Xdblfault () at /usr/src/sys/amd64/amd64/exception.S:192
>> #6  0xffffffff80439844 in m_tag_delete_chain (m=0x0, t=0x0) at /usr/src/sys/kern/uipc_mbuf2.c:346
>> #7  0xffffffff803eac0d in mb_dtor_mbuf (mem=0x0, size=0, arg=0x0) at /usr/src/sys/kern/kern_mbuf.c:338
>> #8  0xffffffff80592a24 in uma_zfree_arg (zone=0x0, item=0x0, udata=0x0) at /usr/src/sys/vm/uma_core.c:2270
>> #9  0xffffffff804371f0 in m_freem (mb=0x0) at uma.h:303
>> #10 0xffffffff80634125 in nve_ospackettx (ctx=0xffffff00798aac00, id=0xffffffffb19ea6d0, success=0) at /usr/src/sys/dev/nve/if_nve.c:1551
>
> This looks like a nve driver bug to me.  You may wish to try the  
> nfe driver.
>
> Kris

Thanks for the suggestion, Kris.

I built a new kernel without nve, compiled the nfe driver from
nfe-20070512.tar.gz with e1000phy.patch applied, and enabled device
polling. The new driver attaches like this:
e1000phy0: <Marvell 88E1116 Gigabit PHY> on miibus0
e1000phy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX-FDX, auto
nfe0: Ethernet address: 00:1a:92:cb:b2:eb
nfe0: [FAST]

The panics are gone, but under load I see many errors like these:
May 29 20:25:17 brooklyn kernel: nfe0: tx v2 error 0x6204<UNDERFLOW>
May 29 20:28:15 brooklyn kernel: nfe0: watchdog timeout (missed Tx interrupts) -- recovering

The only unusual thing about my current setup is that the server
shares an old hub with other old hardware, and the link appears to
have negotiated only 10baseT half-duplex:
ifconfig nfe0
nfe0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
         options=8<VLAN_MTU>
         inet 192.168.1.154 netmask 0xffffff00 broadcast 192.168.1.255
         ether 00:1a:92:cb:b2:eb
         media: Ethernet autoselect (10baseT/UTP <half-duplex>)
         status: active

For now, I've installed an old spare Ethernet card, which shows no
errors, so I'm going to stick with it. I'm also going to follow up
with the nfe driver's maintainer in case he's interested.

Scott


More information about the freebsd-questions mailing list