Crashes on X7SPE-HF with em
Jeremy Chadwick
freebsd at jdc.parodius.com
Mon Aug 30 11:08:47 UTC 2010
Subject: Re: igb related(?) panics on 7.3-STABLE
In-Reply-To: <20100830094631.GD12467 at core.byshenk.net>
On Mon, Aug 30, 2010 at 11:46:31AM +0200, Greg Byshenk wrote:
> On Sun, Aug 29, 2010 at 08:16:59PM +0200, Greg Byshenk wrote:
>
> > I've begun seeing problems on a machine running FreeBSD-7.3-STABLE, 64-bit,
> > with two igb nics in use. Previously the machine was fine, running earlier
> > versions of 7-STABLE, although the load on the network has increased due
> > to additional machines being added to the network (the machine functions
> > as a fileserver, serving files to compute machines via NFS(v3)).
> >
> > Any advice is much appreciated. System info is below.
>
>
> Followup with more information. The machine just panic'ed again, with
> a lot of load on the network.
>
> Output from the 'systat' that was running at the time:
>
> 3 users Load 54.47 42.35 24.25 Aug 30 11:17
>
> Mem:KB REAL VIRTUAL VN PAGER SWAP PAGER
> Tot Share Tot Share Free in out in out
> Act 46232 5504 868140 10548 943324 count
> All 456484 7852 1074772k 27740 pages
> Proc: Interrupts
> r p d s w Csw Trp Sys Int Sof Flt cow 54220 total
> 1 170 392k 8 278 22k 195 1 zfod sio0 irq4
> ozfod fdc0 irq6
> 70.4%Sys 3.1%Intr 0.0%User 0.0%Nice 26.5%Idle %ozfod 27 twa0 uhci0
> | | | | | | | | | | | daefr 2001 cpu0: time
> ===================================++ prcfr igb0 256
> 9938 dtbuf 1247 totfr igb0 257
> Namei Name-cache Dir-cache 100000 desvn react igb0 258
> Calls hits % hits % 34443 numvn 1 pdwak igb0 259
> 24996 frevn 112852 pdpgs igb0 262
> intrn igb0 263
> Disks da0 da1 pass0 pass1 2570672 wire igb0 264
> KB/t 0.00 12.23 0.00 0.00 46760 act igb0 265
> tps 0 26 0 0 14706896 inact 19449 igb1 266
> MB/s 0.00 0.31 0.00 0.00 0 769796 26585
> 0 21 0 0 173528
>
>
> -greg
>
>
>
> > Machine:
> > =======
> >
> > FreeBSD server.example.com 7.3-STABLE FreeBSD 7.3-STABLE #36: Wed Aug 25 11:01:07 CEST 2010 root at server.example.com:/usr/obj/usr/src/sys/KERNEL amd64
> >
> > Kernel was csup'd earlier in the day on 25 August, immediately prior to
> > the build.
> >
> >
> > Panic:
> > ======
> >
> > Fatal trap 9: general protection fault while in kernel mode
> > cpuid = 2; apic id = 02
> > instruction pointer = 0x8:0xffffffff8052f40c
> > stack pointer = 0x10:0xffffff82056819d0
> > frame pointer = 0x10:0xffffff82056819f0
> > code segment = base 0x0, limit 0xfffff, type 0x1b
> > = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags = interrupt enabled, resume, IOPL = 0
> > current process = 65 (igb1 que)
> > trap number = 9
> > panic: general protection fault
> > cpuid = 2
> > KDB: stack backtrace:
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> > panic() at panic+0x182
> > trap_fatal() at trap_fatal+0x294
> > trap() at trap+0x106
> > calltrap() at calltrap+0x8
> > --- trap 0x9, rip = 0xffffffff8052f40c, rsp = 0xffffff82056819d0, rbp = 0xffffff82056819f0 ---
> > m_tag_delete_chain() at m_tag_delete_chain+0x1c
> > uma_zfree_arg() at uma_zfree_arg+0x41
> > m_freem() at m_freem+0x54
> > ether_demux() at ether_demux+0x85
> > ether_input() at ether_input+0x1bb
> > igb_rxeof() at igb_rxeof+0x29d
> > igb_handle_que() at igb_handle_que+0x9a
> > taskqueue_run() at taskqueue_run+0xac
> > taskqueue_thread_loop() at taskqueue_thread_loop+0x46
> > fork_exit() at fork_exit+0x122
> > fork_trampoline() at fork_trampoline+0xe
> > --- trap 0, rip = 0, rsp = 0xffffff8205681d30, rbp = 0 ---
> > Uptime: 11h57m6s
> > Physical memory: 18411 MB
> > Dumping 3770 MB:
> >
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 0; apic id = 00
> > fault virtual address = 0x8000000000
> > fault code = supervisor write data, page not present
> > instruction pointer = 0x8:0xffffffff80188b5f
> > stack pointer = 0x10:0xffffff82056811f0
> > frame pointer = 0x10:0xffffff82056812f0
> > code segment = base 0x0, limit 0xfffff, type 0x1b
> > = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags = interrupt enabled, resume, IOPL = 0
> > current process = 65 (igb1 que)
> > trap number = 12
> >
> >
> > pciconf:
> > =======
> >
> > igb0 at pci0:10:0:0: class=0x020000 card=0x10c915d9 chip=0x10c98086 rev=0x01 hdr=0x00
> > vendor = 'Intel Corporation'
> > class = network
> > subclass = ethernet
> > igb1 at pci0:10:0:1: class=0x020000 card=0x10c915d9 chip=0x10c98086 rev=0x01 hdr=0x00
> > vendor = 'Intel Corporation'
> > class = network
> > subclass = ethernet
> >
> >
> > dmesg:
> > =====
> >
> > igb0: <Intel(R) PRO/1000 Network Connection version - 1.9.5> port 0xe880-0xe89f mem 0xfbe60000-0xfbe7ffff,0xfbe40000-0xfbe5ffff,0xfbeb8000-0xfbebbfff irq 16 at device 0.0 on pci10
> > igb0: Using MSIX interrupts with 10 vectors
> > igb0: [ITHREAD]
> > igb0: [ITHREAD]
> > igb0: [ITHREAD]
> > igb0: [ITHREAD]
> > igb0: [ITHREAD]
> > igb0: [ITHREAD]
> > igb0: [ITHREAD]
> > igb0: [ITHREAD]
> > igb0: [ITHREAD]
> > igb0: [ITHREAD]
> > igb0: Ethernet address: 00:30:48:ca:cd:72
> > igb1: <Intel(R) PRO/1000 Network Connection version - 1.9.5> port 0xec00-0xec1f mem 0xfbee0000-0xfbefffff,0xfbec0000-0xfbedffff,0xfbebc000-0xfbebffff irq 17 at device 0.1 on pci10
> > igb1: Using MSIX interrupts with 10 vectors
> > igb1: [ITHREAD]
> > igb1: [ITHREAD]
> > igb1: [ITHREAD]
> > igb1: [ITHREAD]
> > igb1: [ITHREAD]
> > igb1: [ITHREAD]
> > igb1: [ITHREAD]
> > igb1: [ITHREAD]
> > igb1: [ITHREAD]
> > igb1: [ITHREAD]
> > igb1: Ethernet address: 00:30:48:ca:cd:73
Adding Jack Vogel of Intel and Yong-Hyeon PYUN to the mix...

I don't know whether this is possible for you to do, but do you see the
same problem when running 8.1-STABLE? I know there has been a lot of
positive work on igb(4) in RELENG_8, but not many of those fixes and
improvements have been backported to RELENG_7.

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/e1000/if_igb.c

Be sure to check out Revision 1.54 there (it is for HEAD/CURRENT, and
I'm not sure whether it has been backported or incorporated in some
other way).

Otherwise, as a test/workaround, you might try disabling MSI-X support
entirely to see if there's any improvement. This could degrade system
performance a bit under heavy interrupt load. In /boot/loader.conf,
set hw.pci.enable_msix="0" and reboot. If there's no improvement, be
sure to remove this setting again.
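A minimal sketch of the /boot/loader.conf entry for that test; only the hw.pci.enable_msix tunable comes from the suggestion above, and the comment lines are illustrative:

```shell
# /boot/loader.conf
# Test/workaround for the igb(4) panics: disable MSI-X system-wide.
# Read by the loader, so it only takes effect at the next boot.
# Delete this line (and reboot) if it makes no difference.
hw.pci.enable_msix="0"
```

After rebooting, `sysctl hw.pci.enable_msix` should report 0, and the igb0/igb1 probe messages in dmesg should no longer say "Using MSIX interrupts with 10 vectors".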
--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |
More information about the freebsd-stable mailing list