if_sge related panics

Pyun YongHyeon pyunyh at gmail.com
Fri Jun 4 00:35:28 UTC 2010


On Thu, Jun 03, 2010 at 09:29:20AM +0300, Nikolay Denev wrote:
> On May 24, 2010, at 8:12 PM, Pyun YongHyeon wrote:
> 
> > On Mon, May 24, 2010 at 09:48:33AM -0400, John Baldwin wrote:
> >> On Monday 24 May 2010 6:35:01 am Nikolay Denev wrote:
> >>> On May 24, 2010, at 8:57 AM, Nikolay Denev wrote:
> >>> 
> >>>> Hi,
> >>>> 
> >>>> Recently I started to experience an if_sge(4) related panic.
> >>>> It happens almost every time I try to download a torrent file, for example.
> >>>> Copying large files over NFS does not seem to trigger it, but I haven't tested extensively.
> >>>> 
> >>>> Here is the panic message :
> >>>> 
> >>>> Fatal trap 12: page fault while in kernel mode
> >>>> cpuid = 0; apic id = 00
> >>>> fault virtual address		= 0x8
> >>>> fault code				= supervisor write data, page not present
> >>>> instruction pointer		= 0x20:0xffffffff80230413
> >>>> stack pointer				= 0x28:0xffffff80001e9280
> >>>> frame pointer			= 0x28:0xffffff80001e9510
> >>>> code segment			= base 0x0, limit 0xfffff, type 0x1b
> >>>> 						= DPL 0, pres 1, long 1, def32 0, gran 1
> >>>> processor eflags			= interrupt enabled, resume, IOPL = 0
> >>>> current process			= 12 (irq19: sge0)
> >>>> trap number				= 12
> >>>> panic: page fault
> >>>> cpuid = 0
> >>>> Uptime: 1d20h56m20s
> >>>> Cannot dump. Device not defined or unavailable
> >>>> Automatic reboot in 15 seconds - press a key on the console to abort
> >>>> Sleeping thread (tid 100039, pid 12) owns a non-sleepable lock
> >>>> 
> >>>> My swap is on a zvol, so I don't have a dump. I'll try to attach a disk on the eSATA port and dump there if needed.
> >>> 
> >>> Here is some info from the crashdump :
> >>> 
> >>> (kgdb) #0  doadump () at pcpu.h:223
> >>> #1  0xffffffff802fb149 in boot (howto=260)
> >>>    at /usr/src/sys/kern/kern_shutdown.c:416
> >>> #2  0xffffffff802fb57c in panic (fmt=0xffffffff8055d564 "%s")
> >>>    at /usr/src/sys/kern/kern_shutdown.c:590
> >>> #3  0xffffffff805055b8 in trap_fatal (frame=0xffffff000288a3e0, eva=Variable "eva" is not available.
> >>> )
> >>>    at /usr/src/sys/amd64/amd64/trap.c:777
> >>> #4  0xffffffff805059dc in trap_pfault (frame=0xffffff80001e91d0, usermode=0)
> >>>    at /usr/src/sys/amd64/amd64/trap.c:693
> >>> #5  0xffffffff805061c5 in trap (frame=0xffffff80001e91d0)
> >>>    at /usr/src/sys/amd64/amd64/trap.c:451
> >>> #6  0xffffffff804eb977 in calltrap ()
> >>>    at /usr/src/sys/amd64/amd64/exception.S:223
> >>> #7  0xffffffff80230413 in sge_start_locked (ifp=0xffffff000270d800)
> >>>    at /usr/src/sys/dev/sge/if_sge.c:1591
> >> 
> >> Try this.  sge_encap() can sometimes return an error with m_head set to NULL:
> >> 
> > 
> > Thanks John. Committed in r208512.
> > 
> >> Index: if_sge.c
> >> ===================================================================
> >> --- if_sge.c	(revision 208375)
> >> +++ if_sge.c	(working copy)
> >> @@ -1588,7 +1588,8 @@
> >> 		if (m_head == NULL)
> >> 			break;
> >> 		if (sge_encap(sc, &m_head)) {
> >> -			IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
> >> +			if (m_head != NULL)
> >> +				IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
> >> 			ifp->if_drv_flags |= IFF_DRV_OACTIVE;
> >> 			break;
> >> 		}
> >> 
> >> -- 
> >> John Baldwin
> 
> After the patch I experienced several network outages (ping reporting "no buffer space available")
> that were resolved by ifconfig down/up of the sge(4) interface.
> 

Because I don't have access to sge(4) controllers, I never had a
chance to run it. Does ping(8) generate "no buffer space available"
when the system is idle? Could you give me more information on how
you checked for network outages?

> I can see that most of the other drivers that handle XXX_encap() returning with m_head set to NULL break when this condition

Yes, most drivers written or touched by me behave like that.

> is hit: i.e. :
> 
> Index: if_sge.c
> ===================================================================
> --- if_sge.c	(revision 208375)
> +++ if_sge.c	(working copy)
> @@ -1588,7 +1588,9 @@
> 		if (m_head == NULL)
> 			break;
> 		if (sge_encap(sc, &m_head)) {
> +			if (m_head == NULL)
> +				break;
> 			IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
> 			ifp->if_drv_flags |= IFF_DRV_OACTIVE;
> 			break;
> 		}
> 
> But here in sge(4) we always set IFF_DRV_OACTIVE.
> Do you think this can be the source of the problem ?
> 

A more correct way to handle IFF_DRV_OACTIVE would be to check the
number of queued frames before setting it, or to just exit the
transmit loop. If there are no queued frames, IFF_DRV_OACTIVE would
never be cleared, which in turn causes ENOBUFS in ping(8). I think
your change looks more reasonable. Do you still see the same issue
with the change you suggested?
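
To make the suspected stall easier to reason about, here is a rough
user-space sketch of it. This is not driver code: struct toy_if and
the toy_* functions are made-up stand-ins for the ifnet, sge_encap(),
the start routine and the Tx completion handler, and the model assumes
the flag is only cleared when the hardware completes at least one
frame. It only shows why setting the flag with nothing queued in
hardware could leave the send queue stuck.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; none of these are the real kernel types. */
struct toy_if {
	int  sendq;      /* frames waiting in the software send queue */
	int  hw_pending; /* frames handed to hardware, not yet completed */
	bool oactive;    /* stands in for IFF_DRV_OACTIVE */
	bool drop_next;  /* force the next encap to fail and free the frame */
};

/* Returns nonzero on error; *have_frame is false if the frame was freed. */
static int
toy_encap(struct toy_if *ifp, bool *have_frame)
{
	if (ifp->drop_next) {
		ifp->drop_next = false;
		*have_frame = false;	/* like m_head being set to NULL */
		return (1);
	}
	ifp->hw_pending++;
	return (0);
}

static void
toy_start(struct toy_if *ifp, bool break_on_null)
{
	if (ifp->oactive)
		return;			/* transmit side is marked busy */
	while (ifp->sendq > 0) {
		bool have_frame = true;

		ifp->sendq--;		/* dequeue one frame */
		assert(ifp->sendq >= 0);
		if (toy_encap(ifp, &have_frame)) {
			if (!have_frame && break_on_null)
				break;	/* frame is gone, leave the flag alone */
			if (have_frame)
				ifp->sendq++;	/* put the frame back */
			ifp->oactive = true;
			break;
		}
	}
}

/* Assumed Tx completion path: runs only if hardware had work to finish. */
static void
toy_tx_interrupt(struct toy_if *ifp)
{
	if (ifp->hw_pending == 0)
		return;			/* nothing queued, no interrupt */
	ifp->hw_pending = 0;
	ifp->oactive = false;
}

/* Returns how many frames are still stuck in the send queue. */
static int
toy_frames_stuck(bool break_on_null)
{
	struct toy_if ifp = { .sendq = 3, .drop_next = true };

	toy_start(&ifp, break_on_null);
	toy_tx_interrupt(&ifp);
	toy_start(&ifp, break_on_null);
	return (ifp.sendq);
}
```

In this model, toy_frames_stuck(false) follows the currently committed
behaviour and leaves frames queued with the flag still set, while
toy_frames_stuck(true) follows your suggested change and drains the
queue on the next start call.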

