kern/106400: fatal trap 12 at restart of PF with ALTQ if ng0 device has detached

Wed Dec 6 06:27:46 PST 2006

On 12/06/06 14:37, Max Laier wrote:
> On Wednesday 06 December 2006 14:20, Volker wrote:
>> The following reply was made to PR kern/106400; it has been noted by
>> GNATS.
>>
>> From: Volker <volker at vwsoft.com>
>> To: bug-followup at FreeBSD.org,  bst2006 at dva.dyndns.org
>> Cc:
>> Subject: Re: kern/106400: fatal trap 12 at restart of PF with ALTQ if
>> ng0 device has detached
>> Date: Wed, 06 Dec 2006 14:16:42 +0100
>>
>>  First, I would suggest using ALTQ with mpd not on ng0 but on the
>>  real physical interface (for example fxp0, xl0) that netgraph/mpd
>>  is using.
>>
>>  On the other hand, I also have trouble using ALTQ with mpd, but I'm
>>  using mpd for a 3G connection (based on a tty device, not a NIC).
>>
>>  Avoiding ALTQ rules in pf.conf for the ng0 interface (not using ALTQ
>>  on ng0) doesn't produce a fatal trap 12. So disabling ALTQ in your
>>  kernel is not the only workaround. You may still use ALTQ on your
>>  internal NIC without a trap.
>>
>>  Unlike you, I always get a kernel trap when reloading pf rules with
>>  ALTQ on ng0 (whether the pf rules are reloaded by a script or
>>  manually).
>>
>>  This also occurs while the ng0 interface is still there, and in my
>>  experience it's not related to a reload of mpd.
> 
> Can you provide a trace for this panic?  I have a good understanding of 
> the issue in the PR, but your problem seems to be quite different if the 
> ng0 device really doesn't go away meanwhile.  More details would be 
> required.
> 

Max,

Sure, I can do that, but please be patient for a few days. I need to
set up a debugging environment and a serial cable, and find the right
time to be willing to crash my server machine... ;)

As preliminary information, here's the dmesg output from a crash
that occurred on 2006-11-07 (taken from the periodic security message):

kernel trap 12 with interrupts disabled
>
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address	= 0x2c352f30
> fault code		= supervisor read, page not present
> instruction pointer	= 0x20:0xc057ae05
> stack pointer	        = 0x28:0xdaaa595c
> frame pointer	        = 0x28:0xdaaa5968
> code segment		= base 0x0, limit 0xfffff, type 0x1b
> 			= DPL 0, pres 1, def32 1, gran 1
> processor eflags	= resume, IOPL = 0
> current process		= 24730 (pfctl)
> trap number		= 12
> panic: page fault
> cpuid = 0
> KDB: stack backtrace:
> kdb_backtrace(100,c3995c00,28,daaa591c,c,...) at kdb_backtrace+0x29
> panic(c07573b4,c0788e59,0,fffff,c09b,...) at panic+0x113
> trap_fatal(daaa591c,2c352f30) at trap_fatal+0x2d7
> trap(8,28,28,c3995c00,2c352e30,...) at trap+0x10e
> calltrap() at calltrap+0x5
> --- trap 0xc, eip = 0xc057ae05, esp = 0xdaaa595c, ebp = 0xdaaa5968 ---
> _mtx_lock_sleep(c346850c,c3995c00,0,0,0) at _mtx_lock_sleep+0xa5
> rmc_delete_class(c3c2cc04,c3723c00,c3723c00,a,daaa59c8,...) at rmc_delete_class+0x5a
> cbq_class_destroy(c3c2c800,c3723c00,1,c3c2c800,0,...) at cbq_class_destroy+0x18
> cbq_clear_interface(c3c2c800) at cbq_clear_interface+0x37
> cbq_remove_altq(c3729600) at cbq_remove_altq+0x20
> altq_remove(c3729600) at altq_remove+0x3d
> pf_commit_altq(4,cd858940,cd858940,c35a9c70,daaa5a30,...) at pf_commit_altq+0x10a
> pfioctl(c3455300,c00c4452,c358b770,3,c3995c00,...) at pfioctl+0x3698
> devfs_ioctl_f(c35b65a0,c00c4452,c358b770,c3bd8e80,c3995c00) at devfs_ioctl_f+0xb3
> ioctl(c3995c00,daaa5d04) at ioctl+0x449
> syscall(3b,3b,3b,bfbfdd6c,0,...) at syscall+0x2cd
> Xint0x80_syscall() at Xint0x80_syscall+0x1f
> --- syscall (54, FreeBSD ELF32, ioctl), eip = 0x28193b0f, esp = 0xbfbfdd4c, ebp = 0xbfbfdd78 ---

I'll provide complete details and probably more debugging information
next week.

Greetings,

Volker