kern/165630: [ndis][panic][patch] IRQL_NOT_GREATER_THAN

Fri Mar 2 15:30:07 UTC 2012

>Number:         165630
>Category:       kern
>Synopsis:       [ndis][panic][patch] IRQL_NOT_GREATER_THAN
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Mar 02 15:30:06 UTC 2012
>Closed-Date:
>Last-Modified:
>Originator:     Vladislav Movchan
>Release:        FreeBSD 10.0-CURRENT
>Organization:
>Environment:
FreeBSD starlight 10.0-CURRENT FreeBSD 10.0-CURRENT #0 r232379: Fri Mar 2 13:24:25 EET 2012 root at starlight:/usr/obj/usr/src/sys/Mephistopheles amd64
>Description:
There is a race condition inside of sys/compat/ndis/subr_hal.c:KfRaiseIrql() function which could lead to "IRQL_NOT_GREATER_THAN" panic.

Race could happen if after execution of this line:
oldirql = KeGetCurrentIrql();

kthread got preempted and later appeared on another CPU core which has another "irq level".
In this case "oldirql != DISPATCH_LEVEL" check will have no much sense as on current CPU core "irq level" could be different. As result we will skip sched_pin + mtx_lock lines when it was necessary execute them, or execute then when it necessary to skip.


To fix this race we pin to current CPU core before obtaining "irq level", then we made "oldirql != DISPATCH_LEVEL" check and later we "unpin" (if we find that previous sched_pin() was not actually necessary).


Panic message:
panic: IRQL_NOT_GREATER_THAN
cpuid = 3
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kdb_backtrace() at kdb_backtrace+0x37
panic() at panic+0x1cd
KfLowerIrql() at KfLowerIrql+0x62
ntoskrnl_workitem_thread() at ntoskrnl_workitem_thread+0xab
fork_exit() at fork_exit+0x189
fork_trampoline() at fork_trampoline+0xe

--- trap 0, rip = 0, rsp = 0xffffff80d111acf0, rbp = 0 ---
KDB: enter: panic


Backtrace:
#0  doadump (textdump=-787372224) at /usr/src/sys/kern/kern_shutdown.c:268
268             if (textdump && textdump_pending) {
(kgdb) #0  doadump (textdump=-787372224) at /usr/src/sys/kern/kern_shutdown.c:268
#1  0xffffffff802de36c in db_fncall (dummy1=Variable "dummy1" is not available.
)
    at /usr/src/sys/ddb/db_command.c:573
#2  0xffffffff802de6a1 in db_command (last_cmdp=0xffffffff80dcd560, cmd_table=Variable "cmd_table" is not available.

) at /usr/src/sys/ddb/db_command.c:449
#3  0xffffffff802de8f0 in db_command_loop ()
    at /usr/src/sys/ddb/db_command.c:502
#4  0xffffffff802e0ad4 in db_trap (type=Variable "type" is not available.
) at /usr/src/sys/ddb/db_main.c:229
#5  0xffffffff806d3bb1 in kdb_trap (type=3, code=0, tf=0xffffff80d111a970)
    at /usr/src/sys/kern/subr_kdb.c:629
#6  0xffffffff80942f68 in trap (frame=0xffffff80d111a970)
    at /usr/src/sys/amd64/amd64/trap.c:591
#7  0xffffffff8092c1cf in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:228
#8  0xffffffff806d36ab in kdb_enter (why=0xffffffff80a3fb16 "panic",
    msg=0x80 <Address 0x80 out of bounds>) at cpufunc.h:63
#9  0xffffffff8069aea6 in panic (fmt=Variable "fmt" is not available.
)
    at /usr/src/sys/kern/kern_shutdown.c:633
#10 0xffffffff840d6ff2 in KfLowerIrql (oldirql=Variable "oldirql" is not available.
)
    at /usr/src/sys/modules/ndis/../../compat/ndis/subr_hal.c:417
#11 0xffffffff840dba5b in ntoskrnl_workitem_thread (arg=Variable "arg" is not available.
)
    at /usr/src/sys/modules/ndis/../../compat/ndis/subr_ntoskrnl.c:2766
#12 0xffffffff8066b039 in fork_exit (
    callout=0xffffffff840db9b0 <ntoskrnl_workitem_thread>,
    arg=0xfffffe00206a9cc0, frame=0xffffff80d111ac40)
    at /usr/src/sys/kern/kern_fork.c:992
#13 0xffffffff8092c6fe in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:602
#14 0x0000000000000000 in ?? ()
#15 0x0000000000000000 in ?? ()
#16 0x0000000000000001 in ?? ()
#17 0x0000000000000000 in ?? ()
#18 0x0000000000000000 in ?? ()
#19 0x0000000000000000 in ?? ()
#20 0x0000000000000000 in ?? ()
#21 0x0000000000000000 in ?? ()
#22 0x0000000000000000 in ?? ()
#23 0x0000000000000000 in ?? ()
#24 0x0000000000000000 in ?? ()
#25 0x0000000000000000 in ?? ()
#26 0x0000000000000000 in ?? ()
#27 0x0000000000000000 in ?? ()
#28 0x0000000000000000 in ?? ()
#29 0x0000000000000000 in ?? ()
#30 0x0000000000000000 in ?? ()
#31 0x0000000000000000 in ?? ()
#32 0x0000000000000000 in ?? ()
#33 0x0000000000000000 in ?? ()
#34 0x0000000000000000 in ?? ()
#35 0x0000000000000000 in ?? ()
#36 0x0000000000000000 in ?? ()
#37 0x0000000000000000 in ?? ()
#38 0xffffffff8240f200 in tdq_cpu ()
#39 0x0000000000000000 in ?? ()
#40 0xffffffff8240f1c0 in affinity ()
#41 0xfffffe0020971900 in ?? ()
#42 0xffffff80d111a980 in ?? ()
#43 0xffffff80d111a928 in ?? ()
#44 0xfffffe00082bf900 in ?? ()
#45 0xffffffff806c64ac in sched_switch (td=0xfffffe00206a9cc0,
    newtd=0xffffffff840db9b0, flags=Variable "flags" is not available.
) at /usr/src/sys/kern/sched_ule.c:1890
Previous frame inner to this frame (corrupt stack?)
(kgdb)


>How-To-Repeat:
There are no special conditions necessary to reproduce this panic - just load ndis interface with traffic and wait for the race.
I've used benchmarks/iperf running 10 parallel threads ("-P 10" flag) with tcp window set to 1Mb ("-w 1m" flag). Machine with ndis interface was a "client" of iperf (so it was mostly sending). 

Adding bpf(4) consumers (trafshow or tcpdump) allows to trigger this race/panic faster.
Using faster NIC (1Gbps NIC instead of wifi) allows to trigger this race/panic faster.
>Fix:
Attached patch fixed this problem for me.

Patch attached with submission follows:

Index: /usr/src/sys/compat/ndis/subr_hal.c
===================================================================
--- /usr/src/sys/compat/ndis/subr_hal.c	(revision 232379)
+++ /usr/src/sys/compat/ndis/subr_hal.c	(working copy)
@@ -392,16 +392,17 @@
 {
 	uint8_t			oldirql;
 
+	sched_pin();
 	oldirql = KeGetCurrentIrql();
 
 	/* I am so going to hell for this. */
 	if (oldirql > irql)
 		panic("IRQL_NOT_LESS_THAN");
 
-	if (oldirql != DISPATCH_LEVEL) {
-		sched_pin();
+	if (oldirql != DISPATCH_LEVEL)
 		mtx_lock(&disp_lock[curthread->td_oncpu]);
-	}
+	else
+		sched_unpin();
 /*printf("RAISE IRQL: %d %d\n", irql, oldirql);*/
 
 	return (oldirql);


>Release-Note:
>Audit-Trail:
>Unformatted: