[Bug 220076] [patch] [panic] [netgraph] repeatable kernel panic due to a race in ng_iface(4)

bugzilla-noreply at freebsd.org bugzilla-noreply at freebsd.org
Sat Jun 17 08:44:43 UTC 2017


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220076

            Bug ID: 220076
           Summary: [patch] [panic] [netgraph] repeatable kernel panic due
                    to a race in ng_iface(4)
           Product: Base System
           Version: 11.0-STABLE
          Hardware: Any
                OS: Any
            Status: New
          Keywords: patch
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs at FreeBSD.org
          Reporter: eugen at freebsd.org
          Keywords: patch

Created attachment 183566
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=183566&action=edit
protect ng_iface private data

I observe repeatable panics at netgraph level while doing stress test
for net/mpd5 daemon under stable/11 r317184. It connects, uses and disconnects
lots of ngXX interfaces and corresponding netgraph nodes and hooks.

Crashdump points to ng_iface node which private data - set of hooks - may be
modified in respond to userland request while another kernel thread sends data
over hook being disconnected. Here is a scenario:

1. mpd runs its BundNcpsLeave() procedure for an interface calling
NgSendMsg(csock, path, NGM_GENERIC_COOKIE, NGM_RMHOOK, &rm, sizeof(rm)) that
leads to libnetgraph's NgDeliverMsg() and sendto() system call for AF_NETGRAPH.

The kernel reponds with ng_findhook->ng_destroy_hook->NG_HOOK_UNREF
(_NG_HOOK_UNREF/ng_unref_hook)->NG_FREE_HOOK: free((hook), M_NETGRAPH_HOOK).

2. In parallel, userland process like ftpd sends some data over IPv4 socket to
corresponding interface being up and running. It may utilize hook being freed
same time by another kernel thread that leads to:

Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer     = 0x20:0xffffffff8097f249
stack pointer           = 0x28:0xfffffe0239542ec0
frame pointer           = 0x28:0xfffffe0239542f00
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 28999 (ftpd)
trap number             = 9
panic: general protection fault
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2c/frame 0xfffffe0239542840
kdb_backtrace() at kdb_backtrace+0x53/frame 0xfffffe0239542910
vpanic() at vpanic+0x249/frame 0xfffffe02395429e0
kproc_shutdown() at kproc_shutdown/frame 0xfffffe0239542a40
trap_fatal() at trap_fatal+0x60a/frame 0xfffffe0239542b70
trap() at trap+0x97c/frame 0xfffffe0239542dd0
trap_check() at trap_check+0x15/frame 0xfffffe0239542df0
calltrap() at calltrap+0x8/frame 0xfffffe0239542df0
--- trap 0x9, rip = 0xffffffff8097f249, rsp = 0xfffffe0239542ec0, rbp =
0xfffffe0239542f00 ---
ng_address_hook() at ng_address_hook+0x59/frame 0xfffffe0239542f00
ng_iface_send() at ng_iface_send+0x108/frame 0xfffffe0239542f90
ng_iface_output() at ng_iface_output+0x447/frame 0xfffffe0239543060
ip_output() at ip_output+0x1864/frame 0xfffffe0239543300
tcp_output() at tcp_output+0x2602/frame 0xfffffe02395436a0
tcp_disconnect() at tcp_disconnect+0x18e/frame 0xfffffe02395436e0
tcp_usr_disconnect() at tcp_usr_disconnect+0xe6/frame 0xfffffe0239543710
sodisconnect() at sodisconnect+0x62/frame 0xfffffe0239543740
soclose() at soclose+0x95/frame 0xfffffe02395437b0
soo_close() at soo_close+0x4d/frame 0xfffffe02395437e0
fo_close() at fo_close+0x31/frame 0xfffffe0239543810
_fdrop() at _fdrop+0x46/frame 0xfffffe0239543840
closef() at closef+0x2d7/frame 0xfffffe02395438f0
closefp() at closefp+0xde/frame 0xfffffe0239543940
kern_close() at kern_close+0xe7/frame 0xfffffe0239543990
sys_close() at sys_close+0x1f/frame 0xfffffe02395439b0
syscallenter() at syscallenter+0x4ff/frame 0xfffffe0239543a80
amd64_syscall() at amd64_syscall+0x2a/frame 0xfffffe0239543bb0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0239543bb0
--- syscall (6, FreeBSD ELF64, sys_close), rip = 0x801a4033a, rsp =
0x7fffffffd0a8, rbp = 0x7fffffffd0d0 ---
Uptime: 2h11m5s
Dumping 544 out of 8156 MB:..3%..12%..21%..33%..42%..53%..62%..71%..83%..92%

Reading symbols from /boot/modules/geom_mirror.ko...done.
Loaded symbols for /boot/modules/geom_mirror.ko
Reading symbols from /boot/modules/accf_http.ko...done.
Loaded symbols for /boot/modules/accf_http.ko
Reading symbols from /boot/modules/nvidia.ko...done.
Loaded symbols for /boot/modules/nvidia.ko
Reading symbols from /boot/modules/vboxdrv.ko...done.
Loaded symbols for /boot/modules/vboxdrv.ko
Reading symbols from /boot/modules/mmc.ko...done.
Loaded symbols for /boot/modules/mmc.ko
Reading symbols from /boot/modules/mmcsd.ko...done.
Loaded symbols for /boot/modules/mmcsd.ko
Reading symbols from /boot/modules/sdhci.ko...done.
Loaded symbols for /boot/modules/sdhci.ko
Reading symbols from /boot/modules/h_ertt.ko...done.
Loaded symbols for /boot/modules/h_ertt.ko
Reading symbols from /boot/modules/cc_chd.ko...done.
Loaded symbols for /boot/modules/cc_chd.ko
Reading symbols from /boot/modules/geom_sched.ko...done.
Loaded symbols for /boot/modules/geom_sched.ko
Reading symbols from /boot/modules/gsched_rr.ko...done.
Loaded symbols for /boot/modules/gsched_rr.ko
Reading symbols from /boot/modules/vboxnetflt.ko...done.
Loaded symbols for /boot/modules/vboxnetflt.ko
Reading symbols from /boot/modules/vboxnetadp.ko...done.
Loaded symbols for /boot/modules/vboxnetadp.ko
Reading symbols from /boot/modules/nullfs.ko...done.
Loaded symbols for /boot/modules/nullfs.ko
Reading symbols from /usr/local/modules/rtc.ko...done.
Loaded symbols for /usr/local/modules/rtc.ko
#0  doadump (textdump=1) at /data2/src/sys/kern/kern_shutdown.c:298
298             dumptid = curthread->td_tid;
(kgdb) bt
#0  doadump (textdump=1) at /data2/src/sys/kern/kern_shutdown.c:298
#1  0xffffffff807a0828 in kern_reboot (howto=260) at
/data2/src/sys/kern/kern_shutdown.c:366
#2  0xffffffff807a125f in vpanic (fmt=0xffffffff80cf5311 "%s",
ap=0xfffffe0239542a20)
    at /data2/src/sys/kern/kern_shutdown.c:759
#3  0xffffffff807a12d0 in panic (fmt=0xffffffff80cf5311 "%s") at
/data2/src/sys/kern/kern_shutdown.c:690
#4  0xffffffff80c06a0a in trap_fatal (frame=0xfffffe0239542e00, eva=0) at
/data2/src/sys/amd64/amd64/trap.c:801
#5  0xffffffff80c0604c in trap (frame=0xfffffe0239542e00) at
/data2/src/sys/amd64/amd64/trap.c:549
#6  0xffffffff80c07085 in trap_check (frame=0xfffffe0239542e00) at
/data2/src/sys/amd64/amd64/trap.c:602
#7  0xffffffff80bdeba3 in calltrap () at
/data2/src/sys/amd64/amd64/exception.S:236
#8  0xffffffff8097f249 in ng_address_hook (here=0x0, item=0xfffff8011ddd1f00,
hook=0xfffff801f8232300, retaddr=0)
    at /data2/src/sys/netgraph/ng_base.c:3586
#9  0xffffffff80986548 in ng_iface_send (ifp=0xfffff801f8ef2800,
m=0xfffff801f8526100, sa=2 '\002')
    at /data2/src/sys/netgraph/ng_iface.c:451
#10 0xffffffff80985c97 in ng_iface_output (ifp=0xfffff801f8ef2800,
m=0xfffff801f8526100, dst=0xfffff8011db98720, 
    ro=0xfffff8011db98700) at /data2/src/sys/netgraph/ng_iface.c:386
#11 0xffffffff809cae14 in ip_output (m=0xfffff801f8526100, opt=0x0,
ro=0xfffff8011db98700, flags=0, imo=0x0, 
    inp=0xfffff8011db98570) at /data2/src/sys/netinet/ip_output.c:655
#12 0xffffffff809e08d2 in tcp_output (tp=0xfffff801f8258410) at
/data2/src/sys/netinet/tcp_output.c:1446
#13 0xffffffff809f5f4e in tcp_disconnect (tp=0xfffff801f8258410) at
/data2/src/sys/netinet/tcp_usrreq.c:1946
#14 0xffffffff809f29a6 in tcp_usr_disconnect (so=0xfffff8011d8de360) at
/data2/src/sys/netinet/tcp_usrreq.c:674
#15 0xffffffff80884072 in sodisconnect (so=0xfffff8011d8de360) at
/data2/src/sys/kern/uipc_socket.c:1051
#16 0xffffffff808839b5 in soclose (so=0xfffff8011d8de360) at
/data2/src/sys/kern/uipc_socket.c:869
#17 0xffffffff8084d67d in soo_close (fp=0xfffff8011df23b40,
td=0xfffff801f8c5d000) at /data2/src/sys/kern/sys_socket.c:334
#18 0xffffffff8072bee1 in fo_close (fp=0xfffff8011df23b40,
td=0xfffff801f8c5d000) at file.h:346
#19 0xffffffff80726b86 in _fdrop (fp=0xfffff8011df23b40, td=0xfffff801f8c5d000)
at /data2/src/sys/kern/kern_descrip.c:2849
#20 0xffffffff8072b1f7 in closef (fp=0xfffff8011df23b40, td=0xfffff801f8c5d000)
at /data2/src/sys/kern/kern_descrip.c:2430
#21 0xffffffff8072768e in closefp (fdp=0xfffff80007104000, fd=6,
fp=0xfffff8011df23b40, td=0xfffff801f8c5d000, holdleaders=0)
    at /data2/src/sys/kern/kern_descrip.c:1191
#22 0xffffffff80728417 in kern_close (td=0xfffff801f8c5d000, fd=6) at
/data2/src/sys/kern/kern_descrip.c:1239
#23 0xffffffff8072831f in sys_close (td=0xfffff801f8c5d000,
uap=0xfffffe0239543b58)
    at /data2/src/sys/kern/kern_descrip.c:1218
#24 0xffffffff80c07b7f in syscallenter (td=0xfffff801f8c5d000,
sa=0xfffffe0239543b48) at subr_syscall.c:135
#25 0xffffffff80c0741a in amd64_syscall (td=0xfffff801f8c5d000, traced=0) at
/data2/src/sys/amd64/amd64/trap.c:902
#26 0xffffffff80bdee8b in Xfast_syscall () at
/data2/src/sys/amd64/amd64/exception.S:396
#27 0x0000000801a4033a in ?? ()
Previous frame inner to this frame (corrupt stack?)
Current language:  auto; currently minimal
(kgdb) frame 8
#8  0xffffffff8097f249 in ng_address_hook (here=0x0, item=0xfffff8011ddd1f00,
hook=0xfffff801f8232300, retaddr=0)
    at /data2/src/sys/netgraph/ng_base.c:3586
3586                NG_HOOK_NOT_VALID(peer = NG_HOOK_PEER(hook)) ||
(kgdb) l
3581             * that the peer is still connected (even if invalid,) we know
3582             * that the peer node is present, though maybe invalid.
3583             */
3584            TOPOLOGY_RLOCK();
3585            if ((hook == NULL) || NG_HOOK_NOT_VALID(hook) ||
3586                NG_HOOK_NOT_VALID(peer = NG_HOOK_PEER(hook)) ||
3587                NG_NODE_NOT_VALID(peernode = NG_PEER_NODE(hook))) {
3588                    NG_FREE_ITEM(item);
3589                    TRAP_ERROR();
3590                    TOPOLOGY_RUNLOCK();
(kgdb) p *hook
$1 = {
  hk_name = 0xfffff801f8232300
"ÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞP¯\003\201ÿÿÿÿÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­Þ"..., 
  hk_private = 0xdeadc0dedeadc0de, hk_flags = -559038242, hk_type = -559038242,
hk_peer = 0xdeadc0dedeadc0de, 
  hk_node = 0xdeadc0dedeadc0de, hk_hooks = {le_next = 0xdeadc0dedeadc0de,
le_prev = 0xdeadc0dedeadc0de}, 
  hk_rcvmsg = 0xdeadc0dedeadc0de, hk_rcvdata = 0xdeadc0dedeadc0de, hk_refs =
-559038242}
(kgdb) 

Attached patch introduces per-node rwlock for ng_iface to protect usage of its
private data while it is being modified. Without the patch, my stress test for
mpd procudes this panic in short time. With patch applied, it was running over
11 hours non-stop and no panics.

-- 
You are receiving this mail because:
You are the assignee for the bug.


More information about the freebsd-bugs mailing list