kern/102335: [DEVFS] sx vnode deadlock

Alex Unleashed unledev at gmail.com
Mon Aug 21 01:20:29 UTC 2006


>Number:         102335
>Category:       kern
>Synopsis:       [DEVFS] sx vnode deadlock
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Aug 21 01:20:14 GMT 2006
>Closed-Date:
>Last-Modified:
>Originator:     Alex Unleashed
>Release:        6.1-RELEASE
>Organization:
>Environment:
FreeBSD merovingio.kicks-ass.org 6.1-RELEASE FreeBSD 6.1-RELEASE #0: Sun May  7 04:32:43 UTC 2006     root at opus.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  i386
>Description:
I've been able to reproduce, both in 6.1-RELEASE (including -p3) and in 6-STABLE (a snapshot from August 16th 2006, ~01:00 GMT), a deadlock in the devfs code which leaves the system unable to access the disk. I've come up with some interesting debugging info, and it looks to me like there are vnode problems while an sx lock is being held.

My take on it is that the deadlock occurs when one process gets a lock on a vnode (tagged "devfs") and another process xlocks an sx lock ("devfsmount"). For some reason the one holding the sx lock then wants the lock on that vnode through devfs_allocv(), while the other process wants the sx lock through devfs_lookup(). From this point on, pretty much anything that wants to touch the filesystem waits forever in devfs_root() for another vnode, flagged VV_ROOT and locked by the process holding the sx lock.

Patching the devfs code with fixes from -CURRENT didn't work out.
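
For illustration only, here is a minimal userland sketch of the lock-order inversion I think is happening. Two pthread mutexes stand in for the "devfs" vnode lock and the "devfsmount" sx lock; the names and the scenario are mine, not taken from the kernel sources:

/*
 * Two threads take the same two locks in opposite order, like the
 * "xgcc" (pid 17069) and "sh" (pid 17071) processes in the traces below:
 *   thread A: vnode lock         -> devfsmount sx lock   (devfs_lookup)
 *   thread B: devfsmount sx lock -> vnode lock           (devfs_allocv/vget)
 * Both threads hang; that is the cycle described above.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t vnode_lock = PTHREAD_MUTEX_INITIALIZER;     /* the "devfs" vnode lock */
static pthread_mutex_t devfsmount_sx = PTHREAD_MUTEX_INITIALIZER;  /* the "devfsmount" sx lock */

static void *
thread_a(void *arg)     /* like pid 17069: holds its vnode, wants the sx lock */
{
	pthread_mutex_lock(&vnode_lock);
	sleep(1);                               /* give thread B time to take its lock */
	pthread_mutex_lock(&devfsmount_sx);     /* blocks forever */
	return (NULL);
}

static void *
thread_b(void *arg)     /* like pid 17071: holds the sx lock, wants the vnode */
{
	pthread_mutex_lock(&devfsmount_sx);
	sleep(1);
	pthread_mutex_lock(&vnode_lock);        /* blocks forever */
	return (NULL);
}

int
main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, thread_a, NULL);
	pthread_create(&b, NULL, thread_b, NULL);
	pthread_join(a, NULL);                  /* never returns */
	pthread_join(b, NULL);
	printf("not reached\n");
	return (0);
}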

Here's some info from a DDB session after the deadlock had just occurred:

db> ps
  pid   proc     uid  ppid  pgrp  flag   stat  wmesg    wchan  cmd
17072 c4145624   0  17070  1071 0004002 [SLPQ piperd     0xc39904c8][SLP] gawk-3.1.5
17071 c4149c48   0  17070  1071 0000002 [SLPQ devfs      0xc41435c8][SLP] sh
17070 c4145830   0  17068  1071 0000002 [SLPQ wait       0xc4145830][SLP] sh
17069 c414920c   0  17067  1071 0004002 [SLPQ devfsmount 0xc396d540][SLP] xgcc
<<snip>>

db> trace 17069
Tracing pid 17069 tid 100071 td 0xc4146600
sched_switch(c4146600,0,1,c09545dc,c4146600) at sched_switch+0x2ea
mi_switch(1,0,c0955f54,1ba,c091b594) at mi_switch+0x425
sleepq_switch(c396d540,0,c0955f54,20c,dad529f0) at sleepq_switch+0x110
sleepq_wait(c396d540,c0a36da0,c0918064,1,c09c1b6c) at sleepq_wait+0x65
cv_wait(c396d540,c0a36da0,c094f9cd,23f,dad52ad4) at cv_wait+0x1e6
_sx_xlock(c396d510,c094f9c4,23f,c09b8d80,dad52ad4) at _sx_xlock+0x99
devfs_lookup(dad52ad4,c092ddcb,37,c096da3c,c4143570) at devfs_lookup+0x36
VOP_LOOKUP_APV(c09b8d80,dad52ad4,c4146600,c4146600,0) at VOP_LOOKUP_APV+0xa4
lookup(dad52b80,0,c09204ab,b6,c070487d) at lookup+0x4ed
namei(dad52b80,c091c646,275,c09550c4,c4146600) at namei+0x43c
kern_lstat(c4146600,807a800,0,dad52c54,c091b47f) at kern_lstat+0x4f
lstat(c4146600,dad52d04,3c4,c096ba20,c4146600) at lstat+0x2f
syscall(3b,3b,3b,28232cb2,bfbfa2c1) at syscall+0x31f
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (190, FreeBSD ELF32, lstat), eip = 0x282251bb, esp = 0xbfbf9e7c, ebp = 0xbfbfab38 ---

db> trace 17071
Tracing pid 17071 tid 100076 td 0xc394cd80
sched_switch(c394cd80,0,1,c09545dc,c394cd80) at sched_switch+0x2ea
mi_switch(1,0,c0955f54,1ba,c091b594) at mi_switch+0x425
sleepq_switch(c41435c8,0,c0955f54,20c,d9bb6768) at sleepq_switch+0x110
sleepq_wait(c41435c8,0,c091bebc,c8,c394cd80) at sleepq_wait+0x65
msleep(c41435c8,c0a3743c,50,c0917eb4,0) at msleep+0x342
acquire(d9bb6840,80,60000,b1,0) at acquire+0x11c
lockmgr(c41435c8,2002,c4143638,c394cd80,d9bb6888) at lockmgr+0x576
vop_stdlock(d9bb68b0,c08f40c3,42af,c091d728,1) at vop_stdlock+0x32
VOP_LOCK_APV(c09b8d80,d9bb68b0,c4143638,8,c0920936) at VOP_LOCK_APV+0xa4
vn_lock(c4143570,2002,c394cd80,7a7,2) at vn_lock+0xd2
vget(c4143570,2,c394cd80,c3972005,c396c460) at vget+0xf0
devfs_allocv(c396c400,c3889800,d9bb6b94,c394cd80,d9bb6988) at devfs_allocv+0x43
devfs_lookupx(d9bb6ad4,c094f9c4,23f,c09b8d80,d9bb6ad4) at devfs_lookupx+0x5f5
devfs_lookup(d9bb6ad4,c092ddcb,37,c096dac3,c3971000) at devfs_lookupx+0x3e
VOP_LOOKUP_APV(c09b8d80,d9bb6ad4,c394cd80,c394cd80,0) at VOP_LOOKUP_APV+0xa4
lookup(d9bb6b80,0,c09204ab,b6,c0a82d70) at lookup+0x4ed
namei(d9bb6b80,c091c646,275,c09550c4,c394cd80) at namei+0x43c
kern_lstat(c394cd80,806d800,0,d9bb6c54,c091b47f) at kern_lstat+0x4f
lstat(c394cd80,d9bb6d04,3c4,c096ba20,c394cd80) at lstat+0x2f
syscall(3b,3b,3b,281b2cb2,bfbf9731) at syscall+0x31f
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (190, FreeBSD ELF32, lstat), eip = 0x281a51bb, esp = 0xbfbf92ec, ebp = 0xbfbf9fa8 ---

db> show alllocks
Process 17071 (sh) thread 0xc394cd80 (100076)
exclusive sx devfsmount r = 0 (0xc396d510) locked @ fs/devfs/devfs_vnops:575
Process 34 (irq1: atkbd0) thread 0xc3687a80 (100031)
exclusive sleep mutex Giant r = 0 (0xc0a37ac0) locked @ kern/kern_intr.c:683

db> show lock 0xc396d510
 class: sx
 name: devfsmount
 state: XLOCK: 0xc394cd80 (tid 100076, pid 17071, "sh")
 waiters: 0 shared, 1 exclusive

db> show lockedvnods
Locked vnodes

0xc3971000: tag devfs, type VDIR
    usecount 3, writecount 0, refcount 3 mountedhere 0
    flags (VV_ROOT)
     lock type devfs: EXCL (count 1) by thread 0xc394cd80 (pid 17071)
#0 0xc06903a1 at lockmgr+0x621
#1 0xc06f98db at vop_stdlock+0x32
#2 0xc08e18fb at VOP_LOCK_APV+0xa4
#3 0xc0712460 at vn_lock+0xd2
#4 0xc07040ff at vget+0xf0
#5 0xc0645702 at devfs_allocv+0x43
#6 0xc064550e at devfs_root+0x2c
#7 0xc06fc023 at lookup+0x829
#8 0xc06fb521 at namei+0x43c
#9 0xc070ba21 at kern_lstat+0x4f
#10 0xc070b9ad at lstat+0x2f
#11 0xc08cbb89 at syscall+0x31f
#12 0xc08b890f at Xint0x80_syscall+0x1f

0xc4143570: tag devfs, type VDIR
    usecount 1, writecount 0, refcount 2 mountedhere 0
    flags ()
     lock type devfs: EXCL (count 1) by thread 0xc4146600 (pid 17069) with 1 pending
#0 0xc06903a1 at lockmgr+0x621
#1 0xc06f98db at vop_stdlock+0x32
#2 0xc08e18fb at VOP_LOCK_APV+0xa4
#3 0xc0712460 at vn_lock+0xd2
#4 0xc07040ff at vget+0xf0
#5 0xc0645702 at devfs_allocv+0x43
#6 0xc064680e at devfs_lookupx+0x5f5
#7 0xc064685b at devfs_lookup+0x3e
#8 0xc08dea25 at VOP_LOOKUP_APV+0xa4
#9 0xc06fbce7 at lookup+0x4ed
#10 0xc06fb521 at namei+0x43c
#11 0xc070ba21 at kern_lstat+0x4f
#12 0xc070b9ad at lstat+0x2f
#13 0xc08cbb89 at syscall+0x31f
#14 0xc08b890f at Xint0x80_syscall+0x1f

After this the system still works, but it is unable to perform operations on the filesystem, like launching a new process or saving to disk. When this happens, the new process at some point makes a syscall (open or chflags, for example) which goes through devfs_root() (namei -> lookup -> devfs_root -> devfs_allocv); the vnode with the VV_ROOT flag gets its refcount and pending count incremented, and the process waits there forever.
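
Again as a userland analogy only (not kernel code), the pile-up looks like this: a mutex standing for the VV_ROOT devfs vnode (0xc3971000 above) is held by the deadlocked process, so every new lookup that reaches devfs_root() blocks on it and never wakes up:

/*
 * Analogy for the pile-up: root_vnode stands for the VV_ROOT devfs vnode,
 * held by the deadlocked process (pid 17071); stuck_elsewhere stands for
 * whatever it is waiting on and will never get.  Every later lookup
 * blocks on root_vnode forever, like new processes do in devfs_root().
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t root_vnode = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t stuck_elsewhere = PTHREAD_MUTEX_INITIALIZER;

static void *
deadlocked_holder(void *arg)
{
	pthread_mutex_lock(&root_vnode);        /* holds the VV_ROOT vnode... */
	pthread_mutex_lock(&stuck_elsewhere);   /* ...and sleeps forever */
	return (NULL);
}

static void *
new_lookup(void *arg)   /* any later open()/lstat()/chflags(): namei -> devfs_root */
{
	pthread_mutex_lock(&root_vnode);        /* like vget() on the root vnode: sleeps forever */
	printf("lookup %ld done\n", (long)arg); /* never printed */
	return (NULL);
}

int
main(void)
{
	pthread_t holder, lookups[3];
	long i;

	pthread_mutex_lock(&stuck_elsewhere);   /* make sure the holder gets stuck */
	pthread_create(&holder, NULL, deadlocked_holder, NULL);
	sleep(1);
	for (i = 0; i < 3; i++)
		pthread_create(&lookups[i], NULL, new_lookup, (void *)i);
	pthread_join(holder, NULL);             /* never returns */
	return (0);
}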
>How-To-Repeat:
This bug is cumbersome to reproduce. I'm mostly available to try whatever fixes, commands, or debugging code people want me to try, but I'm explaining here how to reproduce it in case I'm MIA.

I ported a path sandbox application to FreeBSD from Gentoo Linux which triggered this bug randomly: it can trigger the bug only 10 seconds after I start running programs inside the sandbox, or it can spend many hours compiling software without triggering it (both have happened). However, I've seen this often enough at different points to know that the path sandbox makes it much easier to trigger (keep in mind that the sandbox-wrapped syscalls multiply the real number of fs-related system calls performed overall, mainly to check access permissions). Compiling lots of packages is probably the best way to trigger the bug; building kernel and world can also help.

Note that this doesn't mean you can't get the same results by some other means; some specific tests stressing the system long enough should also work.

You can reproduce it by running Gentoo/FreeBSD (which, by the way, does *not* modify the FreeBSD sources) and unmasking and installing the latest sandbox version, OR you can do so under a vanilla FreeBSD install. In the latter case, you can install and run the path sandbox by following these steps:
- Download http://unleashed.kicks-ass.org/gentoo/sandbox/sandbox-1.2.20alpha2.tar.bz2 --or download it from the Gentoo mirrors-- and make sure bash is installed.
- Make /bin/bash point to the real bash executable, if not already there.
- Extract the tarball and run "./configure --host=i386-unknown-freebsd6.1 --build=i386-unknown-freebsd6.1 --prefix=/usr/local/sandbox --sysconfdir=/usr/local/sandbox/etc --datadir=/usr/local/sandbox/usr/share --mandir=/usr/local/sandbox/usr/share/man --infodir=/usr/local/sandbox/usr/share/info --localstatedir=/usr/local/sandbox/var/lib --libdir=/usr/local/sandbox/usr/lib &&  gmake && gmake install && mkdir /var/log/sandbox" in the new directory. Make sure you have created the directory structure specified in the config options you want to use.
- Run /usr/local/sandbox/bin/sandbox. The bash prompt will change, starting with [s], so you know the sandbox is active. You can optionally experiment with the path sandbox using environment variables that specify a space-separated list of directories where it should allow/disallow accesses (SANDBOX_DENY, SANDBOX_READ, SANDBOX_WRITE, SANDBOX_PREDICT) and one that activates debug output (SANDBOX_DEBUG=1).

From here on it doesn't matter where you're trying to reproduce this: your best bet is to start compiling software compulsively until the system's disk apparently freezes and no new program can be launched (you can keep "top" running permanently and look for processes stalled in the "devfs" and "devfsm" wait channels whenever you think the bug has been triggered).
>Fix:

>Release-Note:
>Audit-Trail:
>Unformatted:

