LOR: ufs vs bufwait
Attilio Rao
attilio at freebsd.org
Sat May 8 17:55:18 UTC 2010
2010/5/8 Ulrich Spörlein <uqs at spoerlein.net>:
> On Sat, 08.05.2010 at 18:00:50 +0200, Attilio Rao wrote:
>> 2010/5/8 Ulrich Spörlein <uqs at spoerlein.net>:
>> > On Sat, 08.05.2010 at 12:20:05 +0200, Ulrich Spörlein wrote:
>> >> This LOR also is not yet listed on the LOR page, so I guess it's rather
>> >> new. I do use SUJ.
>> >>
>> >> lock order reversal:
>> >> 1st 0xc48388d8 ufs (ufs) @ /usr/src/sys/kern/vfs_lookup.c:502
>> >> 2nd 0xec0fe304 bufwait (bufwait) @ /usr/src/sys/ufs/ffs/ffs_softdep.c:11363
>> >> 3rd 0xc49e56b8 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2091
>> >> KDB: stack backtrace:
>> >> db_trace_self_wrapper(c09394fe,fb817308,c062e515,c061e8ab,c093c4f1,...) at db_trace_self_wrapper+0x26
>> >> kdb_backtrace(c061e8ab,c093c4f1,c418b168,c418ef28,fb817364,...) at kdb_backtrace+0x29
>> >> _witness_debugger(c093c4f1,c49e56b8,c092e785,c418ef28,c094369d,...) at _witness_debugger+0x25
>> >> witness_checkorder(c49e56b8,9,c094369d,82b,0,...) at witness_checkorder+0x839
>> >> __lockmgr_args(c49e56b8,80100,c49e56d8,0,0,...) at __lockmgr_args+0x7f9
>> >> ffs_lock(fb817488,c062e2bb,c0942b3f,80100,c49e5660,...) at ffs_lock+0x82
>> >> VOP_LOCK1_APV(c09bd600,fb817488,c4827cd4,c09d62a0,c49e5660,...) at VOP_LOCK1_APV+0xb5
>> >> _vn_lock(c49e5660,80100,c094369d,82b,4,...) at _vn_lock+0x5e
>> >> vget(c49e5660,80100,c4827c30,50,0,...) at vget+0xb9
>> >> vfs_hash_get(c47bea20,b803,80000,c4827c30,fb8175d8,...) at vfs_hash_get+0xe6
>> >> ffs_vgetf(c47bea20,b803,80000,fb8175d8,1,...) at ffs_vgetf+0x49
>> >> softdep_sync_metadata(c4838880,0,c0962957,144,0,...) at softdep_sync_metadata+0xc82
>> >> ffs_syncvnode(c4838880,1,c4827c30,fb817698,246,...) at ffs_syncvnode+0x3e2
>> >> ffs_truncate(c4838880,200,0,880,c41fb480,...) at ffs_truncate+0x862
>> >> ufs_direnter(c4838880,c49e5660,fb81794c,fb817bd4,0,...) at ufs_direnter+0x8d4
>> >> ufs_makeinode(fb817bd4,0,fb817b30,fb817a94,c08e4cf5,...) at ufs_makeinode+0x517
>> >> ufs_create(fb817b30,fb817b48,0,0,fb817ba8,...) at ufs_create+0x30
>> >> VOP_CREATE_APV(c09bd600,fb817b30,2,fb817ac0,0,...) at VOP_CREATE_APV+0xa5
>> >> vn_open_cred(fb817ba8,fb817c5c,1a4,0,c41fb480,...) at vn_open_cred+0x1de
>> >> vn_open(fb817ba8,fb817c5c,1a4,c47e2428,0,...) at vn_open+0x3b
>> >> kern_openat(c4827c30,ffffff9c,804c5e8,0,602,...) at kern_openat+0x125
>> >> kern_open(c4827c30,804c5e8,0,601,21b6,...) at kern_open+0x35
>> >> open(c4827c30,fb817cf8,c0972725,c091f062,c47ea2a8,...) at open+0x30
>> >> syscall(fb817d38) at syscall+0x220
>> >> Xint0x80_syscall() at Xint0x80_syscall+0x20
>> >> --- syscall (5, FreeBSD ELF32, open), eip = 0x2817bf33, esp = 0xbfbfec4c, ebp = 0xbfbfecb8 ---
>> >
>> > And now the system is hanging again. While I can still ping and receive
>> > dmesg updates (eg. USB ports appearing), I/O is frozen solid. This is
>> > during portupgrade, when the configure script runs and usually takes 1-2
>> > minutes to provoke.
>> >
>> > This part looks supsicious to me:
>> >
>> > db> show alllocks
>> > Process 28014 (mkdir) thread 0xc691ac30 (100152)
>> > exclusive lockmgr bufwait (bufwait) r = 0 (0xec2bdaf0) locked @ /usr/src/sys/ufs/ffs/ffs_softdep.c:10684
>> > exclusive lockmgr ufs (ufs) r = 0 (0xc6bcd5a8) locked @ /usr/src/sys/kern/vfs_subr.c:2091
>> > exclusive lockmgr bufwait (bufwait) r = 0 (0xec2983f4) locked @ /usr/src/sys/ufs/ffs/ffs_softdep.c:11363
>> > exclusive lockmgr ufs (ufs) r = 0 (0xc6d976b8) locked @ /usr/src/sys/kern/vfs_lookup.c:502
>> > Process 1990 (sshd) thread 0xc5462750 (100117)
>> > exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xc546e08c) locked @ /usr/src/sys/kern/uipc_sockbuf.c:148
>> > Process 12 (intr) thread 0xc41f4750 (100004)
>> > exclusive sleep mutex ttymtx (ttymtx) r = 0 (0xc425ae04) locked @ /usr/src/sys/dev/dcons/dcons_os.c:232
>> > db>
>>
>> Along with show alllocks may you also get the following from DDB:
>> ps, show pcpu, alltrace, lockedvnods.
>
> 1. a kernel before SUJ went in is running fine with SU only
> 2. the following is on a recent -CURRENT that has SUJ, *but* i've
> disabled it, so it is running with soft-updates only (I hope)
>
> I ran a portupgrade and the first configure script triggered the I/O
> hang
>
> db> ps
> pid ppid pgrp uid state wmesg wchan cmd
> 13467 13444 12937 0 R+ mkdir
> 13444 13204 12937 0 S+ wait 0xc54352a8 sh
> 13204 13035 12937 0 S+ wait 0xc5436000 sh
> 13035 12937 12937 0 S+ wait 0xc4ffad48 sh
> 12937 12936 12937 0 Ss+ wait 0xc4ff9d48 make
> 12936 3722 3722 0 R+ script
> 3722 2021 3722 0 S+ (threaded) ruby18
> 100132 S wait 0xc4ffa7f8 ruby18
> 2404 2007 2404 1000 Ss+ ttyin 0xc4d74870 zsh
> 2325 2015 2325 1000 R+ top
> 2021 2009 2021 0 S+ pause 0xc4ff9058 csh
> 2015 2007 2015 1000 Ss+ pause 0xc4ffa058 zsh
> 2009 2007 2009 1000 Ss+ pause 0xc4d4e850 zsh
> 2007 2006 2007 1000 Rs screen
> 2006 1991 2006 1000 R+ screen
> 2005 2001 2005 0 R+ systat
> 2001 1976 2001 0 S+ pause 0xc3d52058 csh
> 2000 1 2000 0 Ss select 0xc3d5b1a4 ssh-agent
> 1991 1990 1991 1000 Ss+ pause 0xc3d52850 zsh
> 1990 1986 1986 1000 S select 0xc3d5aca4 sshd
> 1989 1 1989 1000 Ss select 0xc3d5ae24 ssh-agent
> 1986 1884 1986 0 Ss sbwait 0xc4f2058c sshd
> 1985 1 1985 0 Ss+ ttyin 0xc3e62670 getty
> 1984 1 1984 0 Ss+ ttyin 0xc3e61c70 getty
> 1983 1 1983 0 Ss+ ttyin 0xc3e60070 getty
> 1982 1 1982 0 Ss+ ttyin 0xc3e60a70 getty
> 1981 1 1981 0 Ss+ ttyin 0xc3da7a70 getty
> 1980 1 1980 0 Ss+ ttyin 0xc3da7c70 getty
> 1979 1 1979 0 Ss+ ttyin 0xc3e60c70 getty
> 1976 1 1976 0 Ss+ wait 0xc4c392a8 login
> 1955 1 1955 0 Rs bsnmpd
> 1931 1 1931 0 Ss select 0xc4d0f164 inetd
> 1890 1 1890 0 Rs cron
> 1884 1 1884 0 Ss select 0xc4d0eda4 sshd
> 1808 1 1808 0 Rs sendmail
> 1764 1 1764 0 Rs moused
> 1754 1 1754 0 Ss (threaded) nslcd
> 100111 S accept 0xc4d076ae nslcd
> 100110 S accept 0xc4d076ae nslcd
> 100109 S accept 0xc4d076ae nslcd
> 100108 S accept 0xc4d076ae nslcd
> 100107 S accept 0xc4d076ae nslcd
> 100076 S uwait 0xc4493180 nslcd
> 1748 1 1748 0 Rs openvpn
> 1724 1 1724 201 Ss accept 0xc4d07b82 privoxy
> 1672 1 1672 0 Rs powerd
> 1664 1 1664 0 Rs ntpd
> 1658 1 1658 0 Ss (threaded) nscd
> 100094 S kqread 0xc4cf1700 nscd
> 100093 S kqread 0xc4cf1700 nscd
> 100092 S kqread 0xc4cf1700 nscd
> 100091 S kqread 0xc4cf1700 nscd
> 100090 S kqread 0xc4cf1700 nscd
> 100089 S kqread 0xc4cf1700 nscd
> 100088 S kqread 0xc4cf1700 nscd
> 100087 S kqread 0xc4cf1700 nscd
> 100075 S uwait 0xc449f680 nscd
> 1576 1 1576 389 Ss (threaded) slapd
> 100115 S ucond 0xc4cf2700 slapd
> 100086 S ucond 0xc4cf2600 slapd
> 100085 S select 0xc4d0eea4 slapd
> 100073 S uwait 0xc42a9580 slapd
> 1563 1 1562 0 S nanslp 0xc09f3864 smartd
> 1526 1 1526 0 Rs rpc.statd
> 1445 1 1445 0 Rs amd
> 1418 1 1418 0 Rs rpcbind
> 1403 1 1403 53 Ss (threaded) named
> 100084 S kqread 0xc4cdcb80 named
> 100083 S ucond 0xc4260b00 named
> 100082 S ucond 0xc449fb80 named
> 100074 S sigwait 0xfb4b8be0 named
> 1306 1 1306 0 Rs syslogd
> 997 0 0 0 RL [pfpurge]
> 973 1 973 0 Ss select 0xc41eb0a4 devd
> 964 1 964 65 Ss select 0xc44ac524 dhclient
> 942 1 942 0 Ss select 0xc426f424 dhclient
> 856 1 856 0 Ss select 0xc41eb424 moused
> 561 1 561 0 Ss select 0xc44acde4 wpa_supplicant
> 150 1 150 0 Ss pause 0xc42ec850 adjkerntz
> 95 0 0 0 DL mdwait 0xc426a000 [md0]
> 74 0 0 0 DL geli:w 0xc4265400 [g_eli[0] ad0s4f]
> 22 0 0 0 RL [softdepflush]
> 21 0 0 0 RL [syncer]
> 20 0 0 0 RL [vnlru]
> 19 0 0 0 RL [bufdaemon]
> 18 0 0 0 RL [pagezero]
> 17 0 0 0 DL psleep 0xc0b769d8 [vmdaemon]
> 16 0 0 0 RL [pagedaemon]
> 9 0 0 0 DL ccb_scan 0xc09d78d4 [xpt_thrd]
> 15 0 0 0 DL tzpoll 0xc09dafcc [acpi_thermal]
> 8 0 0 0 SL - 0xc3f55000 [fw0_probe]
> 7 0 0 0 DL - 0xc3ebd3bc [cbb0 event thread]
> 14 0 0 0 DL (threaded) [usb]
> 100040 D - 0xc3f39d0c [usbus3]
> 100039 D - 0xc3f39cdc [usbus3]
> 100038 D - 0xc3f39cac [usbus3]
> 100037 D - 0xc3f39c7c [usbus3]
> 100036 D - 0xc3f1fdac [usbus2]
> 100035 D - 0xc3f1fd7c [usbus2]
> 100034 D - 0xc3f1fd4c [usbus2]
> 100033 D - 0xc3f1fd1c [usbus2]
> 100032 D - 0xc3f0ddac [usbus1]
> 100031 D - 0xc3f0dd7c [usbus1]
> 100030 D - 0xc3f0dd4c [usbus1]
> 100029 D - 0xc3f0dd1c [usbus1]
> 100028 D - 0xc3ef7dac [usbus0]
> 100027 D - 0xc3ef7d7c [usbus0]
> 100026 D - 0xc3ef7d4c [usbus0]
> 100025 D - 0xc3ef7d1c [usbus0]
> 6 0 0 0 DL crypto_r 0xc0b7446c [crypto returns]
> 5 0 0 0 DL crypto_w 0xc0b74448 [crypto]
> 13 0 0 0 DL - 0xc09f36c4 [yarrow]
> 4 0 0 0 DL - 0xc09f13e4 [g_down]
> 3 0 0 0 DL - 0xc09f13e0 [g_up]
> 2 0 0 0 RL [g_event]
> 12 0 0 0 RL (threaded) [intr]
> 100049 I [irq12: psm0]
> 100048 I [irq1: atkbd0]
> 100046 I [irq15: ata1]
> 100045 I [irq14: ata0]
> 100024 I [irq11: cbb0 bfe0++*]
> 100023 I [irq9: pcm0 iwi0+]
> 100022 I [swi6: Giant taskq]
> 100020 I [swi5: +]
> 100019 I [swi2: cambio]
> 100014 I [swi6: task queue]
> 100006 I [swi3: vm]
> 100005 Run CPU 0 [swi4: clock]
> 100004 I [swi1: netisr 0]
> 11 0 0 0 RL [idle: cpu0]
> 1 0 1 0 SLs wait 0xc3df2d48 [init]
> 10 0 0 0 DL audit_wo 0xc0b747e0 [audit]
> 0 0 0 0 RLs (threaded) [kernel]
> 100050 D deadlkre 0xc09f36c4 [deadlkres]
> 100044 D - 0xc3f5db80 [iwi0 taskq]
> 100042 D - 0xc3f5e8c0 [fw0_taskq]
> 100021 D - 0xc3ea08c0 [thread taskq]
> 100018 D - 0xc3ea0a80 [kqueue taskq]
> 100017 D - 0xc3ea0c40 [acpi_task_2]
> 100016 D - 0xc3ea0c40 [acpi_task_1]
> 100015 D - 0xc3ea0c40 [acpi_task_0]
> 100010 D - 0xc3dda9c0 [firmware taskq]
> 100000 RunQ [swapper]
> 1999 1976 1976 0 Z+ ssh-agent
> 1988 1986 1986 1000 Z ssh-agent
> db> show alllocks
> Process 13467 (mkdir) thread 0xc543b270 (100150)
> exclusive lockmgr bufwait (bufwait) r = 0 (0xebd1f8ec) locked @ /usr/src/sys/kern/vfs_bio.c:2581
> exclusive lockmgr ufs (ufs) r = 0 (0xc542fc08) locked @ /usr/src/sys/kern/vfs_subr.c:2091
> exclusive lockmgr bufwait (bufwait) r = 0 (0xebd15194) locked @ /usr/src/sys/ufs/ffs/ffs_softdep.c:11363
> exclusive lockmgr ufs (ufs) r = 0 (0xc54279e8) locked @ /usr/src/sys/kern/vfs_lookup.c:502
> Process 1986 (sshd) thread 0xc4d2ec30 (100113)
> exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xc4f20560) locked @ /usr/src/sys/kern/uipc_sockbuf.c:148
> Process 22 (softdepflush) thread 0xc428f750 (100058)
> exclusive sleep mutex struct mount mtx (struct mount mtx) r = 0 (0xc42a3a20) locked @ /usr/src/sys/kern/vfs_subr.c:341
> Process 12 (intr) thread 0xc3df44e0 (100005)
> exclusive sleep mutex ttymtx (ttymtx) r = 0 (0xc3e63604) locked @ /usr/src/sys/dev/dcons/dcons_os.c:232
> db> show pcpu
This might have been 'show allpcpu' which is different.
> <snip>
> I hope you don't really need all of the traces, I'll keep the system at
> the DDB prompt for now, so I can do other stuff.
Yep all the traces are what is needed.
Attilio
--
Peace can only be achieved by understanding - A. Einstein
More information about the freebsd-current
mailing list