vlruwk deadlock (was: NFS troubles on recent -current.)
Bernd Walter
ticso at cicely12.cicely.de
Fri May 16 10:23:42 PDT 2003
On Mon, May 12, 2003 at 10:33:08PM +0200, Bernd Walter wrote:
> On Mon, May 12, 2003 at 01:45:21AM +0200, Bernd Walter wrote:
> > On Sun, May 11, 2003 at 04:29:34PM -0700, Don Lewis wrote:
> > > On 11 May, Bernd Walter wrote:
> > > > I have a -current (10th may) NFS server which serves for a diskless
> > > > NFS client (11th May).
> > > > The client has / on a CF media and /usr as well as /var/spool on the
> > > > server.
> > > > Also swap is on the server (/usr/swapfile) via md.
> > > > When I run installworld on the client the server hangs with
> > > > "lockmgr draning against myself".
> > > > I got no DDB prompt so no stacktrace or crashdump :(
> > >
> > > Try the attached patch. I didn't have any problems with
> > > make -j10 buildworld
> > > make DESTDIR=/mnt installworld
> > > with /usr/obj and /mnt NFS mounted from the filesystems exported on the
> > > same server (though I had to tweak some sysctl variables to avoid wdrain
> > > deadlocks in the bio code in this configuration).
> > >
> > > Can you break into the debugger with CNTRL-Alt-Esc? I also recommend
> >
> > I havn't tried as it shouldn't be required after a panic.
> >
> > > trying the DEBUG_VFS_LOCKS kernel configuration option, which will drop
> > > into ddb if any of the vnode locking assertions are violated.
> >
> > I will try your patch and suggestions tomorrow.
>
> I can't reproduce the panic with your patch anymore.
> Great work!
I had an io deadlook on the same server today while doing a make
release on an alpha nfs client.
I don't know if it is related to the given patch or not.
It happened shortly after the client checked out the ports.
Sorry - I forgot to take a dump.
One of the last commands I was able to send:
[244]cicely5# ps -axl
UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND
0 0 0 0 -16 0 0 4 sched DLs ?? 0:02.92 (swapper)
0 1 0 0 8 0 716 260 wait ILs ?? 0:02.39 /sbin/init --
0 2 0 0 -8 0 0 12 g_even DL ?? 0:54.29 (g_event)
0 3 0 1 -8 0 0 12 g_up DL ?? 23:28.80 (g_up)
0 4 0 0 -8 0 0 12 g_down DL ?? 10:24.32 (g_down)
0 5 0 0 -16 0 0 12 psleep DL ?? 1:36.59 (pagedaemon)
0 6 0 0 20 0 0 12 psleep DL ?? 0:00.00 (vmdaemon)
0 7 0 0 171 0 0 12 pgzero DL ?? 7:55.19 (pagezero)
0 8 0 0 -16 0 0 12 psleep DL ?? 0:28.41 (bufdaemon)
0 9 0 0 20 0 0 12 vlrup DL ?? 0:34.80 (vnlru)
0 10 0 0 -16 0 0 12 ktrace DL ?? 0:00.00 (ktrace)
0 11 0 17 -16 0 0 12 - RL ?? 5384:08.48 (idle)
0 12 0 0 -48 0 0 12 - WL ?? 18:42.52 (swi7: tty:sio clock)
0 14 0 0 -44 0 0 12 - WL ?? 41:44.12 (swi1: net)
0 15 0 0 76 0 0 12 sleep DL ?? 5:28.24 (random)
0 18 0 1 -36 0 0 12 - WL ?? 4:28.69 (swi3: cambio)
0 22 0 0 -64 0 0 12 - WL ?? 0:00.00 (irq15: ata1)
0 23 0 1 -64 0 0 12 - WL ?? 3:14.38 (irq5: ahc0 ahc1++)
0 24 0 0 -64 0 0 12 - WL ?? 0:32.48 (irq12: ahc3)
0 25 0 0 -68 0 0 12 - WL ?? 18:45.62 (irq10: de0)
0 26 0 0 -64 0 0 12 - WL ?? 0:26.23 (irq11: ahc5 ahc6+)
0 27 0 0 -60 0 0 12 - WL ?? 0:00.03 (irq1: atkbd0)
0 28 0 0 -64 0 0 12 - WL ?? 0:00.00 (irq6: fdc0)
0 29 0 0 -60 0 0 12 - WL ?? 0:00.00 (irq7: ppc0)
0 30 0 0 -48 0 0 12 - WL ?? 0:00.00 (swi0: tty:sio)
0 35 0 0 20 0 0 12 syncer DL ?? 19:05.67 (syncer)
0 36 0 0 8 0 0 12 nfsidl IL ?? 2:17.00 (nfsiod 0)
0 37 0 12 8 0 0 12 nfsidl IL ?? 0:22.74 (nfsiod 1)
0 38 0 0 8 0 0 12 nfsidl IL ?? 0:00.01 (nfsiod 2)
0 39 0 0 8 0 0 12 nfsidl IL ?? 0:00.00 (nfsiod 3)
0 73 1 6 -8 0 848 176 vinum DLs ?? 0:00.02 vinum: vinum daemon (vinum)
0 127 1 26 20 0 224 84 pause Is ?? 0:00.00 adjkerntz -i
0 278 1 0 96 0 1196 712 select Is ?? 1:17.76 /usr/sbin/syslogd -s
0 294 1 0 96 0 1516 960 select Ss ?? 0:02.01 /usr/sbin/rpcbind
0 310 1 0 96 0 1244 876 select Ss ?? 0:10.22 /usr/sbin/ypbind
0 379 1 0 96 0 1404 1076 select Is ?? 0:00.79 /usr/sbin/mountd -r
0 381 1 0 96 0 1212 832 select Is ?? 0:00.31 nfsd: master (nfsd)
0 384 381 0 -4 0 1148 692 vlruwk D ?? 90:01.02 nfsd: server (nfsd)
0 385 381 0 4 0 1148 692 nfsd S ?? 20:15.06 nfsd: server (nfsd)
0 386 381 0 4 0 1148 692 nfsd S ?? 8:51.20 nfsd: server (nfsd)
0 387 381 0 4 0 1148 692 nfsd I ?? 5:39.17 nfsd: server (nfsd)
0 389 1 0 96 0 263420 804 select Is ?? 0:01.52 /usr/sbin/rpc.statd
0 427 1 0 96 0 6072 4052 select Ss ?? 1:53.33 /usr/local/sbin/httpd -k start -DSSL
0 435 1 7 20 0 1832 696 lockf Is ?? 0:00.01 /usr/local/sbin/saslauthd -a pam
0 436 435 7 4 0 1832 696 accept I ?? 0:00.00 /usr/local/sbin/saslauthd -a pam
0 437 435 7 20 0 1832 696 lockf I ?? 0:00.00 /usr/local/sbin/saslauthd -a pam
0 438 435 7 20 0 1832 696 lockf I ?? 0:00.00 /usr/local/sbin/saslauthd -a pam
0 439 435 7 20 0 1832 696 lockf I ?? 0:00.00 /usr/local/sbin/saslauthd -a pam
0 447 1 1 96 0 1220 768 select Is ?? 0:00.07 /usr/sbin/lpd
0 462 1 0 96 0 1548 948 select Ss ?? 1:13.14 /usr/sbin/ntpd -p /var/run/ntpd.pid
1 474 1 0 -4 0 1176 748 vlruwk Ds ?? 0:12.10 /usr/sbin/rwhod
0 515 1 0 96 0 3320 1764 select Is ?? 0:05.30 /usr/sbin/sshd
0 523 1 0 96 0 5208 2616 select Ss ?? 1:38.83 sendmail: accepting connections (sendmail)
25 526 1 0 20 0 3452 2012 pause Is ?? 0:01.18 sendmail: Queue runner at 00:30:00 for /var/spool/clientmqueue (sen
0 544 1 0 -4 0 1272 840 vlruwk Ds ?? 0:07.50 /usr/sbin/cron
0 581 1 0 96 0 1348 888 select Is ?? 0:00.06 /usr/sbin/inetd -wW
0 596 1 34 5 0 1212 732 siodcd I ?? 0:00.02 /usr/libexec/getty std.9600 ttyd0
0 601 515 0 4 0 6124 2116 sbwait Is ?? 0:00.28 sshd: ticso [priv] (sshd)
1007 603 601 0 96 0 6124 2196 select I ?? 1:15.78 sshd: ticso at ttyp0 (sshd)
0 3121 515 0 4 0 6124 2132 sbwait Is ?? 0:00.25 sshd: ticso [priv] (sshd)
1007 3123 3121 0 96 0 6124 2188 select I ?? 0:00.11 sshd: ticso at ttyp1 (sshd)
80 24548 427 5 20 0 6072 4080 lockf I ?? 0:00.05 /usr/local/sbin/httpd -k start -DSSL
80 24549 427 5 20 0 6072 4080 lockf I ?? 0:00.04 /usr/local/sbin/httpd -k start -DSSL
80 24550 427 5 20 0 6072 4080 lockf I ?? 0:00.04 /usr/local/sbin/httpd -k start -DSSL
80 24551 427 6 20 0 6072 4080 lockf I ?? 0:00.05 /usr/local/sbin/httpd -k start -DSSL
80 24552 427 6 96 0 6072 4080 select I ?? 0:00.05 /usr/local/sbin/httpd -k start -DSSL
60 28689 440 2 -4 0 24836 3240 vlruwk D ?? 2:27.68 imapd: imapd: cicely12.cicely.de[3ffe:400:8d0:301::12] ticso use
0 29641 523 0 4 0 5500 3404 sbwait S ?? 0:00.29 sendmail: h4GGWjhx029641 gif-cicely.cosmo-project.de [IPv6:3ffe:
0 29643 544 0 -8 0 1320 1004 piperd I ?? 0:00.00 cron: running job (cron)
2 29644 29643 0 8 0 868 348 wait Is ?? 0:00.04 /bin/sh -c /usr/libexec/save-entropy
2 29645 29644 1 8 0 908 372 wait I ?? 0:00.05 /bin/sh /usr/libexec/save-entropy
2 29656 29645 0 -4 0 240 140 vlruwk D ?? 0:00.00 dd if=/dev/random of=/var/db/entropy/saved-entropy.1 bs=2048 cou
0 29657 523 1 4 0 5268 2712 kqread S ?? 0:00.01 sendmail: startup with gif-cicely.cosmo-project.de (sendmail)
0 29658 523 0 4 0 5268 2712 kqread S ?? 0:00.01 sendmail: startup with gif-cicely.cosmo-project.de (sendmail)
1007 604 603 0 20 0 1596 1108 pause Is p0 0:00.19 -tcsh (tcsh)
0 10000 604 0 20 0 1720 1252 pause I p0 0:01.95 tcsh
0 29647 10000 0 96 0 700 324 - R+ p0 0:00.01 ps -axl
1007 3124 3123 0 5 0 1556 1052 ttyin Is+ p1 0:00.17 -tcsh (tcsh)
1007 3132 3124 0 8 0 896 368 - T p1 0:00.05 /bin/sh ./run-mozilla.sh ./mozilla-bin
1007 3143 3132 11 97 0 48896 39668 - T p1 43:46.76 ./mozilla-bin
0 592 1 0 8 0 1636 1180 wait Is v0 0:00.16 login [pam] (login)
1007 13632 592 0 20 0 1548 1060 pause I v0 0:00.13 -tcsh (tcsh)
1007 13636 13632 0 96 0 3260 2076 select I+ v0 0:23.72 ssh c6
0 593 1 99 5 0 1216 796 ttyin Is+ v1 0:00.02 /usr/libexec/getty Pc ttyv1
0 594 1 99 5 0 1216 796 ttyin Is+ v2 0:00.02 /usr/libexec/getty Pc ttyv2
0 595 1 99 5 0 1216 796 ttyin Is+ v3 0:00.02 /usr/libexec/getty Pc ttyv3
60 440 1 0 96 0 2780 1348 select I con- 0:15.58 /usr/local/cyrus/bin/master
I saw many of those process already over minutes waiting with vlruwk
in top output on another window.
--
B.Walter BWCT http://www.bwct.de
ticso at bwct.de info at bwct.de
More information about the freebsd-current
mailing list