nfsclient process stuck in nfsaio

Rong-En Fan grafan at gmail.com
Sat Mar 11 02:07:26 UTC 2006


Hi,

With INVARIANTS and WITNESS enabled, the kernel panics immediately
when I press ^C to exit dd. Some ddb and kgdb messages are below
(I have KDB_TRACE and KDB_UNATTENDED set). A core file is
available. Any help is appreciated :-)

UPDATE: sometimes I can't ^C or kill -9 the dd process even
with mpsafenet=0. In that situation I get a panic with a trace
similar to the one below, which was taken with mpsafenet=1.

panic: VOP_STRATEGY failed bp=0xd835acd8 vp=0xc4a1baa0
cpuid = 1
KDB: stack backtrace:
kdb_backtrace(1,c05056b4,1,e7f1b7d0,1) at kdb_backtrace+0x2e
panic(c061782c,d835acd8,c4a1baa0,c4a1baa0,4) at panic+0x12b
bufstrategy(c4a1bb60,d835acd8,e7f1b80c,c471ee63,d835acd8) at bufstrategy+0x7d
bstrategy(d835acd8,c060be84,23c,a00200a6,0) at bstrategy+0x60
nfs_writebp(d835acd8,1,c4369000,e7f1b82c,c471eb73) at nfs_writebp+0xf3
nfs_bwrite(d835acd8,e7f1b904,c471e92b,d835acd8,1dd88000) at nfs_bwrite+0x13
bwrite(d835acd8,1dd88000,0,1dd86000,0) at bwrite+0x5b
nfs_flush(c4a1baa0,1,c4369000,1,e7f1b92c) at nfs_flush+0x78b
nfs_fsync(e7f1b93c) at nfs_fsync+0x1c
VOP_FSYNC_APV(c4735fc0,e7f1b93c) at VOP_FSYNC_APV+0x99
VOP_FSYNC(c4a1baa0,1,c4369000) at VOP_FSYNC+0x2e
bufsync(c4a1bb60,1,c4369000) at bufsync+0x14
bufobj_invalbuf(c4a1bb60,1,c4369000,100,0) at bufobj_invalbuf+0xda
vinvalbuf(c4a1baa0,1,c4369000,100,0) at vinvalbuf+0x1d
nfs_vinvalbuf(c4a1baa0,1,c4369000,1,c04d5738) at nfs_vinvalbuf+0xda
nfs_write(e7f1bbc8) at nfs_write+0x16f
VOP_WRITE_APV(c4735fc0,e7f1bbc8) at VOP_WRITE_APV+0x11e
VOP_WRITE(c4a1baa0,e7f1bcb0,7f0001,c49f5180) at VOP_WRITE+0x34
vn_write(c46d6ca8,e7f1bcb0,c49f5180,0,c4369000) at vn_write+0x1ad
fo_write(c46d6ca8,e7f1bcb0,c49f5180,0,c4369000) at fo_write+0x1d
dofilewrite(c4369000,4,c46d6ca8,e7f1bcb0,ffffffff,ffffffff,0) at dofilewrite+0x8e
kern_writev(c4369000,4,e7f1bcb0) at kern_writev+0x41
write(c4369000,e7f1bcf0) at write+0x58
syscall(3b,3b,3b,8076000,100000) at syscall+0x2cf
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (4, FreeBSD ELF32, write), eip = 0x880b9813, esp = 0xbfbfeaac, ebp = 0xbfbfead8 ---
Uptime: 4m18s
Dumping 3062 MB (2 chunks)
[...]

(kgdb) bt full
#0  0xc04a8181 in doadump () at /usr/src/sys/kern/kern_shutdown.c:233
No locals.
#1  0xc04a8841 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:399
        first_buf_printf = 1
#2  0xc04a8bf9 in panic (fmt=0xc061782c "VOP_STRATEGY failed bp=%p vp=%p")
    at /usr/src/sys/kern/kern_shutdown.c:555
        td = (struct thread *) 0xc4369000
        bootopt = 260
        newpanic = 1
        ap = 0xe7f1b7d0 "ج5ؠ��Ġ���\004"
        buf = "VOP_STRATEGY failed bp=0xd835acd8 vp=0xc4a1baa0", '\0' <repeats 208 times>
#3  0xc0505689 in bufstrategy (bo=0xc4a1bb60, bp=0xd835acd8)
    at /usr/src/sys/kern/vfs_bio.c:3690
        i = 4
        vp = (struct vnode *) 0xc4a1baa0
#4  0xc471ef28 in ?? ()
No symbol table info available.
#5  0xc4a1bb60 in ?? ()
No symbol table info available.
#6  0xd835acd8 in ?? ()
No symbol table info available.
#7  0xe7f1b80c in ?? ()
No symbol table info available.
#8  0xc471ee63 in ?? ()
No symbol table info available.
#9  0xd835acd8 in ?? ()
No symbol table info available.
#10 0xc060be84 in __func__.2 ()
No symbol table info available.
#11 0x0000023c in ?? ()
No symbol table info available.
#12 0xa00200a6 in ?? ()
No symbol table info available.
#13 0x00000000 in ?? ()
No symbol table info available.
#14 0xe7f1b820 in ?? ()
No symbol table info available.
#15 0xc471f2a3 in ?? ()
No symbol table info available.
#16 0xd835acd8 in ?? ()
No symbol table info available.
#17 0x00000001 in ?? ()
No symbol table info available.
#18 0xc4369000 in ?? ()
No symbol table info available.
#19 0xe7f1b82c in ?? ()
No symbol table info available.
#20 0xc471eb73 in ?? ()
No symbol table info available.
#21 0xd835acd8 in ?? ()
No symbol table info available.
#22 0xe7f1b904 in ?? ()
No symbol table info available.
#23 0xc471e92b in ?? ()
No symbol table info available.
#24 0xd835acd8 in ?? ()
No symbol table info available.
#25 0x1dd88000 in ?? ()
No symbol table info available.
#26 0x00000000 in ?? ()
No symbol table info available.
#27 0x1dd86000 in ?? ()
No symbol table info available.
#28 0x00000000 in ?? ()
No symbol table info available.
#29 0xe7f1b858 in ?? ()
No symbol table info available.
#30 0xc049ee97 in _mtx_assert (m=0xd835acd8, what=-1067401596,
    file=0x23c <Address 0x23c out of bounds>, line=-1610481498)
    at /usr/src/sys/kern/kern_mutex.c:754
No locals.
Previous frame inner to this frame (corrupt stack?)
(kgdb) l *0xc0505689
0xc0505689 is in bufstrategy (/usr/src/sys/kern/vfs_bio.c:3691).
3686            KASSERT(vp == bo->bo_private, ("Inconsistent vnode bufstrategy"));
3687            KASSERT(vp->v_type != VCHR && vp->v_type != VBLK,
3688                ("Wrong vnode in bufstrategy(bp=%p, vp=%p)", bp, vp));
3689            i = VOP_STRATEGY(vp, bp);
3690            KASSERT(i == 0, ("VOP_STRATEGY failed bp=%p vp=%p", bp, bp->b_vp));
3691    }
3692
3693    void
3694    bufobj_wrefl(struct bufobj *bo)
3695    {
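
A note on the listing above: the KASSERT at line 3690 is only compiled
in when the kernel is built with INVARIANTS, and it fires whenever
VOP_STRATEGY returns non-zero. What follows is a minimal userland sketch
of that pattern, not the kernel code itself. fake_strategy() and
strategy_would_panic() are hypothetical names, and the EINTR return value
is purely an assumption for illustration; the trace does not show which
error was actually returned.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a strategy routine (NOT the real NFS code).
 * Assumption: an interrupted request is reported as EINTR.
 */
static int fake_strategy(int interrupted)
{
    return interrupted ? EINTR : 0;
}

/*
 * Mirrors the shape of bufstrategy(): call the strategy routine and
 * check that it returned 0. Returns 1 where the KASSERT would fire
 * (a panic on an INVARIANTS kernel), 0 otherwise.
 */
static int strategy_would_panic(int interrupted)
{
    int i = fake_strategy(interrupted);

    if (i != 0) {
        fprintf(stderr, "VOP_STRATEGY failed (error %d)\n", i);
        return 1;
    }
    return 0;
}
```

On a kernel without INVARIANTS the same non-zero return would go
unchecked at this point, which may be why the panic only shows up once
the debugging options are enabled.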


On 3/10/06, Rong-En Fan <grafan at gmail.com> wrote:
> Hi,
>
> I forgot to mention that all the clients and servers here run SMP kernels.
> After some Googling, I found a post on current@ from 2005/01/12,
> "NFS problems, locking up", that is highly related to my situation.
> A workaround is to set debug.mpsafenet=0; I just verified that this
> indeed works.
>
> Now I'm turning on INVARIANTS and WITNESS to see if they produce
> any output. However, I'm afraid that I cannot get
> serial console access to these machines (and thus no ddb
> output :( ).
>
> Thanks,
> Rong-En Fan
>
> On 3/10/06, Rong-En Fan <grafan at gmail.com> wrote:
> > Hi,
> >
> > After upgrading several of our nfs clients from 5.4-RELEASE to 6.0-RELEASE
> > (some are now at 6.1-PRERELEASE as of a week ago), from time to time
> > we see some processes stuck in nfsaio and unkillable. These processes
> > generate lots of write traffic to the nfs server, but the server's disk is
> > not really writing: netstat shows the client sending ~100Mbps, while iostat
> > on the nfs server does not show the corresponding ~12.5MB/s. The nfsd on
> > the server side is either in RUN or in ufs state. The server is running
> > 5.5-PRERELEASE as of yesterday.
> >
> > Client mount options: rw,nosuid,bg,intr,nodev. Both client and server
> > are running rpc.lockd and rpc.statd. I'm sure it's not related to any
> > locking problems.
> >
> > I have another nfs server/client pair, both running 6.0-RELEASE, and I can
> > easily reproduce this situation on these two boxes just by running
> >
> >   dd if=/dev/zero of=/nfs/ooo bs=1m
> >
> > If I do not add bs=1m, it works fine. On all the boxes I mentioned above,
> > I did not do anything special to the kernel config, i.e., they are GENERIC w/o
> > unnecessary devices and w/ firewall. Basically, I can do anything on these
> > two boxes (they are not in production). Any suggestions are welcome.
> >
> > Thanks,
> > Rong-En Fan
> >
>
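
For reference, the dd reproduction quoted above boils down to repeated
1 MB write() calls on an NFS-backed file. Here is a rough C sketch of the
same I/O pattern, in case a standalone test program is handier than dd.
write_zero_blocks() is a hypothetical helper name; unlike the dd command,
which runs until interrupted, the sketch writes a bounded number of
blocks, and it retries short writes itself.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write `count` blocks of `block_size` zero bytes to `path`,
 * roughly like "dd if=/dev/zero of=path bs=block_size count=count".
 * Returns the total number of bytes written, or -1 on error.
 */
static long long write_zero_blocks(const char *path, size_t block_size,
    size_t count)
{
    char *buf;
    long long total = 0;
    int fd;

    if (block_size == 0)
        return -1;

    buf = calloc(1, block_size);        /* zero-filled source buffer */
    if (buf == NULL)
        return -1;

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    for (size_t n = 0; n < count; n++) {
        size_t off = 0;

        /* Keep writing until the whole block has been accepted. */
        while (off < block_size) {
            ssize_t w = write(fd, buf + off, block_size - off);

            if (w <= 0) {               /* error (or no progress) */
                close(fd);
                free(buf);
                return -1;
            }
            off += (size_t)w;
        }
        total += (long long)block_size;
    }

    close(fd);
    free(buf);
    return total;
}
```

Calling write_zero_blocks("/nfs/ooo", 1048576, 1024) would approximate
the first gigabyte of the dd run with bs=1m; a much smaller block size
corresponds to the case reported as working fine.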


More information about the freebsd-stable mailing list