FreeBSD NFS client/Linux NFS server issue

Tue Jan 19 08:03:02 UTC 2010

Hi,

In this thread I have posted to freebsd-fs@ several messages describing our
problem with freebsd7.1 nfs clients. As with the time new info has appeared
and having this all spread in several messages might be a bit confusing, I
want to summarise here what we see and know.

Also I cc to freebsd-stable@ hoping to draw more attention to this problem as
it looks for me very interesting and challenging :-)

I have found in the Internet that other people have been observed the similar
problem with FreeBSD6.2 client:

http://forums.freebsd.org/showthread.php?t=1697

So, on some of our freebsd7.1 nfs clients (and it looks like we have had
similar case with 6.3), which have several nfs mounts to the same CentOS 5.3
NFS server (mount options: rw,-3,-T,-s,-i,-r=32768,-w=32768,-o=noinet6), at
some moment the access to one of the NFS mount gets stuck, while the access to
the other mounts works ok.

In all cases we have been observed so far the first gotten stuck process was
php script (or two) that was (were) writing to logs file (appending). In
tcpdump we see that every write to the file causes the sequence of the
following rpc: ACCESS - READ - WRITE - COMMIT. And at some moment this stops
after READ rpc call and successful reply.

After this in tcpdump successful readdir/access/lookup/fstat calls are
observed from our other utilities, which just check the presence of some files
and they work ok (df also works). The php process at this state is in bo_wwait
invalidating buffer cache [1].

If at this time we try accessing the share with mc then it hangs acquiring the
vn_lock held by php process [2] and after this any operations with this NFS
share hang (df hangs too).

If instead some other process is started that writes to some other file on
this share (append) then the first process "unfreezes" too (starting from
WRITE rpc, so there is no any retransmits).

With my limited knowledge of this complicated kernel subsystem I have the
following hypothesis what is going on. On some of the nfs_write() it does
successful ACCESS - READ rpcs but by some reason does not call WRITE to flush
dirty buffer to the server (aborts somewere or may be in bdwrite() which calls
bd_wakeup() and actually bd_wakeup considers that we don't have enough dirty
buffers?). But it looks like on this stage the buffer appears to be unlinked
from bufqueues [3] so when bufdaemon runs it does not flush the buffer. The
next write() call to this file causes the process to get stuck invalidating
the dirty buffer. The buffer is accessible by nfsiod via nmp structure [3] and
when the next process is writing to another file, nfsiod is started and
flushes this dirty buffer.

[1]: Gotten stuck php process:

(kgdb) bt
#0  sched_switch (td=0xc839e000, newtd=Variable "newtd" is not available.
) at /usr/src/sys/kern/sched_ule.c:1944
#1  0xc07cabe6 in mi_switch (flags=Variable "flags" is not available.
) at /usr/src/sys/kern/kern_synch.c:440
#2  0xc07f42fb in sleepq_switch (wchan=Variable "wchan" is not available.
) at /usr/src/sys/kern/subr_sleepqueue.c:497
#3  0xc07f460c in sleepq_catch_signals (wchan=0xc90c9ee8) at /usr/src/sys/kern/subr_sleepqueue.c:417
#4  0xc07f4ebd in sleepq_wait_sig (wchan=0xc90c9ee8) at /usr/src/sys/kern/subr_sleepqueue.c:594
#5  0xc07cb047 in _sleep (ident=0xc90c9ee8, lock=0xc90c9e8c, priority=333, wmesg=0xc0b731ed "bo_wwait", 
    timo=0) at /usr/src/sys/kern/kern_synch.c:224
#6  0xc0827295 in bufobj_wwait (bo=0xc90c9ec4, slpflag=256, timeo=0) at /usr/src/sys/kern/vfs_bio.c:3870
#7  0xc0966307 in nfs_flush (vp=0xc90c9e04, waitfor=1, td=0xc839e000, commit=1)
    at /usr/src/sys/nfsclient/nfs_vnops.c:2989
#8  0xc09667ce in nfs_fsync (ap=0xed3c38ec) at /usr/src/sys/nfsclient/nfs_vnops.c:2725
#9  0xc0aee5d2 in VOP_FSYNC_APV (vop=0xc0c2b920, a=0xed3c38ec) at vnode_if.c:1007
#10 0xc0827864 in bufsync (bo=0xc90c9ec4, waitfor=1, td=0xc839e000) at vnode_if.h:538
#11 0xc083f354 in bufobj_invalbuf (bo=0xc90c9ec4, flags=1, td=0xc839e000, slpflag=256, slptimeo=0)
    at /usr/src/sys/kern/vfs_subr.c:1066
#12 0xc083f6e2 in vinvalbuf (vp=0xc90c9e04, flags=1, td=0xc839e000, slpflag=256, slptimeo=0)
    at /usr/src/sys/kern/vfs_subr.c:1142
#13 0xc094f216 in nfs_vinvalbuf (vp=0xc90c9e04, flags=Variable "flags" is not available.
) at /usr/src/sys/nfsclient/nfs_bio.c:1326
#14 0xc0951825 in nfs_write (ap=0xed3c3bc4) at /usr/src/sys/nfsclient/nfs_bio.c:918
#15 0xc0aef956 in VOP_WRITE_APV (vop=0xc0c2b920, a=0xed3c3bc4) at vnode_if.c:691
#16 0xc0850097 in vn_write (fp=0xc9969b48, uio=0xed3c3c60, active_cred=0xcb901600, flags=0, 
    td=0xc839e000) at vnode_if.h:373
#17 0xc07f9d17 in dofilewrite (td=0xc839e000, fd=6, fp=0xc9969b48, auio=0xed3c3c60, offset=-1, flags=0)
    at file.h:256
#18 0xc07f9ff8 in kern_writev (td=0xc839e000, fd=6, auio=0xed3c3c60)
    at /usr/src/sys/kern/sys_generic.c:401
#19 0xc07fa06f in write (td=0xc839e000, uap=0xed3c3cfc) at /usr/src/sys/kern/sys_generic.c:317
#20 0xc0ad9c75 in syscall (frame=0xed3c3d38) at /usr/src/sys/i386/i386/trap.c:1090
#21 0xc0ac01b0 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:255
#22 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)

[2] mc process gotten stuck acquiring _vn_lock held by above php thread:

    at /usr/src/sys/kern/sched_ule.c:1944
* 178 Thread 100340 (PID=40443: mc)  sched_switch (td=0xc9810af0, newtd=Variable "newtd" is not availabl .

(kgdb) thr 178
[Switching to thread 178 (Thread 100340)]#0  sched_switch (td=0xc9810af0, newtd=Variable "newtd" is not available.
)
    at /usr/src/sys/kern/sched_ule.c:1944
1944                    cpuid = PCPU_GET(cpuid);
(kgdb) bt
#0  sched_switch (td=0xc9810af0, newtd=Variable "newtd" is not available.
) at /usr/src/sys/kern/sched_ule.c:1944
#1  0xc07cabe6 in mi_switch (flags=Variable "flags" is not available.
) at /usr/src/sys/kern/kern_synch.c:440
#2  0xc07f42fb in sleepq_switch (wchan=Variable "wchan" is not available.
) at /usr/src/sys/kern/subr_sleepqueue.c:497
#3  0xc07f4946 in sleepq_wait (wchan=0xc90c9e5c) at /usr/src/sys/kern/subr_sleepqueue.c:580
#4  0xc07cb056 in _sleep (ident=0xc90c9e5c, lock=0xc0c77d18, priority=80, wmesg=0xc0b80b92 "nfs", 
    timo=0) at /usr/src/sys/kern/kern_synch.c:226
#5  0xc07adf5a in acquire (lkpp=0xed56b7f0, extflags=Variable "extflags" is not available.
) at /usr/src/sys/kern/kern_lock.c:151
#6  0xc07ae84c in _lockmgr (lkp=0xc90c9e5c, flags=8194, interlkp=0xc90c9e8c, td=0xc9810af0, 
    file=0xc0b74aeb "/usr/src/sys/kern/vfs_subr.c", line=2061) at /usr/src/sys/kern/kern_lock.c:384
#7  0xc0832470 in vop_stdlock (ap=0xed56b840) at /usr/src/sys/kern/vfs_default.c:305
#8  0xc0aef4f6 in VOP_LOCK1_APV (vop=0xc0c1d5c0, a=0xed56b840) at vnode_if.c:1618
#9  0xc084ed86 in _vn_lock (vp=0xc90c9e04, flags=8194, td=0xc9810af0, 
    file=0xc0b74aeb "/usr/src/sys/kern/vfs_subr.c", line=2061) at vnode_if.h:851
#10 0xc0841d84 in vget (vp=0xc90c9e04, flags=8194, td=0xc9810af0) at /usr/src/sys/kern/vfs_subr.c:2061
#11 0xc08355b3 in vfs_hash_get (mp=0xc6b472cc, hash=3326873010, flags=Variable "flags" is not available.
)
    at /usr/src/sys/kern/vfs_hash.c:81
#12 0xc09534d4 in nfs_nget (mntp=0xc6b472cc, fhp=0xc97be078, fhsize=20, npp=0xed56b9f0, flags=2)
    at /usr/src/sys/nfsclient/nfs_node.c:120
#13 0xc0964a05 in nfs_lookup (ap=0xed56ba84) at /usr/src/sys/nfsclient/nfs_vnops.c:947
#14 0xc0aefbe6 in VOP_LOOKUP_APV (vop=0xc0c2b920, a=0xed56ba84) at vnode_if.c:99
#15 0xc0836841 in lookup (ndp=0xed56bb48) at vnode_if.h:57
#16 0xc083756f in namei (ndp=0xed56bb48) at /usr/src/sys/kern/vfs_lookup.c:219
#17 0xc0844fef in kern_lstat (td=0xc9810af0, path=0x48611280 <Address 0x48611280 out of bounds>, 
    pathseg=UIO_USERSPACE, sbp=0xed56bc18) at /usr/src/sys/kern/vfs_syscalls.c:2169
#18 0xc08451af in lstat (td=0xc9810af0, uap=0xed56bcfc) at /usr/src/sys/kern/vfs_syscalls.c:2152
#19 0xc0ad9c75 in syscall (frame=0xed56bd38) at /usr/src/sys/i386/i386/trap.c:1090
#20 0xc0ac01b0 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:255
#21 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) fr 6
#6  0xc07ae84c in _lockmgr (lkp=0xc90c9e5c, flags=8194, interlkp=0xc90c9e8c, td=0xc9810af0, 
    file=0xc0b74aeb "/usr/src/sys/kern/vfs_subr.c", line=2061) at /usr/src/sys/kern/kern_lock.c:384
384                     error = acquire(&lkp, extflags, (LK_HAVE_EXCL | LK_WANT_EXCL), &contested, &waitstart);
(kgdb) p *lkp
$2 = {lk_object = {lo_name = 0xc0b80b92 "nfs", lo_type = 0xc0b80b92 "nfs", lo_flags = 70844416, 
    lo_witness_data = {lod_list = {stqe_next = 0x0}, lod_witness = 0x0}}, lk_interlock = 0xc0c77d18, 
  lk_flags = 33816640, lk_sharecount = 0, lk_waitcount = 1, lk_exclusivecount = 1, lk_prio = 80, 
  lk_timo = 51, lk_lockholder = 0xc839e000, lk_newlock = 0x0}

[3] struct nfsmount of the "problem" share:

(kgdb) p *nmp
$4 = {nm_mtx = {lock_object = {lo_name = 0xc0b808ee "NFSmount lock", 
      lo_type = 0xc0b808ee "NFSmount lock", lo_flags = 16973824, lo_witness_data = {lod_list = {
          stqe_next = 0x0}, lod_witness = 0x0}}, mtx_lock = 4, mtx_recurse = 0}, nm_flag = 35399, 
  nm_state = 1310720, nm_mountp = 0xc6b472cc, nm_numgrps = 16, 
  nm_fh = "\001\000\000\000\000\223\000\000\001@\003\n", '\0' <repeats 115 times>, nm_fhsize = 12, 
  nm_rpcclnt = {rc_flag = 0, rc_wsize = 0, rc_rsize = 0, rc_name = 0x0, rc_so = 0x0, rc_sotype = 0, 
    rc_soproto = 0, rc_soflags = 0, rc_timeo = 0, rc_retry = 0, rc_srtt = {0, 0, 0, 0}, rc_sdrtt = {0, 
      0, 0, 0}, rc_sent = 0, rc_cwnd = 0, rc_timeouts = 0, rc_deadthresh = 0, rc_authtype = 0, 
    rc_auth = 0x0, rc_prog = 0x0, rc_proctlen = 0, rc_proct = 0x0}, nm_so = 0xc6e81d00, nm_sotype = 1, 
  nm_soproto = 0, nm_soflags = 44, nm_nam = 0xc6948640, nm_timeo = 6000, nm_retry = 2, nm_srtt = {15, 
    15, 31, 52}, nm_sdrtt = {3, 3, 15, 15}, nm_sent = 0, nm_cwnd = 4096, nm_timeouts = 0, 
  nm_deadthresh = 9, nm_rsize = 32768, nm_wsize = 32768, nm_readdirsize = 4096, nm_readahead = 1, 
  nm_wcommitsize = 1177026, nm_acdirmin = 30, nm_acdirmax = 60, nm_acregmin = 3, nm_acregmax = 60, 
  nm_verf = "JК╬W\000\004oМ", nm_bufq = {tqh_first = 0xda82dc70, tqh_last = 0xda8058e0}, 
  nm_bufqlen = 2, nm_bufqwant = 0, nm_bufqiods = 1, nm_maxfilesize = 1099511627775, 
  nm_rpcops = 0xc0c2b5bc, nm_tprintf_initial_delay = 12, nm_tprintf_delay = 30, nm_nfstcpstate = {
    rpcresid = 0, flags = 1, sock_send_inprog = 0}, 
  nm_hostname = "172.30.10.92\000/var/www/app31", '\0' <repeats 60 times>, nm_clientid = 0, nm_fsid = {
    val = {0, 0}}, nm_lease_time = 0, nm_last_renewal = 0}

buffers on it:

(kgdb) p *nmp->nm_bufq.tqh_first
$7 = {b_bufobj = 0xc7324960, b_bcount = 31565, b_caller1 = 0x0, 
  b_data = 0xde581000 " valid_lines:", ' ' <repeats 29 times>, "1341\n          invalid_lines:", ' ' <repeats 27 times>, "1556\n          total_lines:", ' ' <repeats 29 times>, "2897\n\n    Error summary:\n          Inactive pr"..., b_error = 0, b_iocmd = 2 '\002', b_ioflags = 0 '\0', b_iooffset = 196608, 
  b_resid = 0, b_iodone = 0, b_blkno = 384, b_offset = 196608, b_bobufs = {tqe_next = 0x0, 
    tqe_prev = 0xc7324964}, b_left = 0x0, b_right = 0x0, b_vflags = 0, b_freelist = {
    tqe_next = 0xda805894, tqe_prev = 0xc725d3c0}, b_qindex = 0, b_flags = 536870948, 
  b_xflags = 2 '\002', b_lock = {lk_object = {lo_name = 0xc0b73635 "bufwait", 
      lo_type = 0xc0b73635 "bufwait", lo_flags = 70844416, lo_witness_data = {lod_list = {
          stqe_next = 0x0}, lod_witness = 0x0}}, lk_interlock = 0xc0c77b50, lk_flags = 262144, 
    lk_sharecount = 0, lk_waitcount = 0, lk_exclusivecount = 1, lk_prio = 80, lk_timo = 0, 
    lk_lockholder = 0xfffffffe, lk_newlock = 0x0}, b_bufsize = 31744, b_runningbufspace = 0, 
  b_kvabase = 0xde581000 " valid_lines:", ' ' <repeats 29 times>, "1341\n          invalid_lines:", ' ' <repeats 27 times>, "1556\n          total_lines:", ' ' <repeats 29 times>, "2897\n\n    Error summary:\n          Inactive pr"..., b_kvasize = 32768, b_lblkno = 6, b_vp = 0xc73248a0, b_dirtyoff = 31512, 
  b_dirtyend = 31565, b_rcred = 0x0, b_wcred = 0xcebec400, b_saveaddr = 0xde581000, b_pager = {
    pg_reqpage = 0}, b_cluster = {cluster_head = {tqh_first = 0xda917ec8, tqh_last = 0xda888e94}, 
    cluster_entry = {tqe_next = 0xda917ec8, tqe_prev = 0xda888e94}}, b_pages = {0xc3726e90, 0xc448dca8, 
    0xc2a55b98, 0xc3bf1a28, 0xc3467ff0, 0xc3299600, 0xc28db130, 0xc2301398, 0x0 <repeats 24 times>}, 
  b_npages = 8, b_dep = {lh_first = 0x0}, b_fsprivate1 = 0x0, b_fsprivate2 = 0x0, b_fsprivate3 = 0x0, 
  b_pin_count = 0}

These are entires from our log file. Note that b_qindex is 0. But
bufqueues[0] is empty:

(kgdb) p bufqueues[0]
$8 = {tqh_first = 0x0, tqh_last = 0xc0c83e20}

Also does not it look strange that lk_lockholder of b_lock points to
innvalid location (0xfffffffe)?

-- 
Mikolaj Golub