NFS/ZFS hangs after upgrading from 9.0-RELEASE to -STABLE

Reed A. Cartwright cartwright at asu.edu
Tue Dec 4 17:54:47 UTC 2012


I'm having similar issues after upgrading to 9.1-RC2 and RC3.  I'm not
using either NFS or a ZIL.

On Tue, Dec 4, 2012 at 7:26 AM, Rick Macklem <rmacklem at uoguelph.ca> wrote:
> Olivier wrote:
>> Hi all
>> After upgrading from 9.0-RELEASE to 9.1-PRERELEASE #0 r243679 I'm
>> having severe problems with NFS sharing of a ZFS volume. nfsd appears
>> to hang at random times (anywhere from once every couple of hours to
>> once every two days) while accessing a ZFS volume, and the only way I
>> have found to resolve the problem is to reboot. The server console is
>> sometimes still responsive during the nfsd hang, and I can read and
>> write files on the same ZFS volume while nfsd is hung. I am pasting
>> below the output of procstat -kk on nfsd, and details of my pool
>> (nfsstat on the server hangs once the problem starts and produces no
>> output). The pool is v28 and was created from a bunch of volumes
>> attached over Fibre Channel using the mpt driver. My system has a
>> Supermicro board and 4 AMD Opteron 6274 CPUs.
>>
>> I did not experience any nfsd hangs with 9.0-RELEASE (same machine,
>> essentially same configuration, same usage pattern).
>>
>> I would greatly appreciate any help to resolve this problem!
>> Thank you
>> Olivier
>>
>> PID TID COMM TDNAME KSTACK
>> 1511 102751 nfsd nfsd: master
>> mi_switch+0x186
>> sleepq_wait+0x42
>> __lockmgr_args+0x5ae
>> vop_stdlock+0x39
>> VOP_LOCK1_APV+0x46
>> _vn_lock+0x47
>> zfs_fhtovp+0x338
>> nfsvno_fhtovp+0x87
>> nfsd_fhtovp+0x7a
>> nfsrvd_dorpc+0x9cf
>> nfssvc_program+0x447
>> svc_run_internal+0x687
>> svc_run+0x8f
>> nfsrvd_nfsd+0x193
>> nfssvc_nfsd+0x9b
>> sys_nfssvc+0x90
>> amd64_syscall+0x540
>> Xfast_syscall+0xf7
>> 1511 102752 nfsd nfsd: service
>> mi_switch+0x186
>> sleepq_wait+0x42
>> __lockmgr_args+0x5ae
>> vop_stdlock+0x39
>> VOP_LOCK1_APV+0x46
>> _vn_lock+0x47
>> zfs_fhtovp+0x338
>> nfsvno_fhtovp+0x87
>> nfsd_fhtovp+0x7a
>> nfsrvd_dorpc+0x9cf
>> nfssvc_program+0x447
>> svc_run_internal+0x687
>> svc_thread_start+0xb
>> fork_exit+0x11f
>> fork_trampoline+0xe
>> 1511 102753 nfsd nfsd: service
>> mi_switch+0x186
>> sleepq_wait+0x42
>> _cv_wait+0x112
>> zio_wait+0x61
>> zil_commit+0x764
>> zfs_freebsd_write+0xba0
>> VOP_WRITE_APV+0xb2
>> nfsvno_write+0x14d
>> nfsrvd_write+0x362
>> nfsrvd_dorpc+0x3c0
>> nfssvc_program+0x447
>> svc_run_internal+0x687
>> svc_thread_start+0xb
>> fork_exit+0x11f
>> fork_trampoline+0xe
>> 1511 102754 nfsd nfsd: service
>> mi_switch+0x186
>> sleepq_wait+0x42
>> _cv_wait+0x112
>> zio_wait+0x61
>> zil_commit+0x3cf
>> zfs_freebsd_fsync+0xdc
>> nfsvno_fsync+0x2f2
>> nfsrvd_commit+0xe7
>> nfsrvd_dorpc+0x3c0
>> nfssvc_program+0x447
>> svc_run_internal+0x687
>> svc_thread_start+0xb
>> fork_exit+0x11f
>> fork_trampoline+0xe
>> 1511 102755 nfsd nfsd: service
>> mi_switch+0x186
>> sleepq_wait+0x42
>> __lockmgr_args+0x5ae
>> vop_stdlock+0x39
>> VOP_LOCK1_APV+0x46
>> _vn_lock+0x47
>> zfs_fhtovp+0x338
>> nfsvno_fhtovp+0x87
>> nfsd_fhtovp+0x7a
>> nfsrvd_dorpc+0x9cf
>> nfssvc_program+0x447
>> svc_run_internal+0x687
>> svc_thread_start+0xb
>> fork_exit+0x11f
>> fork_trampoline+0xe
>> 1511 102756 nfsd nfsd: service
>> mi_switch+0x186
>> sleepq_wait+0x42
>> _cv_wait+0x112
>> zil_commit+0x6d
>> zfs_freebsd_write+0xba0
>> VOP_WRITE_APV+0xb2
>> nfsvno_write+0x14d
>> nfsrvd_write+0x362
>> nfsrvd_dorpc+0x3c0
>> nfssvc_program+0x447
>> svc_run_internal+0x687
>> svc_thread_start+0xb
>> fork_exit+0x11f
>> fork_trampoline+0xe
>>
> These threads are either waiting for a vnode lock or waiting inside
> zil_commit() (at three different offsets within zil_commit()). A guess
> would be that the ZIL hasn't completed a write for some reason, so
> three threads are waiting for it, while one of them holds a lock on
> the vnode being written and the remaining threads are waiting for
> that vnode lock.
>
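> If you want to confirm that picture, grabbing kernel stacks for
> everything (not just nfsd) should show whether some other thread is
> sitting on that vnode lock, or whether it really is one of the
> zil_commit() threads. Just a sketch of what I would poke at, not
> something I have tested on your setup:
>
>   # kernel stacks for all threads, captured while the hang is happening
>   procstat -kk -a > /var/tmp/stacks.txt
>
>   # if the kernel has DDB compiled in, from the console:
>   #   db> show lockedvnods
>   #   db> show alllocks      (needs WITNESS)
>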
> I am not a ZFS guy, so I cannot help further, except to suggest
> that you try to determine what might cause a write to the ZIL to
> stall (different device, different device driver, ...).
>
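> For the ZIL angle, watching per-device activity during a hang might
> show whether one of the LUNs has simply stopped completing writes, and,
> purely as a test (it throws away sync semantics, so only if you can
> tolerate losing recent writes on a crash), you could take the ZIL out
> of the write path for the pool. Again just a sketch, assuming the sync
> property is available on your pool:
>
>   # per-vdev I/O, and GEOM-level stats (gstat shows per-disk latency)
>   zpool iostat -v tank 1
>   gstat -a
>
>   # test only: bypass the ZIL for this pool, then put it back later
>   zfs set sync=disabled tank
>   zfs inherit sync tank
>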
> Good luck with it, rick
>
>>
>> PID TID COMM TDNAME KSTACK
>> 1507 102750 nfsd -
>> mi_switch+0x186
>> sleepq_catch_signals+0x2e1
>> sleepq_wait_sig+0x16
>> _cv_wait_sig+0x12a
>> seltdwait+0xf6
>> kern_select+0x6ef
>> sys_select+0x5d
>> amd64_syscall+0x540
>> Xfast_syscall+0xf7
>>
>>
>>   pool: tank
>>  state: ONLINE
>> status: The pool is formatted using a legacy on-disk format. The pool
>>         can still be used, but some features are unavailable.
>> action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
>>         pool will no longer be accessible on software that does not
>>         support feature flags.
>>   scan: scrub repaired 0 in 45h37m with 0 errors on Mon Dec 3 03:07:11 2012
>> config:
>>
>>         NAME          STATE     READ WRITE CKSUM
>>         tank          ONLINE       0     0     0
>>           raidz1-0    ONLINE       0     0     0
>>             da19      ONLINE       0     0     0
>>             da31      ONLINE       0     0     0
>>             da32      ONLINE       0     0     0
>>             da33      ONLINE       0     0     0
>>             da34      ONLINE       0     0     0
>>           raidz1-1    ONLINE       0     0     0
>>             da20      ONLINE       0     0     0
>>             da36      ONLINE       0     0     0
>>             da37      ONLINE       0     0     0
>>             da38      ONLINE       0     0     0
>>             da39      ONLINE       0     0     0



-- 
Reed A. Cartwright, PhD
Assistant Professor of Genomics, Evolution, and Bioinformatics
School of Life Sciences
Center for Evolutionary Medicine and Informatics
The Biodesign Institute
Arizona State University

