NFS/ZFS hangs after upgrading from 9.0-RELEASE to -STABLE

Reed A. Cartwright cartwright at asu.edu
Tue Dec 4 17:54:47 UTC 2012


I'm having similar issues after upgrading to 9.1-RC2 and RC3.  I'm not
using either NFS or a ZIL.

On Tue, Dec 4, 2012 at 7:26 AM, Rick Macklem <rmacklem at uoguelph.ca> wrote:
> Olivier wrote:
>> Hi all
>> After upgrading from 9.0-RELEASE to 9.1-PRERELEASE #0 r243679 I'm
>> having severe problems with NFS sharing of a ZFS volume. nfsd appears
>> to hang at random times (anywhere from once every couple of hours to
>> once every two days) while accessing a ZFS volume, and the only way I
>> have found to resolve the problem is to reboot. The server console is
>> sometimes still responsive during the nfsd hang, and I can read and
>> write files on the same ZFS volume while nfsd is hung. I am pasting
>> below the output of procstat -kk on nfsd, and details of my pool
>> (nfsstat on the server hangs once the problem starts and produces no
>> output). The pool is v28 and was created from a bunch of volumes
>> attached over Fibre Channel using the mpt driver. My system has a
>> Supermicro board and 4 AMD Opteron 6274 CPUs.
>>
>> I did not experience any nfsd hangs with 9.0-RELEASE (same machine,
>> essentially same configuration, same usage pattern).
>>
>> I would greatly appreciate any help to resolve this problem!
>> Thank you
>> Olivier
>>
>> PID TID COMM TDNAME KSTACK
>> 1511 102751 nfsd nfsd: master
>> mi_switch+0x186
>> sleepq_wait+0x42
>> __lockmgr_args+0x5ae
>> vop_stdlock+0x39
>> VOP_LOCK1_APV+0x46
>> _vn_lock+0x47
>> zfs_fhtovp+0x338
>> nfsvno_fhtovp+0x87
>> nfsd_fhtovp+0x7a
>> nfsrvd_dorpc+0x9cf
>> nfssvc_program+0x447
>> svc_run_internal+0x687
>> svc_run+0x8f
>> nfsrvd_nfsd+0x193
>> nfssvc_nfsd+0x9b
>> sys_nfssvc+0x90
>> amd64_syscall+0x540
>> Xfast_syscall+0xf7
>> 1511 102752 nfsd nfsd: service
>> mi_switch+0x186
>> sleepq_wait+0x42
>> __lockmgr_args+0x5ae
>> vop_stdlock+0x39
>> VOP_LOCK1_APV+0x46
>> _vn_lock+0x47
>> zfs_fhtovp+0x338
>> nfsvno_fhtovp+0x87
>> nfsd_fhtovp+0x7a
>> nfsrvd_dorpc+0x9cf
>> nfssvc_program+0x447
>> svc_run_internal+0x687
>> svc_thread_start+0xb
>> fork_exit+0x11f
>> fork_trampoline+0xe
>> 1511 102753 nfsd nfsd: service
>> mi_switch+0x186
>> sleepq_wait+0x42
>> _cv_wait+0x112
>> zio_wait+0x61
>> zil_commit+0x764
>> zfs_freebsd_write+0xba0
>> VOP_WRITE_APV+0xb2
>> nfsvno_write+0x14d
>> nfsrvd_write+0x362
>> nfsrvd_dorpc+0x3c0
>> nfssvc_program+0x447
>> svc_run_internal+0x687
>> svc_thread_start+0xb
>> fork_exit+0x11f
>> fork_trampoline+0xe
>> 1511 102754 nfsd nfsd: service
>> mi_switch+0x186
>> sleepq_wait+0x42
>> _cv_wait+0x112
>> zio_wait+0x61
>> zil_commit+0x3cf
>> zfs_freebsd_fsync+0xdc
>> nfsvno_fsync+0x2f2
>> nfsrvd_commit+0xe7
>> nfsrvd_dorpc+0x3c0
>> nfssvc_program+0x447
>> svc_run_internal+0x687
>> svc_thread_start+0xb
>> fork_exit+0x11f
>> fork_trampoline+0xe
>> 1511 102755 nfsd nfsd: service
>> mi_switch+0x186
>> sleepq_wait+0x42
>> __lockmgr_args+0x5ae
>> vop_stdlock+0x39
>> VOP_LOCK1_APV+0x46
>> _vn_lock+0x47
>> zfs_fhtovp+0x338
>> nfsvno_fhtovp+0x87
>> nfsd_fhtovp+0x7a
>> nfsrvd_dorpc+0x9cf
>> nfssvc_program+0x447
>> svc_run_internal+0x687
>> svc_thread_start+0xb
>> fork_exit+0x11f
>> fork_trampoline+0xe
>> 1511 102756 nfsd nfsd: service
>> mi_switch+0x186
>> sleepq_wait+0x42
>> _cv_wait+0x112
>> zil_commit+0x6d
>> zfs_freebsd_write+0xba0
>> VOP_WRITE_APV+0xb2
>> nfsvno_write+0x14d
>> nfsrvd_write+0x362
>> nfsrvd_dorpc+0x3c0
>> nfssvc_program+0x447
>> svc_run_internal+0x687
>> svc_thread_start+0xb
>> fork_exit+0x11f
>> fork_trampoline+0xe
>>
> These threads are either waiting for a vnode lock or waiting inside
> zil_commit() (at three different offsets within zil_commit()). A guess
> would be that the ZIL hasn't completed a write for some reason, so
> three threads are waiting for it, while one of them holds a lock on
> the vnode being written and the remaining threads are waiting for
> that vnode lock.
>
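> If you want to confirm that picture, grabbing kernel stacks for
> everything (not just nfsd) should show whether some other thread is
> sitting on that vnode lock, or whether it really is one of the
> zil_commit() threads. Just a sketch of what I would poke at, not
> something I have tested on your setup:
>
>   # kernel stacks for all threads, captured while the hang is happening
>   procstat -kk -a > /var/tmp/stacks.txt
>
>   # if the kernel has DDB compiled in, from the console:
>   #   db> show lockedvnods
>   #   db> show alllocks      (needs WITNESS)
>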
> I am not a ZFS guy, so I cannot help further, except to suggest
> that you try to determine what might cause a write to the ZIL to
> stall (different device, different device driver, ...).
>
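> For the ZIL angle, watching per-device activity during a hang might
> show whether one of the LUNs has simply stopped completing writes, and,
> purely as a test (it throws away sync semantics, so only if you can
> tolerate losing recent writes on a crash), you could take the ZIL out
> of the write path for the pool. Again just a sketch, assuming the sync
> property is available on your pool:
>
>   # per-vdev I/O, and GEOM-level stats (gstat shows per-disk latency)
>   zpool iostat -v tank 1
>   gstat -a
>
>   # test only: bypass the ZIL for this pool, then put it back later
>   zfs set sync=disabled tank
>   zfs inherit sync tank
>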
> Good luck with it, rick
>
>>
>> PID TID COMM TDNAME KSTACK
>> 1507 102750 nfsd -
>> mi_switch+0x186
>> sleepq_catch_signals+0x2e1
>> sleepq_wait_sig+0x16
>> _cv_wait_sig+0x12a
>> seltdwait+0xf6
>> kern_select+0x6ef
>> sys_select+0x5d
>> amd64_syscall+0x540
>> Xfast_syscall+0xf7
>>
>>
>>   pool: tank
>>  state: ONLINE
>> status: The pool is formatted using a legacy on-disk format. The pool
>>         can still be used, but some features are unavailable.
>> action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
>>         pool will no longer be accessible on software that does not
>>         support feature flags.
>>   scan: scrub repaired 0 in 45h37m with 0 errors on Mon Dec 3 03:07:11 2012
>> config:
>>
>>         NAME          STATE     READ WRITE CKSUM
>>         tank          ONLINE       0     0     0
>>           raidz1-0    ONLINE       0     0     0
>>             da19      ONLINE       0     0     0
>>             da31      ONLINE       0     0     0
>>             da32      ONLINE       0     0     0
>>             da33      ONLINE       0     0     0
>>             da34      ONLINE       0     0     0
>>           raidz1-1    ONLINE       0     0     0
>>             da20      ONLINE       0     0     0
>>             da36      ONLINE       0     0     0
>>             da37      ONLINE       0     0     0
>>             da38      ONLINE       0     0     0
>>             da39      ONLINE       0     0     0



-- 
Reed A. Cartwright, PhD
Assistant Professor of Genomics, Evolution, and Bioinformatics
School of Life Sciences
Center for Evolutionary Medicine and Informatics
The Biodesign Institute
Arizona State University

