Poor ZFS+NFSv3 read/write performance and panic

Tom Curry thomasrcurry at gmail.com
Mon Feb 8 14:15:32 UTC 2016


On Sun, Feb 7, 2016 at 11:58 AM, David Adam <zanchey at ucc.gu.uwa.edu.au>
wrote:

> Just wondering if anyone has any idea how to identify which devices are
> implicated in ZFS' vdev_deadman(). I have updated the firmware on the
> mps(4) card that has our disks attached but that hasn't helped.
>
> Thanks
>
> David
>
> On Fri, 29 Jan 2016, David Adam wrote:
> > We have a FreeBSD 10.2 server sharing some ZFS datasets over NFSv3. It's
> > worked well until recently, but has started to routinely perform
> > exceptionally poorly, eventually panicking in vdev_deadman() (which I
> > understand is a feature).
> >
> > Initially after booting, things are fine, but performance rapidly begins
> > to degrade. Both read and write performance is terrible, with many
> > operations either hanging indefinitely or timing out.
> >
> > When this happens, I can break into DDB and see lots of nfsd processes
> > stuck waiting for a lock:
> > Process 784 (nfsd) thread 0xfffff80234795000 (100455)
> > shared lockmgr zfs (zfs) r = 0 (0xfffff8000b91f548) locked @
> > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:2196
> >
> > and the backtrace looks like this:
> >   sched_switch() at sched_switch+0x495/frame 0xfffffe04677740b0
> >   mi_switch() at mi_switch+0x179/frame 0xfffffe04677740f0
> >   turnstile_wait() at turnstile_wait+0x3b2/frame 0xfffffe0467774140
> >   __mtx_lock_sleep() at __mtx_lock_sleep+0x2c0/frame 0xfffffe04677741c0
> >   __mtx_lock_flags() at __mtx_lock_flags+0x102/frame 0xfffffe0467774210
> >   vmem_size() at vmem_size+0x5a/frame 0xfffffe0467774240
> >   arc_reclaim_needed() at arc_reclaim_needed+0xd2/frame
> 0xfffffe0467774260
> >   arc_get_data_buf() at arc_get_data_buf+0x157/frame 0xfffffe04677742a0
> >   arc_read() at arc_read+0x68b/frame 0xfffffe0467774350
> >   dbuf_read() at dbuf_read+0x7ed/frame 0xfffffe04677743f0
> >   dmu_tx_check_ioerr() at dmu_tx_check_ioerr+0x8b/frame
> 0xfffffe0467774420
> >   dmu_tx_count_write() at dmu_tx_count_write+0x17e/frame
> 0xfffffe0467774540
> >   dmu_tx_hold_write() at dmu_tx_hold_write+0xba/frame 0xfffffe0467774580
> >   zfs_freebsd_write() at zfs_freebsd_write+0x55d/frame 0xfffffe04677747b0
> >   VOP_WRITE_APV() at VOP_WRITE_APV+0x193/frame 0xfffffe04677748c0
> >   nfsvno_write() at nfsvno_write+0x13e/frame 0xfffffe0467774970
> >   nfsrvd_write() at nfsrvd_write+0x496/frame 0xfffffe0467774c80
> >   nfsrvd_dorpc() at nfsrvd_dorpc+0x66b/frame 0xfffffe0467774e40
> >   nfssvc_program() at nfssvc_program+0x4e6/frame 0xfffffe0467774ff0
> >   svc_run_internal() at svc_run_internal+0xbb7/frame 0xfffffe0467775180
> >   svc_run() at svc_run+0x1db/frame 0xfffffe04677751f0
> >   nfsrvd_nfsd() at nfsrvd_nfsd+0x1f0/frame 0xfffffe0467775350
> >   nfssvc_nfsd() at nfssvc_nfsd+0x124/frame 0xfffffe0467775970
> >   sys_nfssvc() at sys_nfssvc+0xb7/frame 0xfffffe04677759a0
> >   amd64_syscall() at amd64_syscall+0x278/frame 0xfffffe0467775ab0
> >   Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0467775ab0
> >
> > Is this likely to be due to bad hardware? I can't see any problems in
> > the SMART data, and `camcontrol tags da0 -v` etc. does not reveal any
> > particularly long queues. Are there other useful things to check?
> >
> > If not, do you have any other ideas? I can make the full DDB information
> > available if that would be helpful.
> >
> > The pool is configured thus:
> >         NAME                  STATE     READ WRITE CKSUM
> >         space                 ONLINE       0     0     0
> >           mirror-0            ONLINE       0     0     0
> >             da0               ONLINE       0     0     0
> >             da1               ONLINE       0     0     0
> >           mirror-1            ONLINE       0     0     0
> >             da2               ONLINE       0     0     0
> >             da3               ONLINE       0     0     0
> >           mirror-2            ONLINE       0     0     0
> >             da4               ONLINE       0     0     0
> >             da6               ONLINE       0     0     0
> >           mirror-3            ONLINE       0     0     0
> >             da7               ONLINE       0     0     0
> >             da8               ONLINE       0     0     0
> >         logs
> >           mirror-4            ONLINE       0     0     0
> >             gpt/molmol-slog   ONLINE       0     0     0
> >             gpt/molmol-slog0  ONLINE       0     0     0
> > where the da? devices are WD Reds and the SLOG partitions are on Samsung
> > 840s.
> >
> > Many thanks,
> >
> > David Adam
> > zanchey at ucc.gu.uwa.edu.au
> >
> > _______________________________________________
> > freebsd-fs at freebsd.org mailing list
> > https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> > To unsubscribe, send any mail to "freebsd-fs-unsubscribe at freebsd.org"
> >
>
> Cheers,
>
> David Adam
> zanchey at ucc.gu.uwa.edu.au
> Ask Me About Our SLA!
>


I too ran into this problem and spent quite some time troubleshooting
hardware. In my case it turned out not to be hardware at all, but software:
specifically the ZFS ARC. Looking at your stack I see arc_reclaim_needed()
near the top, so it's possible you're running into the same issue. There is
a monster of a PR that details this here:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594

If you would like to test this theory, the fastest way is to limit the
ARC by adding the following to /boot/loader.conf and rebooting:
vfs.zfs.arc_max="24G"

Replace 24G with whatever makes sense for your system; aim for 3/4 of total
memory for starters. If this solves the problem, there are more scientific
routes to a permanent fix: one would be applying the patch in the PR
above, another a more finely tuned arc_max value.
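To pick a starting value, a small sketch of the arithmetic may help. On a
live FreeBSD box you would read physical memory from the hw.physmem sysctl;
the 32 GiB figure below is an example value (not from this thread) so the
numbers are visible:

```shell
# On FreeBSD you would obtain physical memory in bytes with:
#   phys=$(sysctl -n hw.physmem)
# Using a 32 GiB example value here so the arithmetic is visible.
phys=$((32 * 1024 * 1024 * 1024))

# 3/4 of physical memory, the suggested starting point for arc_max.
arc_max=$((phys * 3 / 4))

# Emit a loader.conf-style line (here: 25769803776 bytes, i.e. 24G).
echo "vfs.zfs.arc_max=\"${arc_max}\""
```

You can then compare the limit against the live ARC size reported by
`sysctl kstat.zfs.misc.arcstats.size` to see how close to the cap you run.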


More information about the freebsd-fs mailing list