zfs receive stalls whole system

Fabian Keil freebsd-listen at fabiankeil.de
Tue May 17 08:33:28 UTC 2016


Rainer Duffner <rainer at ultra-secure.de> wrote:

> I have two servers that were running FreeBSD 10.1-AMD64 for a long time, one zfs-sending to the other (via zxfer). Both are NFS servers and MySQL slaves; the sender is actively used as an NFS server, while the recipient is just a warm standby in case something serious happens and we don’t want to wait a day for the restore to finish. The MySQL slaves are actively used as read-only servers (at the application level; Python’s SQLAlchemy does that, apparently).
> 
> They are HP DL380G8 (one CPU, hexacore) with over 128 GB RAM (I think one has 144, the other has 192).
> While they were running 10.1, they used HP P420 RAID controllers with 12 individual RAID0 volumes that I pooled into 6-disk RAIDZ2 vdevs.
> I use zfsnap to do hourly, daily and weekly snapshots.
[...]
> Now, when I do a zxfer, sometimes the whole system stalls while the data is sent over, especially if the delta is large or if something else is reading from the disk at the same time (backup agent).
> 
> I had this before, on 10.0 (I believe, we didn’t have this in 9.1 either, IIRC) and it went away in 10.1.

Do you use geli for swap device(s)?

> It’s very difficult (well, impossible) to debug, because the system totally hangs and doesn’t accept any keypresses.

You could try reducing ZFS's deadman timeout so that a stalled I/O ends in a panic you can debug instead of a silent hang.
On systems with local disks I usually use:

vfs.zfs.deadman_enabled: 1
vfs.zfs.deadman_checktime_ms: 5000
vfs.zfs.deadman_synctime_ms: 10000
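
A minimal sketch of how these could be made persistent, assuming they are handled as loader tunables on your release (whether they can also be changed at runtime with sysctl(8) depends on the FreeBSD version, so check that before relying on it). The comments reflect my understanding of the knobs, not authoritative documentation:

```
# /boot/loader.conf
# Enable the ZFS deadman, which panics when an I/O appears to be hung.
vfs.zfs.deadman_enabled="1"
# How often (in ms) outstanding I/Os are checked.
vfs.zfs.deadman_checktime_ms="5000"
# How long (in ms) an I/O may be outstanding before it counts as hung.
vfs.zfs.deadman_synctime_ms="10000"
```

After a reboot, "sysctl vfs.zfs.deadman_enabled vfs.zfs.deadman_checktime_ms vfs.zfs.deadman_synctime_ms" should show the values in the format above.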

Fabian