zfs receive stalls whole system
rainer at ultra-secure.de
Tue May 17 09:08:21 UTC 2016
On 2016-05-17 10:27, Fabian Keil wrote:
> Rainer Duffner <rainer at ultra-secure.de> wrote:
>
>> I have two servers that were running FreeBSD 10.1-AMD64 for a long
>> time, one zfs-sending to the other (via zxfer). Both are NFS servers
>> and MySQL slaves. The sender is actively used as an NFS server; the
>> recipient is just a warm standby, in case something serious happens
>> and we don’t want to wait a day for the restore to be back in
>> place. The MySQL slaves are actively used as read-only servers (at the
>> application level, Python’s SQLAlchemy does that, apparently).
>>
>> They are HP DL380G8s (one six-core CPU each) with more than 128 GB of
>> RAM (I think one has 144 GB, the other 192 GB).
>> While they were running 10.1, they used HP P420 RAID controllers with
>> 12 individual RAID0 volumes that I pooled into 6-disk RAIDZ2 vdevs.
>> I use zfsnap to do hourly, daily and weekly snapshots.
> [...]
>> Now, when I do a zxfer, sometimes the whole system stalls while the
>> data is sent over, especially if the delta is large or if something
>> else is reading from the disk at the same time (backup agent).
>>
>> I had this before, on 10.0 (IIRC, we didn’t have it on 9.1
>> either), and it went away in 10.1.
>
> Do you use geli for swap device(s)?
Yes, I do.
/dev/mirror/swap.eli none swap sw 0 0
Bad idea?
>> It’s very difficult (well, impossible) to debug, because the system
>> totally hangs and doesn’t accept any keypresses.
>
> You could try reducing ZFS's deadman timeout to get a panic.
> On systems with local disks I usually use:
>
> vfs.zfs.deadman_enabled: 1
> vfs.zfs.deadman_checktime_ms: 5000
> vfs.zfs.deadman_synctime_ms: 10000
Too bad I don't have a spare system I could use to test this ;-)
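To estimate how quickly the suggested deadman values would turn a silent hang into a panic, here is a toy model in Python. It assumes the deadman wakes up every `deadman_checktime_ms` and panics once an outstanding I/O has been pending strictly longer than `deadman_synctime_ms` (and only while `deadman_enabled` is 1). The function names are hypothetical and this is a simplification, not the actual FreeBSD ZFS code.

```python
# Toy model of the ZFS deadman timer, for illustration only; NOT the
# FreeBSD ZFS implementation. Assumption: the deadman wakes up every
# checktime_ms and panics as soon as an I/O that never completes has been
# pending strictly longer than synctime_ms.

CHECKTIME_MS = 5000   # suggested vfs.zfs.deadman_checktime_ms
SYNCTIME_MS = 10000   # suggested vfs.zfs.deadman_synctime_ms


def panic_time_ms(issued_ms, checktime_ms=CHECKTIME_MS, synctime_ms=SYNCTIME_MS):
    """Time of the first periodic check (at multiples of checktime_ms) that
    sees a hung I/O issued at issued_ms as older than synctime_ms."""
    t = checktime_ms
    while t - issued_ms <= synctime_ms:
        t += checktime_ms
    return t


def detection_delay_ms(issued_ms, **kw):
    """How long the hung I/O stays pending before the simulated panic."""
    return panic_time_ms(issued_ms, **kw) - issued_ms


if __name__ == "__main__":
    delays = [detection_delay_ms(i) for i in range(0, 20000, 250)]
    print("fastest panic after", min(delays), "ms")  # 10250
    print("slowest panic after", max(delays), "ms")  # 15000
```

Under these assumptions a stuck I/O triggers the panic somewhere between `synctime` and `synctime + checktime` after it was issued, i.e. roughly 10 to 15 seconds with the values above, instead of the machine hanging indefinitely.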
More information about the freebsd-fs mailing list