zfs receive stalls whole system

Rainer Duffner rainer at ultra-secure.de
Mon May 16 23:07:30 UTC 2016


Hi,

I have two servers that ran FreeBSD 10.1-AMD64 for a long time, one zfs-sending to the other (via zxfer). Both are NFS servers and MySQL slaves. The sender is actively used as an NFS server; the recipient is just a warm standby, in case something serious happens and we don’t want to wait a day for a restore to complete. The MySQL slaves are actively used as read-only servers (this is handled at the application level, apparently by Python’s SQLAlchemy).

They are HP DL380G8 (one CPU, hexacore) with over 128 GB RAM (I think one has 144, the other has 192).
While they were running 10.1, they used HP P420 RAID controllers with 12 individual RAID0 volumes, which I pooled into 6-disk RAIDZ2 vdevs.
I use zfsnap to do hourly, daily and weekly snapshots.

Sending worked well, especially after updating to 10.1.

Because the storage was over 90% full (and I really hate this RAID0 business we have with the HP RAID controllers), I rebuilt the servers with HP’s OEMed H220/221 controllers (LSI 2308 in disguise), added an external disk shelf hosting 12 additional disks, and upgraded to FreeBSD 10.3.
Because we didn’t want to throw out the original disks but did want to increase the available space a lot, the new disks are double the size of the original ones (1200 GB vs. 600 GB SAS).
I also created GPT partitions on the disks, labeled them according to each disk’s position in the cages/shelf, and created the pools with the GPT partition labels instead of the daX names.
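
As a rough illustration of the capacity gain described above, the following Python sketch estimates usable space for RAIDZ2 vdevs. It assumes the 12 new 1200 GB disks are also arranged as 6-disk RAIDZ2 vdevs, which the post does not state. The function name is made up for this example, and ZFS metadata, padding, and reserved space are ignored, so real figures will be lower.

```python
# Rough usable-capacity estimate for RAIDZ2 vdevs.
# Ignores ZFS metadata, padding, and reserved space.

def raidz2_usable_gb(disks_per_vdev, disk_gb, vdevs):
    """Data capacity of `vdevs` RAIDZ2 vdevs; two disks per vdev hold parity."""
    if disks_per_vdev < 3:
        raise ValueError("RAIDZ2 needs at least 3 disks per vdev")
    return (disks_per_vdev - 2) * disk_gb * vdevs

# Original layout from the post: 12 x 600 GB disks in 6-disk RAIDZ2 vdevs.
old_gb = raidz2_usable_gb(6, 600, 2)

# ASSUMPTION: the 12 new 1200 GB disks also form 6-disk RAIDZ2 vdevs.
new_gb = raidz2_usable_gb(6, 1200, 2)

total_gb = old_gb + new_gb
print("old:", old_gb, "GB, new:", new_gb, "GB, total:", total_gb, "GB")
print("growth factor:", total_gb / old_gb)
```

Under that assumption, the new shelf alone holds twice the data capacity of the original disks, so the combined pool would be roughly three times the old size.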

Now, when I do a zxfer, the whole system sometimes stalls while the data is sent over, especially if the delta is large or if something else (the backup agent) is reading from the disks at the same time.

I had this before, on 10.0, I believe (IIRC we didn’t have it on 9.1), and it went away in 10.1.

It’s very difficult (well, impossible) to debug, because the system totally hangs and doesn’t accept any keypresses.

Would a separate ZIL (log) device help in this case?
I always thought that NFS was the only thing that did SYNC writes…

More information about the freebsd-fs mailing list