ZFS server has gone crazy slow

Peter Eriksson pen at lysator.liu.se
Sun Apr 12 11:46:33 UTC 2020


You are probably right. 

However - we have seen (through experimentation :-) that “zfs destroy -d” for recursive snapshot destruction across many filesystems seemed to finish much faster on our servers (i.e. the command itself returned much sooner). But it also meant that a lot of I/O seemed to keep happening for quite some time after the last “zfs destroy -d” command was issued (and for a really long time when there were filesystems close to their quota). No clones or “user holds” are in use here as far as I know. Why that is, I don’t know. With “zfs destroy” (no “-d”) things seem to be much more synchronous.

We’ve stopped using “-d” now, since we’d rather not have that type of I/O load during the daytime, and we had some issues with nightly snapshot cleanup jobs not finishing in time.
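
Whether any snapshots really carry user holds or clones can be checked rather than assumed. The following is a minimal Python sketch, not a tested tool: it assumes "zfs get -H" prints tab-separated name/property/value lines, and that an empty value or "-" means "no clones". The function names and the pool name "tank" are hypothetical.

```python
import subprocess


def query_zfs(pool):
    """Run 'zfs get' for the userrefs and clones properties of every snapshot
    under the given pool (assumption: -H gives tab-separated, header-less output)."""
    cmd = ["zfs", "get", "-H", "-r", "-t", "snapshot",
           "-o", "name,property,value", "userrefs,clones", pool]
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def snapshots_with_holds_or_clones(zfs_get_output):
    """Return {snapshot: {property: value}} for snapshots that have user holds
    or clones, i.e. the ones a 'zfs destroy -d' would postpone."""
    blocked = {}
    for line in zfs_get_output.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, prop, value = fields[0], fields[1], fields[2]
        if prop == "userrefs" and value not in ("", "-", "0"):
            blocked.setdefault(name, {})[prop] = value
        elif prop == "clones" and value not in ("", "-"):
            blocked.setdefault(name, {})[prop] = value
    return blocked
```

Calling snapshots_with_holds_or_clones(query_zfs("tank")) on a pool with no holds and no clones should return an empty dict, in which case “-d” would not be expected to postpone anything.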

Anyway, the slowness at “zfs mount -a” time (which looked to me like a lot of queued-up ZIL data being written out) was definitely a real problem: it mounted most of the filesystems pretty quickly but was then extremely slow for a couple of them, taking something like 4-6 hours and causing a lot of I/O. Luckily that was one of our backup servers, at a time when the only one it frustrated was me… I’d hate for that to happen on one of the frontend (NFS/SMB-serving) servers during office hours :-)

- Peter


> On 12 Apr 2020, at 13:26, Andriy Gapon <avg at FreeBSD.org> wrote:
> 
> 
> On 12/04/2020 00:24, Peter Eriksson wrote:
>> Another fun thing that might happen is if you reboot your server and happen
>> to have a lot of queued up writes in the ZIL (for example if you did a “zfs
>> destroy -d -r POOL@snapshots” (deferred (background) destroys of snapshots)
>> and do a hard reboot while it’s busy): it will “write out” those queued
>> transactions at filesystem mount time during the boot sequence
> 
> Just nitpicking on two bits of incorrect information here.
> First, zfs destroy never uses ZIL.  Never.  ZIL is used only for ZPL operations
> like file writes, renames, removes, etc.  The things that you can do with Posix
> system calls (~ VFS KPI).
> 
> Second, zfs destroy -d is not a background destroy.  It is a deferred destroy.
> That means that either the destroy is done immediately if a snapshot has no
> holds which means no user holds and no clones.  Or the destroy is postponed
> until holds are gone, that is, the last clone or the last user hold is removed.
> 
> Note, however, that unless you have a very ancient pool version destroying a
> snapshot means that the snapshot object is removed and all blocks belonging to
> the snapshot are queued for freeing.  Their actual freeing is done
> asynchronously ("in background") and can be spread over multiple TXG periods.
> That's done regardless of whether -d was used.
> 
> -- 
> Andriy Gapon
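
The asynchronous freeing described above can be watched through the pool's read-only "freeing" property, which reports how much space is still waiting to be reclaimed after a destroy. The following is a minimal Python sketch under stated assumptions: it assumes "zpool get -H -p -o value freeing POOL" prints a single byte count, and the function names, the pool name "tank" and the 30-second interval are all hypothetical.

```python
import subprocess
import time


def parse_freeing(output):
    """Turn the output of 'zpool get -H -p -o value freeing POOL' into bytes."""
    return int(output.strip())


def read_freeing(pool):
    """Ask zpool how many bytes are still queued for freeing in the pool."""
    cmd = ["zpool", "get", "-H", "-p", "-o", "value", "freeing", pool]
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return parse_freeing(out)


def wait_until_freed(reader, interval=30.0, sleep=time.sleep):
    """Poll reader() until it reports 0 pending bytes.

    Returns the list of samples seen, so the caller can log how the
    backlog drained over time."""
    samples = []
    while True:
        pending = reader()
        samples.append(pending)
        if pending == 0:
            return samples
        sleep(interval)
```

A nightly cleanup job could, for example, call wait_until_freed(lambda: read_freeing("tank")) after its last destroy, to find out when the background I/O has actually finished rather than when the command returned.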


