ZFS server has gone crazy slow

Andriy Gapon avg at FreeBSD.org
Sun Apr 12 11:56:06 UTC 2020


On 12/04/2020 14:46, Peter Eriksson wrote:
> You are probably right.
> 
> However - we have seen (through experimentation :-) that “zfs destroy -d” for
> recursive snapshot destruction on many filesystems seemed to finish much
> faster (i.e., the command returned much quicker) on our servers. But it also
> meant that a lot of I/O seemed to be happening quite some time after the last
> “zfs destroy -d” command was issued (and a really long time when there were
> near-quota-full filesystems). No clones or “user holds” are in use here as
> far as I know. Why that is I don’t know. With “zfs destroy” (no “-d”) things
> seem to be much more synchronous.
> 
> We’ve stopped using “-d” now since we’d rather not have that type of I/O load
> be happening during daytime and we had some issues with some nightly snapshot
> cleanup jobs not finishing in time.

I think that you want to re-test zfs destroy vs destroy -d, and to do that
rigorously.  I am not sure how to explain what you saw, as it cannot be
explained by how destroy -d actually differs from plain destroy.  Maybe it was
a cold vs warmed-up (with respect to the ARC) system, maybe something else, but
certainly not -d if you do not have user holds and clones.

Just in case, here is the only place in the code where 'defer' actually makes a
difference.

dsl_destroy_snapshot_sync_impl:
        /*
         * The snapshot is still held: it has user holds (ds_userrefs > 0)
         * or clones (ds_num_children > 1).  Instead of destroying it now,
         * mark it with DS_FLAG_DEFER_DESTROY and return; the destroy
         * completes when the last hold or clone goes away.
         */
        if (defer &&
            (ds->ds_userrefs > 0 ||
            dsl_dataset_phys(ds)->ds_num_children > 1)) {
                ASSERT(spa_version(dp->dp_spa) >= SPA_VERSION_USERREFS);
                dmu_buf_will_dirty(ds->ds_dbuf, tx);
                dsl_dataset_phys(ds)->ds_flags |= DS_FLAG_DEFER_DESTROY;
                spa_history_log_internal_ds(ds, "defer_destroy", tx, "");
                return;
        }
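
In every other case, that is, when the snapshot has no user holds and no
clones, this branch is not taken and the destroy proceeds immediately in the
same sync task, so destroy -d does exactly the same work as plain destroy.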


> Anyway, the “seems to be writing out a lot of queued-up ZIL data” at “zfs
> mount -a” time was definitely a real problem - it mounted most of the
> filesystems pretty quickly but then was “extremely slow” for a couple of them
> (and was causing a lot of I/O). Like 4-6 hours. Luckily it was one of our
> backup servers, and it happened at a time when the only one it frustrated was
> me… I’d hate for that to happen on one of the frontend (NFS/SMB-serving)
> servers during office hours :-)

I don't doubt that; I was just trying to explain that whatever was in the ZIL
could not have come from zfs destroy.  It was something else.


>> On 12 Apr 2020, at 13:26, Andriy Gapon <avg at FreeBSD.org> wrote:
>> 
>> 
>> On 12/04/2020 00:24, Peter Eriksson wrote:
>>> Another fun thing that might happen: if you reboot your server and
>>> happen to have a lot of queued-up writes in the ZIL (for example, if you
>>> did a “zfs destroy -d -r POOL@snapshots”, a deferred (background) destroy
>>> of snapshots) and do a hard reboot while it’s busy, it will “write out”
>>> those queued transactions at filesystem mount time during the boot
>>> sequence.
>> 
>> Just nitpicking on two bits of incorrect information here.  First, zfs
>> destroy never uses the ZIL.  Never.  The ZIL is used only for ZPL operations
>> like file writes, renames, removes, etc. - the things that you can do with
>> POSIX system calls (roughly, the VFS KPI).
>> 
>> Second, zfs destroy -d is not a background destroy.  It is a deferred
>> destroy.  That means that the destroy is done immediately if the snapshot
>> has no holds, meaning no user holds and no clones; otherwise the destroy is
>> postponed until the holds are gone, that is, until the last clone or the
>> last user hold is removed.
>> 
>> Note, however, that unless you have a very ancient pool version, destroying
>> a snapshot means that the snapshot object is removed and all blocks
>> belonging to the snapshot are queued for freeing.  Their actual freeing is
>> done asynchronously ("in background") and can be spread over multiple TXG
>> periods.  That's done regardless of whether -d was used.
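
To illustrate that last point, here is a minimal user-space sketch of the
idea.  It is not the actual OpenZFS code: the names (free_queue,
TXG_FREE_LIMIT, txg_sync's shape) are made up for illustration.  But the
pattern - the destroy queues the blocks and returns, while a per-TXG sync
loop frees a bounded number of them at a time - is the behaviour described
above.

#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical cap on how many queued blocks one TXG sync may free.
 * OpenZFS throttles the real work with tunables; this name is made up.
 */
#define TXG_FREE_LIMIT  3

/* A queued block awaiting freeing (a toy stand-in for a block pointer). */
struct free_entry {
        unsigned long blkid;
        struct free_entry *next;
};

static struct free_entry *free_queue;

/*
 * destroy_snapshot(): remove the snapshot object and queue its blocks
 * for freeing, then return immediately - just like the command does.
 */
static void
destroy_snapshot(unsigned long first_blk, unsigned long nblks)
{
        for (unsigned long i = 0; i < nblks; i++) {
                struct free_entry *e = malloc(sizeof (*e));
                e->blkid = first_blk + i;
                e->next = free_queue;
                free_queue = e;
        }
        printf("destroy: queued %lu blocks, returning\n", nblks);
}

/*
 * txg_sync(): each sync pass frees at most TXG_FREE_LIMIT queued blocks,
 * so the work from a single destroy is spread over several TXGs.
 */
static void
txg_sync(unsigned long txg)
{
        int freed = 0;

        while (free_queue != NULL && freed < TXG_FREE_LIMIT) {
                struct free_entry *e = free_queue;
                free_queue = e->next;
                free(e);        /* the actual block free would happen here */
                freed++;
        }
        printf("txg %lu: freed %d blocks\n", txg, freed);
}

int
main(void)
{
        destroy_snapshot(1000, 10);     /* returns right away... */
        for (unsigned long txg = 1; free_queue != NULL; txg++)
                txg_sync(txg);          /* ...but work continues for 4 TXGs */
        return (0);
}

That spreading of frees over TXGs is why heavy I/O can continue long after
the zfs destroy command itself has returned, with or without -d.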


-- 
Andriy Gapon

