Re: Speed improvements in ZFS
- Reply: Alexander Leidinger : "Re: Speed improvements in ZFS"
- In reply to: Alexander Leidinger : "Speed improvements in ZFS"
Date: Tue, 15 Aug 2023 12:41:38 UTC
On 8/15/23, Alexander Leidinger <Alexander@leidinger.net> wrote:
> Hi,
>
> just a report that I noticed a very large speed improvement in ZFS in
> -current. For a long time (at least since last year), the periodic
> daily runs on a jail host of mine with more than 20 jails, each of
> which runs periodic daily, have taken from about 3 am until 5 pm or
> longer. I don't remember when this started, and I thought at the time
> that the problem might be data related. It's the long runs of "find"
> in one of the periodic daily jobs which take that long, and the
> number of jails, together with the null-mounted base system and the
> null-mounted package repository inside each jail, means the number of
> files and the concurrent access to the spinning rust (first with an
> SSD and now an NVMe based cache) may have reached some tipping point.
> I have all the periodic daily mails around, so theoretically I may be
> able to find out when this started, but as can be seen in another
> mail to this mailing list, the system which has all the periodic
> mails has some issues which have higher priority for me to track
> down...
>
> Since I updated to a src from 2023-07-20, this is not the case anymore.
> The data is the same (maybe even a bit more, as I have added 2 more
> jails since then, and the periodic daily runs, which run more or less
> in parallel, are not taking considerably longer). The speed increase
> with the July build is in the area of 3-4 hours for 23 parallel
> periodic daily runs. So instead of finishing the periodic runs around
> 5 pm, they now finish around 1 pm/2 pm.
>
> So whatever was done inside ZFS or VFS or nullfs between 2023-06-19 and
> 2023-07-20 has given a huge speed improvement. From memory I would
> say there is still room for improvement, as I think the periodic
> daily runs used to end in the morning instead of the afternoon, but
> my memory may be flaky in this regard...
>
> Great work to whoever was involved.
>
Several hours to run periodic is still unusably slow.
Have you tried figuring out where the time is spent?
I don't know what caused the change here, but I do know of one major
bottleneck which you are almost guaranteed to run into if you inspect
all files everywhere -- namely bumping into the vnode limit.
In vn_alloc_hard() you can find:

    msleep(&vnlruproc_sig, &vnode_list_mtx, PVFS, "vlruwk", hz);
    if (atomic_load_long(&numvnodes) + 1 > desiredvnodes &&
        vnlru_read_freevnodes() > 1)
            vnlru_free_locked(1);

That is, the allocating thread will sleep for up to 1 second if there
are no vnodes up for grabs, and then go ahead and allocate one anyway.
Going over the numvnodes limit is partially rate-limited, but in a
manner which is not very usable.
The entire mechanism is mostly borked and in desperate need of a rewrite.
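A quick way to see whether you are hitting this path is to look for
processes sleeping on the "vlruwk" wait channel while the finds are
running (a minimal sketch, assuming the stock ps(1) keywords; the
column list is just an example):

    # anything stuck in the slow vnode-allocation path shows "vlruwk"
    # as its wait channel
    ps -axo pid,state,wchan,comm | grep vlruwk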
With this in mind, can you provide the output of:

    sysctl kern.maxvnodes vfs.wantfreevnodes vfs.freevnodes \
        vfs.vnodes_created vfs.numvnodes vfs.recycles_free vfs.recycles

Meanwhile, if there are tons of recycles, you can do damage control by
bumping kern.maxvnodes.
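For example (a sketch; the value below is only a placeholder, size it
to your file counts):

    # raise the vnode limit at runtime -- the number is just an example
    sysctl kern.maxvnodes=4000000
    # keep it across reboots
    echo 'kern.maxvnodes=4000000' >> /etc/sysctl.conf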
If this is not the problem, you can use dtrace to figure it out.
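For instance (a minimal sketch; the fbt probe assumes vn_alloc_hard is
not inlined in your kernel):

    # count how often the slow vnode-allocation path is entered
    dtrace -n 'fbt::vn_alloc_hard:entry { @hits = count(); }'

    # or sample kernel stacks for 30 seconds to see where the time goes
    dtrace -n 'profile-997 /arg0/ { @[stack()] = count(); } tick-30s { exit(0); }'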
--
Mateusz Guzik <mjguzik gmail.com>