Re: Speed improvements in ZFS
- Reply: Alexander Leidinger : "Re: Speed improvements in ZFS"
- In reply to: Alexander Leidinger : "Speed improvements in ZFS"
Date: Tue, 15 Aug 2023 12:41:38 UTC
On 8/15/23, Alexander Leidinger <Alexander@leidinger.net> wrote:
> Hi,
>
> just a report that I noticed a very large speed improvement in ZFS in
> -current. For a long time (at least since last year), the periodic
> daily runs on a jail host of mine with more than 20 jails, each of
> which runs periodic daily, have taken from about 3 am until 5 pm or
> longer. I don't remember when this started, and I thought at the time
> that the problem might be data related. It's the long runs of "find"
> in one of the periodic daily jobs which take that long, and the
> number of jails, together with the null-mounted base system and the
> null-mounted package repository inside each jail, means the number of
> files and the concurrent access to the spinning rust (first with an
> SSD and now an NVMe based cache) may have reached some tipping point.
> I have all the periodic daily mails around, so theoretically I may be
> able to find out when this started, but as can be seen in another
> mail to this mailing list, the system which has all the periodic
> mails has some issues which have higher priority for me to track
> down...
>
> Since I updated to a src from 2023-07-20, this is not the case anymore.
> The data is the same (maybe even a bit more, as I have added 2 more
> jails since then, and the periodic daily runs, which run more or less
> in parallel, are not taking considerably longer). The speed increase
> with the July build is in the area of 3-4 hours for 23 parallel
> periodic daily runs. So instead of finishing the periodic runs around
> 5 pm, they now finish around 1 pm/2 pm.
>
> So whatever was done inside ZFS or VFS or nullfs between 2023-06-19 and
> 2023-07-20 has given a huge speed improvement. From memory I would
> say there is still room for improvement, as I think the periodic
> daily runs used to end in the morning instead of the afternoon, but
> my memory may be flaky in this regard...
>
> Great work to whoever was involved.
>
Several hours to run periodic is still unusably slow.
Have you tried figuring out where the time is spent?
I don't know what caused the change here, but I do know of one major
bottleneck which you are almost guaranteed to run into if you inspect
all files everywhere -- namely bumping into the vnode limit.
In vn_alloc_hard() you can find:

    msleep(&vnlruproc_sig, &vnode_list_mtx, PVFS, "vlruwk", hz);
    if (atomic_load_long(&numvnodes) + 1 > desiredvnodes &&
        vnlru_read_freevnodes() > 1)
            vnlru_free_locked(1);

That is, the allocating thread will sleep for up to 1 second if there
are no vnodes up for grabs, and then go ahead and allocate one anyway.
Going over the numvnodes limit is partially rate-limited, but in a
manner which is not very usable.
The entire mechanism is mostly borked and in desperate need of a rewrite.
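A quick way to see whether you are hitting this path is to look for
processes sleeping on the "vlruwk" wait channel while the finds are
running (a minimal sketch, assuming the stock ps(1) keywords; the
column list is just an example):

    # anything stuck in the slow vnode-allocation path shows "vlruwk"
    # as its wait channel
    ps -axo pid,state,wchan,comm | grep vlruwk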
With this in mind, can you provide the output of:

    sysctl kern.maxvnodes vfs.wantfreevnodes vfs.freevnodes \
        vfs.vnodes_created vfs.numvnodes vfs.recycles_free vfs.recycles

Meanwhile, if there are tons of recycles, you can do damage control by
bumping kern.maxvnodes.
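For example (a sketch; the value below is only a placeholder, size it
to your file counts):

    # raise the vnode limit at runtime -- the number is just an example
    sysctl kern.maxvnodes=4000000
    # keep it across reboots
    echo 'kern.maxvnodes=4000000' >> /etc/sysctl.conf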
If this is not the problem, you can use dtrace to figure it out.
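For instance (a minimal sketch; the fbt probe assumes vn_alloc_hard is
not inlined in your kernel):

    # count how often the slow vnode-allocation path is entered
    dtrace -n 'fbt::vn_alloc_hard:entry { @hits = count(); }'

    # or sample kernel stacks for 30 seconds to see where the time goes
    dtrace -n 'profile-997 /arg0/ { @[stack()] = count(); } tick-30s { exit(0); }'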
--
Mateusz Guzik <mjguzik gmail.com>