Interesting: ZFS scrub prefetch hurting sequential scrub performance?

Warner Losh imp at bsdimp.com
Fri Jan 4 16:28:44 UTC 2019


On Fri, Jan 4, 2019 at 3:53 AM Borja Marcos <borjam at sarenet.es> wrote:

>
>
> > On 3 Jan 2019, at 11:34, Borja Marcos <borjam at sarenet.es> wrote:
> >
> >
> > Hi,
> >
> > I have noticed that my scrubs have become painfully slow. I am wondering
> whether I’ve just hit some worst case or maybe
> > there is some interaction between the ZFS sequential scrub and scrub
> prefetch. I don’t recall seeing this behavior
> > before the sequential scrub code was committed.
> >
> > Did I hit some worst case or should scrub prefetch be disabled with the
> new sequential scrub code?
>
> I have done a test with the old scrub code (vfs.zfs.zfs_scan_legacy=1) and
> I see a very similar behavior, with the
> scrub stalling again.
>
> Once more, disabling prefetch for the scrub (vfs.zfs.no_scrub_prefetch=1)
> solves the issue.
>
> I suffered this problem on 11 at some point but I attributed it (wrongly!)
> to hardware problems at the time.
>
> Now I’ve just found a talk about a new prefetch mechanism for the scrub by
> Tom Caputi. Could it be the problem?
> https://www.youtube.com/watch?v=upn9tYh917s


It's always been a hard problem to schedule background activity without
affecting foreground performance. For hard drives this isn't so terrible to
do: keep the queue depths small so that when any new work arrives, the
latency in switching between the two workloads is small. With SSDs, it gets
harder, though in a read-only workload it degenerates to about the same.
SSDs sometimes do their own read-ahead, and they have lots of background
activity that can be triggered by reads. For example, if the drive could
read a block but the error rate from the NAND was over some threshold, it
might decide to copy all the data out of that block, because data with that
error rate won't stay readable with the correction codes in place for long
enough to meet the retention specs. Writes can also trigger this background
behavior. So switching between the foreground and background tasks becomes
even more sluggish.
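
As a rough illustration of the queue-depth point for hard drives, here is a
minimal sketch. It is not anything ZFS actually does. It assumes a device
that serves one request at a time in FIFO order, and the 8 ms service time
is only an illustrative figure; the function name is hypothetical.

```python
def worst_case_foreground_wait_ms(background_queue_depth, service_time_ms):
    """Upper bound on the wait seen by a newly arrived foreground request.

    Assumes the device handles one request at a time, in FIFO order, and
    that the queue is already full of background (scrub) reads.
    """
    return background_queue_depth * service_time_ms


if __name__ == "__main__":
    # Illustrative numbers only: 8 ms per random read on a hard drive.
    for depth in (1, 2, 8, 32):
        wait = worst_case_foreground_wait_ms(depth, 8.0)
        print(f"queue depth {depth:2d}: up to {wait:.0f} ms before foreground I/O starts")
```

Under these assumptions the wait grows linearly with the number of queued
background reads, which is why a shallow queue keeps the switch between the
two workloads cheap.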

But I think in ZFS' case, it may just be a bit of a bug in backing off the
scrub operation to allow better local host performance...

Warner


More information about the freebsd-fs mailing list