Ways to "pause" ZFS resilver?

Peter Eriksson pen at lysator.liu.se
Sun Mar 8 19:01:00 UTC 2020


Data drives are 12 HGST 10TB 7200rpm spinning rust… (2xRAIDZ2(4+2))

Well, except for the log (dual Intel DC S3700) and cache (Intel 750 Series PCIe) devices. But I’m not seeing any errors on those.


(The NFS hiccups seem to be happening in “nfsmsleep()” for some reason.

  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=244665

DTrace output:
 27  54842               nfsrvd_dorpc:entry Start
 27  47273                 nfsv4_lock:entry Start(lp->nfslock_lock=6, iwantlock=0)
 27  37590                  nfsmsleep:entry Start(ffffffff81e9982c, ffffffff81e998a0, 99)
 27  54396                     _sleep:entry Start(prio=99, timeo=0)
  7  54397                    _sleep:return    7171965 µs
  7  37591                 nfsmsleep:return    7171972 µs
  7  47274                nfsv4_lock:return    7171979 µs

But it would be really nice to have some way to temporarily pause a running resilver while I’m investigating this issue.)

- Peter


> On 8 Mar 2020, at 19:37, Warner Losh <imp at bsdimp.com> wrote:
> 
> 
> 
> On Sun, Mar 8, 2020 at 12:35 PM Peter Eriksson <pen at lysator.liu.se> wrote:
> I’m looking for ideas on how to pause a running ZFS resilver on a FreeBSD 11.3-RELEASE-p6 system.
> 
> The reason is that on this system a running resilver causes severe NFS “hiccups” for our users (delays of 5-20s, fairly often), so I’d like to figure out some way to “pause” it during office hours until either we’ve found and fixed the problem or the resilver is done (1D15H to go)...
> 
> Since there isn’t any “zfs” command to pause a running resilver, I’m pondering alternative, more “creative” ways.
> 
> /usr/src/cddl/contrib/opensolaris/uts/common/fs/zfs:
> 
> >        if (zio_flags & ZIO_FLAG_RESILVER)
> >                scan_delay = zfs_resilver_delay;
> >        else {
> >                ASSERT(zio_flags & ZIO_FLAG_SCRUB);
> >                scan_delay = zfs_scrub_delay;
> >        }
> >
> >        if (scan_delay && (ddi_get_lbolt64() - spa->spa_last_io <= zfs_scan_idle))
> >                delay(MAX((int)scan_delay, 0));
> 
> Setting vfs.zfs.scan_idle to something high and then vfs.zfs.resilver_delay to 10*60*60*kern.hz (10 hours) and hoping the “if” statement will trigger? But that assumes nothing can/will interrupt delay(). Hmmm...
> 
> Any other suggestions?
> 
> (I don’t want to abort the resilver).
> 
> If you are dealing with SSDs, you might look to see if BIO_DELETE (trim) traffic is causing delays. If so, you can temporarily disable TRIM on the disk being resilvered. In the resilver case, trim doesn't help much anyway since you're rewriting the entire drive. If not, then I'm not sure what else to recommend...
> 
> Warner 


